| Title: | Version Management Tools on the File System |
|---|---|
| Description: | Data version management on the file system for smaller projects. Manage data pipeline outputs with symbolic folder links, structured logging and reports, using 'R6' classes for encapsulation and 'data.table' for speed. Directory-specific logs used as source of truth to allow portability of versioned data folders. |
| Authors: | Sam Byrne [aut, cre, cph] (ORCID: <https://orcid.org/0009-0008-1067-307X>) |
| Maintainer: | Sam Byrne <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.1 |
| Built: | 2026-05-24 07:45:43 UTC |
| Source: | https://github.com/cran/vmTools |
Assert a directory exists on disk
assert_dir_exists(x)
| Argument | Description |
|---|---|
| x | [chr] A directory path |
[none] stop if assertion fails
Other assertions:
assert_named_list(),
assert_scalar(),
assert_scalar_not_empty(),
assert_type()
Stops if:
x is not a list
x is a data.frame
x has no names
x has any NA names
x has any zero-length names
x has any whitespace-only names
assert_named_list(x)
| Argument | Description |
|---|---|
| x | [list] List to check |
[none] stop if assertion fails
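The stop conditions listed for this assertion can be sketched in base R. This is a minimal illustration, not the package source: `check_named_list` is a hypothetical name, and the order and wording of the checks are assumptions.

```r
# Hypothetical stand-in for the documented checks; not the package source.
check_named_list <- function(x) {
  if (!is.list(x)) stop("x is not a list")
  if (is.data.frame(x)) stop("x is a data.frame")
  nms <- names(x)
  if (is.null(nms)) stop("x has no names")
  if (anyNA(nms)) stop("x has NA names")
  if (any(!nzchar(nms))) stop("x has zero-length names")
  if (any(grepl("^\\s+$", nms))) stop("x has whitespace-only names")
  invisible(NULL)
}

# A fully named list passes silently
ok <- check_named_list(list(a = 1, b = "two"))

# An unnamed list, a partially named list and a data.frame all stop
bad_unnamed <- inherits(try(check_named_list(list(1, 2)), silent = TRUE), "try-error")
bad_partial <- inherits(try(check_named_list(list(a = 1, 2)), silent = TRUE), "try-error")
bad_df      <- inherits(try(check_named_list(data.frame(a = 1)), silent = TRUE), "try-error")
```

The data.frame check has to be explicit because a data.frame is itself a list in R.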
Other assertions:
assert_dir_exists(),
assert_scalar(),
assert_scalar_not_empty(),
assert_type()
Assert an element is atomic and length 1
assert_scalar(x)
| Argument | Description |
|---|---|
| x | [any] Element to check |
[none] stop if assertion fails
Other assertions:
assert_dir_exists(),
assert_named_list(),
assert_scalar_not_empty(),
assert_type()
Assert x is a scalar, and not empty in some way
assert_scalar_not_empty(x)
| Argument | Description |
|---|---|
| x | [any] some object to check |
[none] stop if assertion fails
Other assertions:
assert_dir_exists(),
assert_named_list(),
assert_scalar(),
assert_type()
Assert an object is a scalar of a certain type
assert_type(x, type)
| Argument | Description |
|---|---|
| x | [any] Object to check |
| type | [chr] Type to check against |
[none] stop if assertion fails
Other assertions:
assert_dir_exists(),
assert_named_list(),
assert_scalar(),
assert_scalar_not_empty()
Wrapper utility for sanitizing file.path(...) output
clean_path(..., normalize = TRUE, mustWork = FALSE)
| Argument | Description |
|---|---|
| ... | [chr] paths passed to file.path() |
| normalize | [lgl] pass path to normalizePath()? |
| mustWork | [lgl] passed to normalizePath() |
[chr] full file paths with consistent platform-specific structure
clean_path(tempdir(), "/some/other/path/") # build a single path like file.path
clean_path(c(".", tempdir(), "/some/other/path/")) # vectorized
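To see why a wrapper around file.path() is useful, the following base R sketch joins paths, collapses doubled separators and optionally normalizes the result. `clean_path_sketch` is a hypothetical name, and the separator clean-up step is an assumption about what "sanitizing" means here, not the package's implementation.

```r
# Hypothetical helper; assumes the wrapper is roughly file.path() followed by
# normalizePath(). Not taken from the package source.
clean_path_sketch <- function(..., normalize = TRUE, mustWork = FALSE) {
  pths <- file.path(...)
  # file.path() keeps doubled separators such as "dir//sub"; collapse them
  # (this simple rule would also alter UNC paths, which are ignored here)
  pths <- gsub("/+", "/", pths)
  if (normalize) {
    pths <- normalizePath(pths, winslash = "/", mustWork = mustWork)
  }
  pths
}

# Without normalization the result is purely a string operation
p <- clean_path_sketch(tempdir(), "/some/other/path/", normalize = FALSE)

# With normalization an existing directory resolves to its canonical path
p_real <- clean_path_sketch(tempdir())
```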
Print a directory tree to stdout
dir_tree(path = ".", level = Inf, prefix = "")
| Argument | Description |
|---|---|
| path | [chr] The path to the directory to print |
| level | [int] The maximum depth to print |
| prefix | [chr] The prefix to add to each line |
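The way the three arguments interact can be sketched with a small recursive printer in base R. `print_tree` is a hypothetical name and the ASCII branch markers are an assumption; the package's own output format may differ.

```r
# Hypothetical recursive printer; markers and layout are illustrative only.
print_tree <- function(path = ".", level = Inf, prefix = "") {
  if (level <= 0) return(invisible(NULL))
  entries <- list.files(path, full.names = TRUE)
  for (i in seq_along(entries)) {
    last <- i == length(entries)
    cat(prefix, if (last) "\\-- " else "|-- ", basename(entries[i]), "\n", sep = "")
    if (dir.exists(entries[i])) {
      # descend one level, extending the prefix so children are indented
      print_tree(entries[i], level - 1, paste0(prefix, if (last) "    " else "|   "))
    }
  }
  invisible(NULL)
}

# Build a tiny demo tree under tempdir(): tree_sketch/a/{b/, f.txt}
demo <- file.path(tempdir(), "tree_sketch")
dir.create(file.path(demo, "a", "b"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo, "a", "f.txt")))

shallow <- capture.output(print_tree(demo, level = 1)) # top level only
full    <- capture.output(print_tree(demo))            # whole tree
```

Here `level` limits the recursion depth and `prefix` carries the indentation into each recursive call.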
Used only for signaling/messaging
find_latest_output_dir(root)
| Argument | Description |
|---|---|
| root | [chr] path to root of output results |
[chr] path to latest output directory
Cross platform helper to find number of cores
find_n_cores()
[int]
Directories are assumed to be named in YYYY_MM_DD.VV format with sane year/month/date/version values.
get_latest_output_date_index(dir, date)
| Argument | Description |
|---|---|
| dir | [chr] path to directory with versioned dirs |
| date | [chr] character in YYYY_MM_DD format |
[int] largest version in directory tree, or 0 if there are no versions or the directory tree does not exist
Returns only the date-version, not the full path. Does not create a folder.
get_new_version_name(root, date = "today")
| Argument | Description |
|---|---|
| root | [chr] path to root of output results |
| date | [chr] character date in form of "YYYY_MM_DD" or "today". "today" will be interpreted as today's date. |
[chr] new output version of the form "YYYY_MM_DD.VV"
get_new_version_name(root = tempdir(), date = "today") # expect "YYYY_MM_DD.01"
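The YYYY_MM_DD.VV naming scheme can be illustrated with a base R sketch that finds the largest existing version for a date and returns the next one. `next_version_name` is a hypothetical name, the two-digit version pattern is an assumption drawn from the ".VV" format, and the dates are made up for the example.

```r
# Hypothetical sketch of the naming rule; not the package source.
next_version_name <- function(root, date = "today") {
  if (identical(date, "today")) date <- format(Sys.Date(), "%Y_%m_%d")
  dirs <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  # keep only folders named <date>.VV, e.g. 2024_02_10.03
  hits <- grep(paste0("^", date, "\\.[0-9]{2}$"), dirs, value = TRUE)
  # largest existing version, or 0 when none exist (or root is missing)
  latest <- if (length(hits)) max(as.integer(sub(".*\\.", "", hits))) else 0L
  sprintf("%s.%02d", date, latest + 1L)
}

root <- file.path(tempdir(), "version_sketch")
dir.create(file.path(root, "2024_02_10.01"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "2024_02_10.02"), showWarnings = FALSE)

v_next  <- next_version_name(root, date = "2024_02_10") # "2024_02_10.03"
v_fresh <- next_version_name(root, date = "2024_02_11") # "2024_02_11.01"
```

Like the documented function, the sketch only computes a name and never creates a folder.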
Determine if an object is an error
is_an_error(x)
| Argument | Description |
|---|---|
| x | [obj] some R object |
[lgl] TRUE / FALSE
Other validations:
validate_dir_exists(),
validate_not_empty()
Is the current OS Windows?
is_windows()
[lgl]
If running on Windows, check if the user has admin privileges
is_windows_admin()
[lgl] TRUE if the user is on a Windows OS and has admin privileges, FALSE otherwise
Very simple replacement for purrr::map_depth to remove package dependency, but not very robust. Internal package use only in select cases.
lapply_depth(.x, .depth, .f, ...)
| Argument | Description |
|---|---|
| .x | [list] List to apply function to |
| .depth | [integer] Depth to apply function at |
| .f | [function] Function to apply |
| ... | [any] Additional arguments to pass to .f |
[list] List with function applied at target depth
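Applying a function at a fixed depth of a nested list can be sketched with a short recursion over lapply(). `apply_at_depth` is a hypothetical name, and the depth convention (0 means the object itself, 1 means its elements) is an assumption borrowed from purrr::map_depth rather than confirmed from the package source.

```r
# Hypothetical recursion; depth counting follows purrr::map_depth by assumption.
apply_at_depth <- function(.x, .depth, .f, ...) {
  if (.depth <= 0) return(.f(.x, ...))
  # step one level down and apply the same rule to every element
  lapply(.x, apply_at_depth, .depth = .depth - 1, .f = .f, ...)
}

nested <- list(
  a = list(x = 1:3, y = 4:6),
  b = list(x = 7:9)
)

# sum() is applied to the vectors two levels down; list names are preserved
sums <- apply_at_depth(nested, 2, sum)
```

As the description warns for the real helper, a sketch like this does not handle lists whose branches have uneven depth.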
Symlink Tool custom print method
## S3 method for class 'Symlink_Tool'
print(x, ...)
| Argument | Description |
|---|---|
| x | [Symlink_Tool] The SLT class |
| ... | [any] Additional arguments to 'print()' |
[stdout]
SLT
Class for lightweight file-system level data versioning, logs and reports without need for a database.
new()
Initialize the SymlinkTool object - an R6 class
The constructor function.
SLT$new( user_root_list = NULL, user_central_log_root = NULL, schema_repair = TRUE, verbose = TRUE, verbose_startup = FALSE, csv_reader = "fread_quiet", timezone = Sys.timezone() )
user_root_list[list] Named list of root directories for pipeline outputs. This is where 'version_name' folders live - these are iterative runs of an analysis pipeline.
user_central_log_root[path] Root directory for the central log. If you have multiple roots in the 'user_root_list', you probably want the central log to live one level above those roots.
schema_repair[logical] Default 'TRUE'. If 'TRUE', the tool will attempt to repair any schema mismatches it finds in the logs when reading and writing e.g. add new columns if the tool schema has columns that existing logs do not. If 'FALSE', the tool will stop and throw an error if it finds a schema mismatch.
verbose[lgl: default TRUE] control message verbosity - if TRUE, standard message, if FALSE, warn only if something is irregular.
verbose_startup[lgl] see start up warnings, if relevant?
csv_reader[chr] The CSV reader to use (also assigns matching CSV writer). CAUTION: DO NOT USE 'data.table::fread' if you have any quotation marks (") in log comments (these lead to exploding series of quotations), see https://github.com/Rdatatable/data.table/issues/4779. In that case use 'read.csv' or 'read.csv2' instead. Options:
fread_quiet - 'data.table::fread' and suppress warnings (default)
fread - 'data.table::fread'
read.csv - 'utils::read.csv' - safer
read.csv2 - 'utils::read.csv2' - safer, comma as decimal point, semicolon as field separator
timezone[chr] Default 'Sys.timezone()'. The timezone to use for datestamps in logs. Must be a valid 'OlsonNames()' string.
[symlink_tool] A symlink tool object. You can instantiate a.k.a. create multiple objects, each of which has different roots and central logs.
try(SLT$new()) # call with no arguments to see instructions
# Tool will not instantiate on Windows unless running with Admin permissions
# - requirement for symlink creation on Windows
return_dictionaries()
Return the contents of all private dictionaries.
SLT$return_dictionaries(item_names = NULL)
item_names[chr] Default 'NULL'. If 'NULL', show all static internal fields. Otherwise, vector of static field names you want to see.
[list] of all static internal fields
return_dynamic_fields()
Print the contents of all dynamic fields.
SLT$return_dynamic_fields(item_names = NULL)
item_names[chr] Default 'NULL'. If 'NULL', show all dynamic internal fields. Otherwise, vector of dynamic field names you want to see.
[std_out] Print dynamic field values to std_out.
mark_best()
Mark an output folder with a "best" symlink.
Enforces:
- maximum of one best model
- does not go back through history to make a best model from a prior version (not capable, this is what log_tool is for)
Writes:
- appends to a log file in the output folder with a date and time stamp
- appends a line to the central log file with a date and time stamp
SLT$mark_best(version_name, user_entry)
version_name[chr] The directory name of the output folder that lives directly under one of the 'root's you define when you instantiate the tool.
user_entry[list] Named list of user-defined fields to append to the log. After making a tool called e.g. slt, call 'slt$return_dictionaries("log_fields_user")' to find which fields a user may add. If you want to make your own version of this class, you may update 'log_schema' in the 'private$DICT' section to allow for them.
[std_err] Messages about actions taken.
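The file-system side of marking a version as best comes down to a symbolic link that points at a version folder. The base R sketch below shows only that mechanic, under the assumption that the link lives next to the version folders; the folder names are made up, and the logging that `mark_best()` performs is not reproduced.

```r
# Illustration of the symlink mechanic only; not the package's implementation.
root <- file.path(tempdir(), "symlink_sketch")
version_dir <- file.path(root, "2024_02_10.01") # hypothetical version folder
dir.create(version_dir, recursive = TRUE, showWarnings = FALSE)

best_link <- file.path(root, "best")

# Keep at most one "best": remove an existing link before re-pointing it
old_target <- Sys.readlink(best_link)
if (!is.na(old_target) && nzchar(old_target)) unlink(best_link)

made     <- file.symlink(from = version_dir, to = best_link)
resolved <- Sys.readlink(best_link) # folder the link points to
```

On Windows, file.symlink() generally needs elevated permissions, which matches the admin requirement noted for the constructor.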
mark_keep()
Mark an output folder with a "keep_<version_name>" symlink
Writes:
- appends to a log file in the output folder with a date and time stamp
- appends a line to the central log file with a date and time stamp
SLT$mark_keep(version_name, user_entry)
version_name[chr] The directory name of the output folder that lives directly under one of the 'root's you define when you instantiate the tool.
user_entry[list] Named list of user-defined fields to append to the log. After making a tool called e.g. slt, call 'slt$return_dictionaries("log_fields_user")' to find which fields a user may add. If you want to make your own version of this class, you may update 'log_schema' in the 'private$DICT' section to allow for them.
[std_err] Messages about actions taken.
mark_remove()
Mark an output folder with a "remove_<version_name>" symlink
Indication that the results can be deleted - In the future, this will be used to remove old versions of the output, and provide a list of ST-GPR models to delete
Writes:
- appends to a log file in the output folder with a date and time stamp
- appends a line to the central log file with a date and time stamp
SLT$mark_remove(version_name, user_entry)
version_name[chr] The directory name of the output folder that lives directly under one of the 'root's you define when you instantiate the tool.
user_entry[list] Named list of user-defined fields to append to the log. After making a tool called e.g. slt, call 'slt$return_dictionaries("log_fields_user")' to find which fields a user may add. If you want to make your own version of this class, you may update 'log_schema' in the 'private$DICT' section to allow for them.
[std_err] Messages about actions taken.
unmark()
Remove all symlinks for a single 'version_name' in all 'roots'
Writes:
- appends to a log file in the output folder with a date and time stamp
- does _not_ append to the central log file
SLT$unmark(version_name, user_entry)
version_name[chr] The directory name of the output folder that lives directly under one of the 'root's you define when you instantiate the tool.
user_entry[list] Named list of user-defined fields to append to the log. After making a tool called e.g. slt, call 'slt$return_dictionaries("log_fields_user")' to find which fields a user may add. If you want to make your own version of this class, you may update 'log_schema' in the 'private$DICT' section to allow for them.
[std_err] Messages about the symlinks removed.
roundup_best()
Find all 'best_' symlinks in all 'roots'
Return both the symlink and the resolved symlink (folder the symlink points to)
SLT$roundup_best()
[list] list of data.tables - one for each 'root'
roundup_keep()
Find all 'keep_' symlinks in all 'roots'
Return both the symlink and the resolved symlink (folder the symlink points to)
SLT$roundup_keep()
[list] list of data.tables - one for each 'root'
roundup_remove()
Find all 'remove_' symlinks in all 'roots'
Return both the symlink and the resolved symlink (folder the symlink points to)
SLT$roundup_remove()
[list] list of data.tables - one for each 'root'
roundup_unmarked()
Find all folders without symlinks in all 'roots'
Useful if you're rapidly iterating, have only marked a couple folders, and want to remove the rest.
SLT$roundup_unmarked()
[list] list of data.tables - one for each 'root'
roundup_by_date()
Find all 'version_name' folders by creation date
Only finds folders that _have a log_, and reads creation date on first row. User may select dates (using the 'date_selector' argument) by:
- greater than: 'gt'
- greater than or equal to: 'gte'
- less than: 'nt'
- less than or equal to: 'nte'
- equal to: 'e'
SLT$roundup_by_date(user_date, date_selector)
user_date[c("character", "Date", "POSIXct", "POSIXt")] A date with class requirements - must be formatted "2020-01-01", "2020_01_01" or "2020/01/01"
date_selector[chr] See docstring explanation.
[list] list of data.tables - one for each 'root'
get_common_new_version_name()
Get a new YYYY_MM_DD.VV version compatible with _ALL THE TOOL'S ROOTS_
If root1 has 2025_01_01.01 and root2 has 2025_01_01.03, then a new folder would need to be 2025_01_01.04
SLT$get_common_new_version_name(date = "today", root_list = private$DICT$ROOTS)
date[chr] Default "today". The date to use for the new version name. Must be formatted "2020_01_01"
root_list[list] named list of root directories for pipeline
[chr] format YYYY_MM_DD.VV
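The root1/root2 example above can be reproduced with a base R sketch: take the largest version index found in any root for the date, then add one. `latest_index` is a hypothetical helper and the two-digit version pattern is an assumption drawn from the ".VV" format; this is not the package source.

```r
# Hypothetical helper: largest .VV index for a date under one root, else 0
latest_index <- function(root, date) {
  dirs <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  hits <- grep(paste0("^", date, "\\.[0-9]{2}$"), dirs, value = TRUE)
  if (length(hits)) max(as.integer(sub(".*\\.", "", hits))) else 0L
}

# Two roots with different histories for the same date
base  <- file.path(tempdir(), "common_sketch")
roots <- list(root1 = file.path(base, "root1"), root2 = file.path(base, "root2"))
dir.create(file.path(roots$root1, "2025_01_01.01"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(roots$root2, "2025_01_01.03"), recursive = TRUE, showWarnings = FALSE)

# The common next version must exceed the largest index in every root
top <- max(vapply(roots, latest_index, integer(1), date = "2025_01_01"))
common_next <- sprintf("2025_01_01.%02d", top + 1L) # "2025_01_01.04"
```

Taking the maximum across roots keeps the same version name available in every root, at the cost of skipping indices in roots that lag behind.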
make_new_version_folder()
Create a new 'version_name' folder in _ALL THE TOOL'S ROOTS_
Create a new log in each folder. No symlinks are created. No 'user_entry' is used.
SLT$make_new_version_folder(version_name = self$get_common_new_version_name())
version_name[chr] The directory name of the output folder that lives directly under one of the 'root's you define when you instantiate the tool. For convenience, the user may leave this at its default, in which case 'get_common_new_version_name()' supplies the name.
[std_err] Messages about the folder creation.
make_new_log()
Safely write an empty log file for first pipeline runs
When you start a new pipeline run, make an empty log. This is helpful if you let this tool manage all your versions: you can round up version_names by creation date using the log's first entry, since the file system doesn't track directory creation dates (at time of writing).
SLT$make_new_log(version_name)
version_name[chr] The directory name of the output folder that lives directly under one of the 'root's you define when you instantiate the tool.
[std_err] Messages about the log creation.
delete_version_folders()
Delete a 'version_name' folder marked with a 'remove_' symlink from _ALL ITS ROOTS_
Removes the symlink(s) and the underlying folder(s), and updates central log if folders were removed.
Writes:
- appends a line to the central log file with a date and time stamp
SLT$delete_version_folders(version_name, user_entry, require_user_input = TRUE)
version_name[chr] The directory name of the output folder that lives directly under one of the 'root's you define when you instantiate the tool.
user_entry[list] Named list of user-defined fields to append to the log. After making a tool called e.g. slt, call 'slt$return_dictionaries("log_fields_user")' to find which fields a user may add. If you want to make your own version of this class, you may update 'log_schema' in the 'private$DICT' section to allow for them.
require_user_input[lgl] if 'TRUE', will prompt user to confirm deletion.
[std_err] Messages about deletion events.
make_reports()
Make all reports
Writes all reports to a summary .csv for every 'root' defined in the tool.
SLT$make_reports()
[std_err] Messages about where reports were written.
clone()
The objects of this class are cloneable with this method.
SLT$clone(deep = FALSE)
deep[lgl] Whether to make a deep clone.
## ------------------------------------------------
## Method `SLT$new`
## ------------------------------------------------
try(SLT$new()) # call with no arguments to see instructions
# Tool will not instantiate on Windows unless running with Admin permissions
# - requirement for symlink creation on Windows
Split a character vector by line breaks
split_line_breaks(string)
| Argument | Description |
|---|---|
| string | [chr] A character vector. |
All elements of the character vector are split by '\n' into new elements.
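The splitting behaviour described above matches what base R's strsplit() gives when its result is flattened. `split_lines_sketch` is a hypothetical name used for illustration; the package function may treat edge cases such as empty strings differently.

```r
# Hypothetical equivalent built on base R; not the package source.
split_lines_sketch <- function(string) {
  # fixed = TRUE treats "\n" as a literal newline rather than a regex
  unlist(strsplit(string, "\n", fixed = TRUE))
}

# Two input elements become three output elements
parts <- split_lines_sketch(c("first\nsecond", "third"))
```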
Validate whether a directory exists
validate_dir_exists(x, verbose = TRUE)
| Argument | Description |
|---|---|
| x | [path] A directory path |
| verbose | [lgl] message to std_out? |
[lgl] TRUE if directory exists, FALSE otherwise
Other validations:
is_an_error(),
validate_not_empty()
Designed to also catch missing args when called inside a function.
validate_not_empty(x)
| Argument | Description |
|---|---|
| x | [any] some argument to check |
[lgl] FALSE if empty in some way, TRUE otherwise
Other validations:
is_an_error(),
validate_dir_exists()