Design Principles for {epiparameter}

This vignette outlines the design decisions that have been taken during the development of the {epiparameter} R package, and provides some of the reasoning, and possible pros and cons of each decision.

This document is primarily intended to be read by those interested in understanding the code within the package and for potential package contributors.

Scope

The {epiparameter} R package is a library of epidemiological parameters, and provides a class (i.e. data structure) and helper functions for working with epidemiological parameters and distributions. The <epiparameter> class is the main functional object for working with epidemiological parameters and can hold information on delay distributions (e.g. incubation period, serial interval, onset-to-death distribution) and offspring distributions. The class has a number of methods, including allowing the user to easily calculate the PDF, CDF, and quantile, generate random numbers, calculate the distribution mean, and plot the distribution. An <epiparameter> object can be created with the constructor function epiparameter(), and if uncertain whether an object is an <epiparameter>, it can be validated with assert_epiparameter().

The package also converts distribution parameters to summary statistics, and vice versa. This is achieved either by conversion or extraction and both of these methods and the functions used are explained in the Parameter extraction and conversion in {epiparameter} vignette.

Output

The output of the epiparameter() constructor function is an <epiparameter> object. This is a list of nine elements, where each element is either a single type (e.g. character), a non-nested list or another class. Classes as <epiparameter> elements are used when there is existing well developed infrastructure for handling certain data types. The $prod_dist element uses a distribution class – if there is a parameterised distribution available – using either the <distribution> class from {distributional} or the <distcrete> class from {distcrete}. The $citation is handled using the <bibentry> class from the {utils} package (included as part of the base R recommended packages).

Other functions return the simplest type possible, this may be an atomic vector (including single element vectors), or un-nested lists.

Package architecture

Much of the {epiparameter} package is centred around the <epiparameter> class. Here is a diagram showing the class with it’s S3 methods (the diagram below is interactive so can adjusted if labels are overlapping).

#> This diagram is out of date, as new methods have been added to the package which are not included.

Design decisions

  • The <epiparameter> class is designed to be a core unit for working with epidemiological parameters. It is designed in parallel to other epidemiological data structures such as a the <contactmatrix> class from the {contactmatrix} R package. The design principles of the <epiparameter> class are aligned with the <contactmatrix> design principles. These include:

    • A new_*<class>() constructor
    • Two validation functions
      • assert_<class>()
      • test_<class>()
    • An is_<class>() checker to determine if an object is of a given class (without checking the validity of class)
    • Coercion generic as_<class>().
  • The conversion functions (convert_*) are S3 generic functions with methods provided by {epiparameter} for character and <epiparameter> input. This follows the design pattern of other packages, such as {dplyr}, which export their key data transformation functions as S3 generics to allow other developers to extend the conversions to other data objects.

  • The conversion functions are designed to have a single function exported to the user for summary statistics to parameters, and another function exported for parameters to summary statistics. These functions use a switch() to dispatch to the internal conversion functions. This provides a minimal number of conversion functions in the package namespace compared to exporting a conversion function for every distribution.

  • If there are a large number of entries returned when reading epidemiological parameters from the library using the epiparameter_db() function, it can flood the console, due to the default list printing in R. This is the reasoning for the <multi_epiparameter> object which is a minimal class to enable cleaner and more descriptive printing for a large list of <epiparameter> objects. The print.multi_epiparameter() prints a header with metadata on the number of <epiparameter> objects and number of diseases and epidemiological distributions in the list. It also lists all the diseases and epidemiological parameters returned. The footer of the print() function states the number of <epiparameter> objects not shown, guides to use print(n = ...) and parameter_tbl() and a link to the online database vignette (database.Rmd). Information in the header and footer considered metadata or advice is prefixed with #.

  • The package uses S3 classes and S3 dispatch for exported functions, and switch() and do.call() for dispatching to internal functions. This is because it is easier to develop and debug internal functions that do not use S3 dispatch and avoids having to ensure that S3 methods are registered. Examples of S3 dispatch for exported functions are get_parameters() and convert_summary_stats_to_params(). Examples of internal dispatch using switch() and do.call() are clean_params() and convert_params_to_summary_stats.character().

  • The function naming convention is for internal functions to have a dot (.) prefix (e.g. .convert_params_lnorm()). The only function that breaks with this convention is new_epiparameter() as advanced users of the package may want to call in the internal low-level constructor, and adding a dot prefix to this function may make it harder for users to find.

Dependencies

The aim is to restrict the number of dependencies to a minimal required set for ease of maintenance. The current hard dependencies are:

{stats} and {utils} are distributed with the R language so are viewed as a lightweight dependencies, that should already be installed on a user’s machine if they have R. {checkmate} is an input checking package widely used across Epiverse-TRACE packages. {distributional} and {distcrete} are used to import S3 classes for handling and working with distributions. Both are required as only {distcrete} can handle discretised distributions.

Currently {epiparameter} deviates from the Epiverse policy on the number of previous R versions it supports. The {epiparameter} package requires R version >= 4.1.0 which only includes the current version and the last three minor R versions rather than the policy of four minor versions, as of September 2024. The reasons for this change is to enable usage of the base R pipe (|>).