--- title: "Parameter extraction and conversion in {epiparameter}" output: bookdown::html_vignette2: code_folding: show bibliography: references.json link-citations: true vignette: > %\VignetteIndexEntry{Parameter extraction and conversion in {epiparameter}} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The reporting of parameters of distributions in the epidemiological literature is varied. In some instances the parameters themselves will be given, in others, summary statistics of the distribution are provided, such as mean, standard deviation, variance, among others. When working with epidemiological parameters it can be useful to have the values of the parameters for common parameterisations of the distribution. For example, shape and scale parameters for the gamma and Weibull distribution, and the meanlog and sdlog parameters for the lognormal distribution. In {epiparameter} we provide a range of conversion and extraction functions in order to obtain distribution parameters from summary statistics and vice versa. Here we differentiate between conversion and extraction as what can be analytically converted, as conversions, and those that require optimisation as extraction. This vignette aims to describe the functionality of these conversions and extractions provided by the {epiparameter} package. ::: {.alert .alert-info} ### Conversion versus extraction Use conversion when possible over extraction to avoid possible limitations associated with numerical optimisation used in the extraction function `extract_param()`. ::: ```{r setup} library(epiparameter) ``` ## Conversions There are two conversion functions in {epiparameter}: `convert_params_to_summary_stats()` and `convert_summary_stats_to_params()`. `convert_params_to_summary_stats()` converts one set of statistical distribution parameters to common summary statistics, and `convert_summary_stats_to_params()` converts summary statistics to a set of parameters. ### Conversion functions The conversion functions can take two types of inputs as the first argument: a `character` string of the distribution or an `` object. The conversion functions have two arguments. The first (`x`) defines which distribution you want to use and the second (`...`) lets you put as many named parameters or summary statistics as required. The arguments passed into `...` are matched by name, and therefore need to match exactly to the names expected. See the function documentation (`?convert_params_to_summary_stats` and `?convert_summary_stats_to_params` for names). In the case that an `` object is supplied, if it has the parameters or summary statistics required for conversion then nothing needs to be given as extra arguments (i.e. `...`). All currently supported summary statistic conversions in {epiparameter} are given below for each distribution. #### Gamma distribution #### Using a `character` string to name distribution ```{r} convert_params_to_summary_stats("gamma", shape = 2.5, scale = 1.5) ``` ```{r} convert_summary_stats_to_params("gamma", mean = 2, sd = 2) convert_summary_stats_to_params("gamma", mean = 2, var = 2) convert_summary_stats_to_params("gamma", mean = 2, cv = 2) ``` #### Using an `` An example with parameters provided by `` ```{r} ep <- epiparameter( disease = "", pathogen = "", epi_name = "", prob_distribution = create_prob_distribution( prob_distribution = "gamma", prob_distribution_params = c(shape = 2.5, scale = 1.5) ) ) convert_params_to_summary_stats(ep) ``` An example with `` missing parameters and supplied through `...` ```{r} ep <- epiparameter( disease = "", pathogen = "", epi_name = "", prob_distribution = "gamma" ) convert_params_to_summary_stats(ep, shape = 2.5, scale = 1.5) ``` An example with summary statistics provided by `` ```{r} ep <- epiparameter( disease = "", pathogen = "", epi_name = "", prob_distribution = "gamma", summary_stats = create_summary_stats(mean = 3.75, sd = 2.37) ) convert_summary_stats_to_params(ep) ``` An example with `` missing summary statistics and supplied through `...` ```{r} ep <- epiparameter( disease = "", pathogen = "", epi_name = "", prob_distribution = "gamma" ) convert_summary_stats_to_params(ep, mean = 3.75, sd = 2.37) ``` The usage of `` is not repeated for every distribution conversion available in {epiparameter}. The conversions shown below for each distribution are also available when using an `` object, by either having the parameters or summary statistics stored in the `` or supplied via named arguments. #### Lognormal distribution ```{r} convert_params_to_summary_stats("lnorm", meanlog = 2.5, sdlog = 1.5) ``` ```{r} convert_summary_stats_to_params("lnorm", mean = 2, sd = 2) convert_summary_stats_to_params("lnorm", mean = 2, var = 2) convert_summary_stats_to_params("lnorm", mean = 2, cv = 2) convert_summary_stats_to_params("lnorm", median = 2, sd = 2) convert_summary_stats_to_params("lnorm", median = 2, var = 2) ``` #### Weibull distribution ```{r} convert_params_to_summary_stats("weibull", shape = 2.5, scale = 1.5) ``` ```{r} convert_summary_stats_to_params("weibull", mean = 2, sd = 2) convert_summary_stats_to_params("weibull", mean = 2, var = 2) convert_summary_stats_to_params("weibull", mean = 2, cv = 2) ``` #### Negative binomial distribution ```{r} convert_params_to_summary_stats("nbinom", prob = 0.5, dispersion = 0.5) ``` ```{r} convert_summary_stats_to_params("nbinom", mean = 1, sd = 1) convert_summary_stats_to_params("nbinom", mean = 1, var = 1) convert_summary_stats_to_params("nbinom", mean = 1, cv = 1) ``` #### Geometric distribution ```{r} convert_params_to_summary_stats("geom", prob = 0.5) ``` ```{r} convert_summary_stats_to_params("geom", mean = 1) ``` ## Extraction There are two methods of extraction implemented in {epiparameter}. One is to estimate the parameters given the values of two percentiles, and the other is to estimate the parameters given the median and the range of the data. Both of these extractions are implemented in the `extract_param()` function. Here we demonstrate extraction using percentiles. The `type` should be `"percentiles"`, the `values` are the values reported at the percentiles, given as a vector. The percentiles, given between 0 and 1, are specified as a vector in `percentiles`. The example below uses values 1 and 10 at the 2.5th and 97.5th percentile, respectively. ```{r} extract_param( type = "percentiles", values = c(1, 10), distribution = "gamma", percentiles = c(0.025, 0.975) ) ``` In the above example we estimate parameters of the gamma distribution, but extraction is also implemented for the lognormal, normal and Weibull distributions, but specifying `"lnorm"`, `"norm"` or `"weibull"`. A message is shown when running `extract_param()` to make the user aware that the estimates are not completely reliable due to the use of numerical optimisation. Rerunning the function to and finding the same parameters are returned indicates that they have successfully converged. This issue is mostly overcome by the internal setup of the `extract_param()` function which searches for convergence to consistent parameter estimates before returning these to the user. The alternative extraction, by median and range, can be achieved by specifying `type = "range"` and using the `samples` argument instead of the `percentiles` argument. When using `type = "percentiles"` the `samples` argument is ignored and when using `type = "range"` the `percentiles` argument is ignored. ```{r} extract_param( type = "range", values = c(10, 5, 15), distribution = "lnorm", samples = 25 ) ``` In the above section it was mentioned that `extract_param()` has an internal mechanism to check that the parameters have consistently converged to the same estimates over several optimisation iterations. The tolerance of this convergence and number of times the optimisation can be repeated is specified in the `control` argument of `extract_param()`. This is set by default (`tolerance = 1e-5` and `max_iter = 1000`), and thus does not need to be specified by the user (as shown in the above examples). In the case that the maximum number of optimisation iterations is reached, the calculation terminates returning the most recent optimisation result to the user along with a warning message. ```{r} # set seed to ensure warning is produced set.seed(1) # lower maximum iteration to show warning extract_param( type = "range", values = c(10, 1, 25), distribution = "lnorm", samples = 100, control = list(max_iter = 100) ) ``` The reasoning for the default maximum number of iterations is to limit the computation time and prevent the function cycling through optimisation routines without converging on a consistent answer. If the runtime is not important and parameter accuracy is paramount then the maximum number of iterations can be increased and tolerance decreased. These `control` settings work identically for extracting from percentiles or median and range. ### Use cases Here we present examples of published epidemiological parameters and distributions where the functions outlined above can be applied to get the parameters of the distribution. The 75th percentiles reported for a lognormal distribution from @nolenExtendedHumantoHumanTransmission2016 for the incubation period for mpox (monkeypox). ```{r, extract_param for monkeypox percentiles} # Mpox lnorm from 75th percentiles in WHO data extract_param( type = "percentiles", values = c(6, 13), distribution = "lnorm", percentiles = c(0.125, 0.875) ) ``` The median and range are provided by @thornhillMonkeypoxVirusInfection2022 for mpox, and we want to calculate the parameters of a lognormal distribution. ```{r, extract_param for monkeypox median and range} # Mpox lnorm from median and range in 2022: extract_param( type = "range", values = c(7, 3, 20), distribution = "lnorm", samples = 23 ) ``` ::: {.alert .alert-warning} ### Assuming distributions It can be the case that a study will report summary statistics for an unspecified distribution or just for the raw data. In these cases if a parameterised distribution is required for downstream analysis a functional, parametric, form may have to be assumed. If the distribution is a delay distribution (i.e. serial interval or incubation period) it can often be sensible to assume a right-skewed distribution such as: gamma, lognormal or Weibull distributions. These are also the most commonly fit distributions in epidemiological analysis of delay distributions. However, one should take care when assuming a distribution as this may drastically influence the interpretation and application of the epidemiological parameters. ::: @donnellyEpidemiologicalDeterminantsSpread2003 provides the mean and variance of the gamma distribution for the incubation period of SARS. This conversion can be achieved by using the general conversion function (`convert_summary_stats_to_params()`). ```{r, convert parameters} # SARS gamma mean and var to shape and scale convert_summary_stats_to_params("gamma", mean = 6.37, var = 16.7) ``` ## References