| Title: | Estimating Plant Pathogen Epidemiology Parameters from Laboratory Assays |
|---|---|
| Description: | Provides functions for estimating plant pathogen parameters from access period (AP) experiments. Separate functions are implemented for semi-persistently transmitted (SPT) and persistently transmitted (PT) pathogens. The common AP experiment exposes insect cohorts to infected source plants, healthy test plants, and intermediate plants (for PT pathogens). The package allows estimation of acquisition and inoculation rates during feeding, recovery rates, and latent progression rates (for PT pathogens). Additional functions support inference of epidemic risk from pathogen and local parameters, and also simulate AP experiment data. The functions implement probability models for epidemiological analysis, as derived in Donnelly et al. (2025), <doi:10.32942/X29K9P>. These models were originally implemented in the 'EpiPv' 'GitHub' package. |
| Authors: | Ruairi Donnelly [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-7642-0317>) |
| Maintainer: | Ruairi Donnelly <[email protected]> |
| License: | GPL-3 |
| Version: | 0.0.1 |
| Built: | 2026-05-29 06:05:41 UTC |
| Source: | https://github.com/cran/EpiPvr |
This function simulates the assay data of an AP experiment (calling the helper function AP_insect_simulator.r to simulate on a per insect basis).
AP_assay_simulator( assayinput_in, fixedComponent_in, WF_in, smarkpams_in, isVerbose = 0, virusTypeIn )AP_assay_simulator( assayinput_in, fixedComponent_in, WF_in, smarkpams_in, isVerbose = 0, virusTypeIn )
assayinput_in |
Numeric array with 3 rows (required): |
fixedComponent_in |
Numeric vector of 3 fixed durations in the order AAP, LAP, IAP; nb. one of these 3 will not be relevant for the sub-assay as it is varied in the assay (code -1) e.g. if AAP sub-assay |
WF_in |
Number of insect vectors in cohort (required). |
smarkpams_in |
Numeric vector: 4 elements representing virus rate parameters (hr^-1) (required):
|
isVerbose |
Binary (optional, default = 0). If 1, prints information from nested functions. |
virusTypeIn |
Character. Plant virus type, either |
A numeric array with 3 rows corresponding to assay structure and outcome: The first row specifies the assay variable durations, the second the reps per duration, the third the simulated number of infected test plants.
assay1=AP_assay_simulator( assayinput_in=rbind(c(6,7,8),rep(10,3),rep(0,3)), fixedComponent_in=c(10,-1,10),WF_in=5, smarkpams_in=rep(0.1,4),isVerbose=0,virusTypeIn='PT' )assay1=AP_assay_simulator( assayinput_in=rbind(c(6,7,8),rep(10,3),rep(0,3)), fixedComponent_in=c(10,-1,10),WF_in=5, smarkpams_in=rep(0.1,4),isVerbose=0,virusTypeIn='PT' )
This dataset contains simulated AP assay results for a PT virus (AAP LAP and IAP sub-assays)
ap_data_sim_PTap_data_sim_PT
A List that is a collection of 6 data objects and 4 attributes
Numeric 3 x n array for acquistion access sub-assay: variable duration; reps per duration, no. infected test plants per duration
Numeric 3 x n array for latent access sub-assay: variable duration; reps per duration, no. infected test plants per duration
Numeric 3 x n array for inoculation access sub-assay: variable duration; reps per duration, no. infected test plants per duration
Numeric 3 x 3 assay of fixed durations (FD): row 1, FDs for acquistion access sub-assay; row 2, FDs latent access sub-assay; row 3, FDs inoculation access sub-assay. nb diagonal entries must be '-1' indicating redundancy since these elements correspond to the variable durations
Numeric number of insects per plant
Character representing underlying virus
#'
Numeric: acquisition rate (h^-1); used to generate the data or '-1' if unknown
Numeric: inoculation rate (h^-1); used to generate the data or '-1' if unknown
Numeric: latent progression rate (h^-1); used to generate the data or '-1' if unknown
Numeric: insect recovery rate (h^-1); used to generate the data or '-1' if unknown
Simulated using AP_assay_simulator()
This dataset contains simulated AP assay results for an SPT virus (AAP and IAP sub-assays)
ap_data_sim_SPTap_data_sim_SPT
A List that is a collection of 5 data objects and 3 attributes
Numeric 3 x n array for acquistion access sub-assay: variable duration; reps per duration, no. infected test plants per duration
Numeric 3 x n array for inoculation access sub-assay: variable duration; reps per duration, no. infected test plants per duration
Numeric 2 x 2 assay of fixed durations (FD): row 1, FDs for acquistion access sub-assay; row 2, FDs acquistion access sub-assay. nb diagonal entries must be '-1' indicating redundancy since these elements correspond to the variable durations
Numeric number of insects per plant
Character representing underlying virus
#'
Numeric: acquisition rate (h^-1); used to generate the data or '-1' if unknown
Numeric: inoculation rate (h^-1); used to generate the data or '-1' if unknown
Numeric: insect recovery rate (h^-1); used to generate the data or '-1' if unknown
Simulated using AP_assay_simulator()
This function calculates epidemic probability from viral and local parameters according to inoculum state (calling the helper function solveInoculumStatesBP.r to calculate transition and fertility matrices)
calculate_epidemic_probability( numberInsects, start_intervl = (1/10), localParameters, virusParameters, thresh = 10^(-7) )calculate_epidemic_probability( numberInsects, start_intervl = (1/10), localParameters, virusParameters, thresh = 10^(-7) )
numberInsects |
Numeric: number of insect vectors per plant (required). |
start_intervl |
Numeric: initial guess for expected time in hours until next event (optional, default = 0.1). This is fine-tuned from within the function. Needed to calculate efficient time step i.e. largest value such that sum of probabilities of individual events < 1. |
localParameters |
A numeric vector of length 5 corresponding to event rates (day^-1) (required): The first element specifies the Vector dispersal rate, the second the Roguing rate, the third the Vector mortality rate, the forth the Harvesting rate, the fifth the Plant latent progression rate |
virusParameters |
Numeric vector of length 3 specifying virus parameters (day^-1) (required):
|
thresh |
Numeric: equilibrium condition for extinction probability (optional, default = 10^(-7)). If no inoculum state changes by more than thresh in a timestep then function assumes extinction probability has stabilised. |
Numeric vector of length n: epidemic probability for each of n inoculum states (where n=(numberInsects+1)*3-1).
Preprint reference
The models implemented in this function follow Donnelly et al. (2025, preprint), originally implemented in the EpiPv GitHub package.
Donnelly, R., Tankam, I. & Gilligan, C. (2025). "Plant pathogen profiling with the EpiPv package." EcoEvoRxiv, doi:10.32942/X29K9P.
When available, please cite the published version.
qm_out=calculate_epidemic_probability( numberInsects=3,start_intervl=(1/10), localParameters=rep(0.1,5), virusParameters=rep(0.1,3),thresh=10^(-7) )qm_out=calculate_epidemic_probability( numberInsects=3,start_intervl=(1/10), localParameters=rep(0.1,5), virusParameters=rep(0.1,3),thresh=10^(-7) )
Runs MCMC sampling using a precompiled Stan model for PT plant virus. Analyses PT access period data estimating 5 parameters: (alpha[1], beta[1],
gamma[1],mu[1],bd[1] - acquisition, inoculation, latent progression, insect recovery and insect lab mortality rates, respectively).
estimate_virus_parameters_PT( data, D_LSin = 50, D_numPtsPdin = 1, mcmcOptions = c(500, 1000), numChainsIn = 4, mc.parallel = 0 )estimate_virus_parameters_PT( data, D_LSin = 50, D_numPtsPdin = 1, mcmcOptions = c(500, 1000), numChainsIn = 4, mc.parallel = 0 )
data |
A list containing the AP experiment data for Stan (required). |
D_LSin |
Upper bound on insect vector lifespan in days sets the vector survival discretization (optional, default = 50). |
D_numPtsPdin |
Number of points per day of insect vector sets the vector survival discretization (optional, default = 1). |
mcmcOptions |
A numeric vector of length 2: The first element specifies the number of warm-up iterations (optional, default = 500), and the second element specifies the total number of iterations (optional, default = 1000). |
numChainsIn |
Numeric: number of Markov chains (optional, default = 4). |
mc.parallel |
Binary: whether to use parallelisation for sampling (optional, default = 0, i.e. 1 core only). |
Interpreting model output - check stability first
Warnings will be printed to the screen if there are issues with model fitting.
Do not suppress warnings (e.g., via suppressWarnings()), as they contain essential information about
potential convergence problems.
Before interpreting any model results, always check the following core diagnostics:
R-hat - Should be close to 1 for all parameters; larger values indicate non-convergence (Stan R-hat documentation).
Effective Sample Size (n_eff) - Very low values suggest high autocorrelation or insufficient sampling (Stan ESS documentation).
Divergent transitions - Count should be 0; any non-zero count requires investigation (Stan divergences documentation).
Forward Simulation from Fitted Model (simulated_data) - Presented conveniently for plotting to assess posterior predictive fit.
array3: AAP forward simulation 95% credible intervals and original data
array4: LAP forward simulation 95% credible intervals and original data
array5: IAP forward simulation 95% credible intervals and original data
(Stan posterior predictive checks).
If any of these core diagnostics fail, the model fit may not be trustworthy.
Additional diagnostics for troubleshooting
These are useful when core checks indicate problems, or for deeper exploration of model behaviour:
Bayesian R^2 (r2_bayesA, r2_bayesL, r2_bayesI) - Measures explanatory power;
low values suggest poor fit to the data (i.e., 3 values for AAP assay, LAP assay,IAP assay).
Max treedepth exceeded - Number of times the sampler hit the maximum tree depth; should be 0 (Stan max treedepth documentation).
EBFMI (Energy Bayesian Fraction of Missing Information) - Low values indicate poor exploration of the posterior.
See the package vignette for worked examples of checking and interpreting these diagnostics.
Preprint reference
The models implemented in this function follow Donnelly et al. (2025, preprint), originally implemented in the EpiPv GitHub package.
A list with seven elements:
MCMC chains for estimated parameters (alpha[1], beta[1], gamma[1]/mu[1],bd[1] - and additionally lp__ for reference only).
MCMC chains for estimated parameters (alpha[1], beta[1], gamma[1],mu[1]).
summary_stats (rhat, ess_bulk, ess_tail). [1]: R-hat statistic for convergence (should be close to 1); [2-3]: statistics for n-eff.
Validation dataset: AAP sub-assay input test plant data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
Validation dataset: LAP sub-assay input test plant data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
Validation dataset: IAP sub-assay input test plant data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
Mean and sd Bayesian R-squared values for model fit assessment, for AAP, LAP and IAP assays.
A list containing 2 sampler diagnostics, the number of overall divergent transitions and if maximum tree depth has been exceeded. See also screen print for acceptability of energy Bayesian fraction of missing information (E-BFMI).
Donnelly, R., Tankam, I. & Gilligan, C. (2025). "Plant pathogen profiling with the EpiPv package." EcoEvoRxiv, doi:10.32942/X29K9P.
When available, please cite the published version.
data("ap_data_sim_PT", package = "EpiPvr") # run low warm-up and iterations (mcmcOptions) for quick example only EVPT_pub=estimate_virus_parameters_PT( data=ap_data_sim_PT,D_LSin=5,D_numPtsPdin=1, mcmcOptions=c(25,50),numChainsIn=1,mc.parallel=0 )data("ap_data_sim_PT", package = "EpiPvr") # run low warm-up and iterations (mcmcOptions) for quick example only EVPT_pub=estimate_virus_parameters_PT( data=ap_data_sim_PT,D_LSin=5,D_numPtsPdin=1, mcmcOptions=c(25,50),numChainsIn=1,mc.parallel=0 )
Runs MCMC sampling using a precompiled Stan model for SPT plant virus. Analyses SPT access period data directly estimating 4 parameters:
(c1[1], c3[1], c2[1],bD[1], the first 3 are composite and the 4th corresponds to insect lab mortality rate).
The composite parameters can be unpacked into al[1], be[1], mu[1] - acquisition, inoculation, and insect recovery rates.
estimate_virus_parameters_SPT( data, D_LSin = 50, D_numPtsPdin = 1, mcmcOptions = c(500, 1000), numChainsIn = 4, mc.parallel = 0 )estimate_virus_parameters_SPT( data, D_LSin = 50, D_numPtsPdin = 1, mcmcOptions = c(500, 1000), numChainsIn = 4, mc.parallel = 0 )
data |
A list containing the AP experiment data for Stan (required). |
D_LSin |
Upper bound on insect vector lifespan in days, sets the vector survival discretisation (optional, default = 50). |
D_numPtsPdin |
Number of points per day of insect vector lifespan, sets the vector survival discretisation (optional, default = 1). |
mcmcOptions |
A numeric vector of length 2: The first element specifies the number of warm-up iterations (optional, default = 500), and the second element specifies the number of post-warm-up iterations (optional, default = 1000). |
numChainsIn |
Numeric: number of Markov chains (optional, default = 4). |
mc.parallel |
Binary: whether to use parallelisation for sampling (optional, default = 0, i.e. 1 core only). |
Interpreting model output - check stability first
Warnings will be printed to the screen if there are issues with model fitting.
Do not suppress warnings (e.g., via suppressWarnings()), as they contain essential information about
potential convergence problems.
Before interpreting any model results, always check the following core diagnostics:
R-hat - Should be close to 1 for all parameters; larger values indicate non-convergence (Stan R-hat documentation).
Effective Sample Size (n_eff) - Very low values suggest high autocorrelation or insufficient sampling (Stan ESS documentation).
Divergent transitions - Count should be 0; any non-zero count requires investigation (Stan divergences documentation).
Posterior predictive fit (simulated_data) - Compare model-simulated data with the observed data:
array3: AAP forward simulation 95% credible intervals and original data
array4: IAP forward simulation 95% credible intervals and original data
(Stan posterior predictive checks).
Additional diagnostics for troubleshooting
These are useful when core checks indicate problems, or for deeper exploration of model behaviour:
Bayesian R^2 (r2_bayesA, r2_bayesI) - Measures explanatory power;
low values suggest poor fit to the data (i.e., 2 values for AAP assay, IAP assay).
Max treedepth exceeded - Number of times the sampler hit the maximum tree depth; should be 0 (Stan max treedepth documentation).
EBFMI (Energy Bayesian Fraction of Missing Information) - Low values indicate poor exploration of the posterior.
See the package vignette for worked examples of checking and interpreting these diagnostics.
Preprint reference
The models implemented in this function follow Donnelly et al. (2025, preprint), originally implemented in the EpiPv GitHub package.
A list with seven elements:
First set of MCMC chains for estimated parameters (c1[1], c3[1], c2[1],bD[1] - and additionally lp__ for reference only).
Second set of MCMC chains for virus parameters (al[1], be[1], mu[1]).
summary_stats (rhat, ess_bulk, ess_tail). [1]: R-hat statistic for convergence (should be close to 1); [2-3]: statistics for n-eff.
Validation list, AAP input test plant infection data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
Validation list, IAP input test plant infection data, forward simulated 2.5th percentile, forward simulated 97.5th percentile.
Mean and sd Bayesian R-squared values for model fit assessment, for AAP and IAP assays.
A list containing key sampler diagnostics: divergent transitions, maximum tree depth exceedances, See also screen print for acceptability of energy Bayesian fraction of missing information (E-BFMI).).
Donnelly, R., Tankam, I. & Gilligan, C. (2025). "Plant pathogen profiling with the EpiPv package." EcoEvoRxiv, doi:10.32942/X29K9P.
When available, please cite the published version.
data("ap_data_sim_SPT", package = "EpiPvr") # run low warm-up and iterations (mcmcOptions) for quick example only EVSPT_pub=estimate_virus_parameters_SPT( data=ap_data_sim_SPT,D_LSin=5,D_numPtsPdin=1, mcmcOptions=c(25,50),numChainsIn=1,mc.parallel=0 )data("ap_data_sim_SPT", package = "EpiPvr") # run low warm-up and iterations (mcmcOptions) for quick example only EVSPT_pub=estimate_virus_parameters_SPT( data=ap_data_sim_SPT,D_LSin=5,D_numPtsPdin=1, mcmcOptions=c(25,50),numChainsIn=1,mc.parallel=0 )