Title: | Implements Empirical Bayes Incidence Curves |
---|---|
Description: | Make empirical Bayes incidence curves from reported case data using a specified delay distribution. |
Authors: | Andrew Miller [aut], Lauren Hannah [aut, cre], Nicholas Foti [aut], Joseph Futoma [aut], Apple, Inc. [cph] |
Maintainer: | Lauren Hannah <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1 |
Built: | 2024-11-21 06:46:45 UTC |
Source: | CRAN |
This function computes expected cases given incidence curve parameters and a delay distribution.
compute_expected_cases(beta, Q, lnPmat, Tobs)
compute_expected_cases(beta, Q, lnPmat, Tobs)
beta |
parameter vector of num_params |
Q |
spline basis matrix, of size Tmod x num_params |
lnPmat |
matrix size Tobs x Tobs, log of make_likelihood_matrix |
Tobs |
maximum observed time point |
A Tobs-length vector that models expected cases
This function computes log likelihood of incidence model given parameters and observations.
compute_log_incidence(beta, Q, Tobs)
compute_log_incidence(beta, Q, Tobs)
beta |
parameter vector of num_params |
Q |
spline basis matrix, of size Tmod x num_params |
Tobs |
maximum observed time point |
I Tobs-length vector that models log incidence curve
Daily case, hospitalization, and death proportions.
covid_delay_dist
covid_delay_dist
A data frame with 61 entries and 4 columns.
number of days since infection
proportion of cases confirmed by a test that are recorded on that day
proportion of cases that become hospitalized that are hospitalized on that day
proportion of cases that result in death that die on that day
Time from incidence to symptoms: Lauer et al., "Estimated Incubation Period of COVID-19", ACC (2020). https://www.acc.org/latest-in-cardiology/journal-scans/2020/05/11/15/18/the-incubation-period-of-coronavirus-disease.
Time from symptoms to recorded cases: Case line data from Florida through 2020-07-14 with same day waits removed. https://open-fdoh.hub.arcgis.com/datasets/florida-covid19-case-line-data.
Time from symptoms to hospitalization: Wang et al., "Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus–Infected Pneumonia in Wuhan, China", JAMA (2020). https://jamanetwork.com/journals/jama/fullarticle/2761044.
Time from hospitalization to death: Lewnard et al. "Incidence, clinical outcomes, and transmission dynamics of severe coronavirus disease 2019 in California and Washington: prospective cohort study", BJM (2020). https://www.bmj.com/content/369/bmj.m1923.long
Daily case, hospitalization, and death proportions by borough through 2020-06-30.
covid_new_york_city
covid_new_york_city
A data frame with 615 entries and 5 columns.
record date
record borough: Brooklyn, Bronx, Manhattan, Queens, and Staten Island
number of recorded cases
number of new hospital admissions
number of recorded deaths
New York City Department of Health https://raw.githubusercontent.com/nychealth/coronavirus-data/master/boro/boroughs-case-hosp-death.csv.
Check input data for:
minimum length of reported
integer for reported
positivity for delay_dist and reported
sums to 1 for delay_dist
Throw an error if any conditions are violated.
data_check(reported, delay_dist)
data_check(reported, delay_dist)
reported |
An integer vector of reported cases. |
delay_dist |
A positive vector that sums to one, which describes the delay distribution. |
Does basic checks for reported data and delay distribution, front pads, and makes AR extrapolation.
data_processing( reported, delay_dist, num_ar_steps = 10, num_ar_samps = 100, seed = 1, linear_tail = 14, front_pad_size = 10, extrapolation_prior_precision = 2 )
data_processing( reported, delay_dist, num_ar_steps = 10, num_ar_samps = 100, seed = 1, linear_tail = 14, front_pad_size = 10, extrapolation_prior_precision = 2 )
reported |
An integer vector of reported cases. |
delay_dist |
A positive vector that sums to one, which describes the delay distribution. |
num_ar_steps |
An integer number of AR steps after last observation. |
num_ar_samps |
An integer number of AR samples. |
seed |
Seed for RNG. |
linear_tail |
An integer number of days used to fit linear model on tail to be used as a mean for AR extrapolation. |
front_pad_size |
An integer for initial number of 0's before first observation. |
extrapolation_prior_precision |
A positive scalar for extrapolation slope shrinkage prior precision. |
A list with elements:
extrap = a matrix of size (num_ar_samps x n + num_ar_steps + front_pad_size)
original = a vector of logicals for whether in original time series range
This function computes a transpose of the 1st difference operator.
diff_trans(a)
diff_trans(a)
a |
A vector of inputs |
The transpose of the first difference operator
This is a function that fits an incidence curve to a set of reported cases and delay distribution using an empirical Bayes estimation method, which fits parameters for a spline basis. All hyper parameter tuning and data processing are done within this function.
fit_incidence( reported, delay_dist, dof_grid = seq(6, 20, 2), dof_method = "aic", lam_grid = 10^(seq(-1, -8, length.out = 20)), lam_method = "val", percent_thresh = 2, regularization_order = 2, num_ar_steps = 10, num_ar_samps = 100, linear_tail = 14, front_pad_size = 10, extrapolation_prior_precision = 10, frac_train = 0.75, fisher_approx_cov = TRUE, end_pad_size = 50, num_samps_per_ar = 10, val_restarts = 2, seed = 1 )
fit_incidence( reported, delay_dist, dof_grid = seq(6, 20, 2), dof_method = "aic", lam_grid = 10^(seq(-1, -8, length.out = 20)), lam_method = "val", percent_thresh = 2, regularization_order = 2, num_ar_steps = 10, num_ar_samps = 100, linear_tail = 14, front_pad_size = 10, extrapolation_prior_precision = 10, frac_train = 0.75, fisher_approx_cov = TRUE, end_pad_size = 50, num_samps_per_ar = 10, val_restarts = 2, seed = 1 )
reported |
An integer vector of reported cases. |
delay_dist |
A positive vector that sums to one, which describes the delay distribution. |
dof_grid |
An integer vector of degrees of freedom for the spline basis. |
dof_method |
Metric to choose "best" spline degrees of freedom: 'aic': Akaike information criterion, 'bic': Bayesian information criterion, 'val': validation likelihood. |
lam_grid |
A vector of regularization strengths to scan. |
lam_method |
metric to choose "best" regularization strength lambda: 'aic': Akaike information criterion, 'bic': Bayesian information criterion, 'val': validation likelihood. |
percent_thresh |
If using validation likelihood to select best, the largest (strongest) lambda that is within 'percent_thresh' of the highest validation lambda will be selected. Default is 2. Must be greater than 0. |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
num_ar_steps |
An integer number of AR steps after last observation. |
num_ar_samps |
An integer number of AR samples. |
linear_tail |
An integer number of days used to fit linear model on tail to be used as a mean for AR extrapolation. |
front_pad_size |
An integer for initial number of 0's before first observation. |
extrapolation_prior_precision |
A positive scalar for extrapolation slope shrinkage prior precision. |
frac_train |
A numeric between 0 and 1 for fraction of data used to train lambda validation. |
fisher_approx_cov |
A flag to use either the Fisher Information (TRUE) or the Hessian (FALSE) to approx posterior covariance over parameters. |
end_pad_size |
And integer number of steps the spline is defined beyond the final observation. |
num_samps_per_ar |
An integer for the number of Laplace samples per AR fit. |
val_restarts |
An integer for the number of times to refit hyperparameters if 'val' is used for either. Set to 1 for faster but more unstable fits. |
seed |
Seed for RNG. |
A list with the following entries:
Isamps – sample of the incidence curve from a Laplace approximation per AR sample;
Ihat – MAP incidence curve estimate;
Chat – expected cases given MAP incidence curve estimate;
beta_hats – matrix of beta's per AR sample;
best_dof – best degrees of freedom from tuning;
best_lambda – best regularization parameter from tuning; and
reported – a copy of reported values used for fitting.
indiana_model <- fit_incidence( reported = spanish_flu$Indiana, delay_dist = spanish_flu_delay_dist$proportion)
indiana_model <- fit_incidence( reported = spanish_flu$Indiana, delay_dist = spanish_flu_delay_dist$proportion)
Add zeros in front of reported data avoid infections from before first reported date all being placed on first reported date.
front_zero_pad(reported, size)
front_zero_pad(reported, size)
reported |
An integer vector of reported cases |
size |
An integer size of zero-padding |
An integer vector of cases with size 0's in front
Export the output of fit_incidence
to a data frame with an optional
addition of a time index.
incidence_to_df(x, times = NULL, low_quantile = 0.05, high_quantile = 0.95)
incidence_to_df(x, times = NULL, low_quantile = 0.05, high_quantile = 0.95)
x |
An "incidence_spline_model" output from |
times |
An optional vector of time indices. |
low_quantile |
A scalar that specifies the low quantile value for the output CI. |
high_quantile |
A scalar that specifies the high quantile value for the output CI. |
A data frame with the following entries:
Time – a time index; if 'ts' is 'NULL' it is the observation number;
Reported – the value of 'reported';
Ihat – MAP incidence curve estimate;
Chat – expected cases given MAP incidence curve estimate;
LowCI – lower pointwise credible interval bands around the incidence curve; and
HighCI – higher pointwise credible interval bands around the incidence curve.
indiana_model <- fit_incidence( reported = spanish_flu$Indiana, delay_dist = spanish_flu_delay_dist$proportion) indiana_df <- incidence_to_df(indiana_model, times = spanish_flu$Date)
indiana_model <- fit_incidence( reported = spanish_flu$Indiana, delay_dist = spanish_flu_delay_dist$proportion) indiana_df <- incidence_to_df(indiana_model, times = spanish_flu$Date)
Initialize spline parameters (beta) using a standard Gaussian distribution.
init_params(num_params)
init_params(num_params)
num_params |
Integer size of desired parameter vector |
vector of size num_params
Make auto-regressive (AR) samples for extrapolation past end point to help with right-censoring problems.
make_ar_extrap_samps( reported, num_ar_steps = 10, num_ar_samps = 50, seed = 1, linear_tail = 14, extrapolation_prior_precision = 2 )
make_ar_extrap_samps( reported, num_ar_steps = 10, num_ar_samps = 50, seed = 1, linear_tail = 14, extrapolation_prior_precision = 2 )
reported |
An integer vector of reported cases. |
num_ar_steps |
An integer number of AR steps after last observation. |
num_ar_samps |
An integer number of AR samples. |
seed |
Seed for RNG. |
linear_tail |
An integer number of days used to fit linear model on tail to be used as a mean for AR extrapolation. |
extrapolation_prior_precision |
A positive scalar for extrapolation slope shrinkage prior precision. |
A matrix of size (num_ar_samps x n + num_ar_steps)
This function creates a matrix such that P[t, s] = P(C = t | I = s) = theta_t-s for s <= t and 0 otherwise.
make_likelihood_matrix(delay_dist)
make_likelihood_matrix(delay_dist)
delay_dist |
A positive vector that sums to one, which describes the delay distribution. |
A matrix of size n x n
This function creates basis matrix for spline model using cubic splines.
make_spline_basis(dof, tgrid)
make_spline_basis(dof, tgrid)
dof |
An integer degrees of freedom. |
tgrid |
A grid of time values. |
A matrix of cubic spline basis values with 'length(tgrid)' x 'dof' entries.
Marginal log likelihood This function computes the marginal probability of Pr(reported | beta). Note that lnPmat must be zero padded enough (or censored) to match the length of reported cases vector.
marg_loglike_poisson(beta, reported, Q, lnPmat)
marg_loglike_poisson(beta, reported, Q, lnPmat)
beta |
spline parameter vector length num_params |
reported |
An integer vector of reported cases. |
Q |
spline basis matrix Tmod x num_params |
lnPmat |
matrix size Tobs x Tobs, log of make_likelihood_matrix |
A scalar log likelihood value.
This function computes the Fisher information matrix log likelihood term with respect to beta.
marg_loglike_poisson_fisher(beta, reported, Q, lnPmat)
marg_loglike_poisson_fisher(beta, reported, Q, lnPmat)
beta |
A spline parameter vector length num_params. |
reported |
An integer vector of reported cases. |
Q |
A spline basis matrix Tmod x num_params. |
lnPmat |
A matrix size Tobs x Tobs, log of make_likelihood_matrix. |
A numeric vector, gradient of log likelihood value with respect to beta.
This function computes the gradient of the log likelihood term with respect to beta.
marg_loglike_poisson_grad(beta, reported, Q, lnPmat)
marg_loglike_poisson_grad(beta, reported, Q, lnPmat)
beta |
spline parameter vector length num_params |
reported |
An integer vector of reported cases. |
Q |
spline basis matrix Tmod x num_params |
lnPmat |
matrix size Tobs x Tobs, log of make_likelihood_matrix |
A numeric vector, gradient of log likelihood value with respect to beta.
Plot time, reported cases, incidence curve with credible interval, and implied case curve.
## S3 method for class 'incidence_spline_model' plot(x, ...)
## S3 method for class 'incidence_spline_model' plot(x, ...)
x |
An "incidence_spline_model" output from |
... |
Other parameters that can be included:
|
indiana_model <- fit_incidence( reported = spanish_flu$Indiana, delay_dist = spanish_flu_delay_dist$proportion) plot(indiana_model, times = spanish_flu$Date)
indiana_model <- fit_incidence( reported = spanish_flu$Indiana, delay_dist = spanish_flu_delay_dist$proportion) plot(indiana_model, times = spanish_flu$Date)
This function computes Poisson objective function including regularizer.
poisson_objective(beta, lam, reported, Q, lnPmat, regularization_order)
poisson_objective(beta, lam, reported, Q, lnPmat, regularization_order)
beta |
spline parameter vector length num_params |
lam |
positive scalar regularization strength |
reported |
An integer vector of reported cases. |
Q |
spline basis matrix Tmod x num_params |
lnPmat |
matrix size Tobs x Tobs, log of make_likelihood_matrix |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
scalar objective function value
This function computes the Poisson objective function (including regularizer) gradient.
poisson_objective_grad(beta, lam, reported, Q, lnPmat, regularization_order)
poisson_objective_grad(beta, lam, reported, Q, lnPmat, regularization_order)
beta |
spline parameter vector length num_params |
lam |
positive scalar regularization strength |
reported |
An integer vector of reported cases. |
Q |
spline basis matrix Tmod x num_params |
lnPmat |
matrix size Tobs x Tobs, log of make_likelihood_matrix |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
scalar objective function value
This function computes the Fisher information matrix for a regularized Poisson objective function.
poisson_objective_post_cov_approx( beta, lam, reported, Q, lnPmat, regularization_order )
poisson_objective_post_cov_approx( beta, lam, reported, Q, lnPmat, regularization_order )
beta |
A vector of spline parameters. |
lam |
A regularization penalty parameter. |
reported |
A vector of reported values. |
Q |
A spline basis matrix. |
lnPmat |
A matrix size Tobs x Tobs, log of make_likelihood_matrix. |
regularization_order |
An integer that specifies the regularization order. |
Fisher information matrix of a regularized Poisson objective function.
This function computes regularization penalty term based on the betas and a difference.
regfun(beta, regularization_order = 2)
regfun(beta, regularization_order = 2)
beta |
A spline parameter vector length num_params. |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
A scalar regularization value.
This function computes regularization penalty term gradient based on the betas and difference order.
regfun_grad(beta, regularization_order = 2)
regfun_grad(beta, regularization_order = 2)
beta |
spline parameter vector length num_params |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
scalar regularization value
This function computes regularization penalty term Hessian based on the betas and differencing order.
regfun_hess(beta, regularization_order = 2)
regfun_hess(beta, regularization_order = 2)
beta |
spline parameter vector length num_params |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
scalar regularization value
This function generates Laplace samples of posterior distribution for a vector of reported incidence.
sample_laplace_log_incidence_poisson( beta_hat, beta_cov, reported, Q, num_samps_per_ar = 10 )
sample_laplace_log_incidence_poisson( beta_hat, beta_cov, reported, Q, num_samps_per_ar = 10 )
beta_hat |
Maximum likelihood solution for beta parameter. |
beta_cov |
Covariance of objective solution (either Fisher information or Hessian inverse). |
reported |
An integer vector of reported cases. |
Q |
Spline basis matrix. |
num_samps_per_ar |
Number of Laplace samples to return for each AR path. |
A matrix of 'num_samps_per_ar' log incidence curve samples from laplace approximation of distribution.
This function holds the regularization parameter value fixed and scans spline degrees of freedom.
scan_spline_dof( reported, delay_dist, dof_grid, method = "bic", lam = 0, regularization_order = 2, reported_val = NULL, end_pad_size = 0, fisher_approx_cov = FALSE )
scan_spline_dof( reported, delay_dist, dof_grid, method = "bic", lam = 0, regularization_order = 2, reported_val = NULL, end_pad_size = 0, fisher_approx_cov = FALSE )
reported |
An integer vector of reported cases. |
delay_dist |
A positive vector that sums to one, which describes the delay distribution. |
dof_grid |
An integer vector of degrees of freedom for the spline basis. |
method |
Metric to choose "best" dof: 'aic', 'bic', 'val'. If method='val', reported_val must be non NULL and match reported size. |
lam |
A fixed value for the beta parameter regularization strength. |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
reported_val |
Validation time series of equal size to reported vector for use with 'val' method. Default is NULL. |
end_pad_size |
And integer number of steps the spline is defined beyond the final observation. |
fisher_approx_cov |
A flag to use either the Fisher Information (TRUE) or the Hessian (FALSE) to approx posterior covariance over parameters. |
A list of degree of freedom fit statistics:
best_dof = best degrees of freedom
dof_resdf = data frame of fit statistics (lambda, dof, aic, bic, val_lls, train_lls)
This function holds degrees of freedom fixed and scans regularization parameter values.
scan_spline_lam( reported, delay_dist, lam_grid, method = "val", percent_thresh = 2, dof = 10, regularization_order = 2, reported_val = NULL, end_pad_size = 0, fisher_approx_cov = TRUE )
scan_spline_lam( reported, delay_dist, lam_grid, method = "val", percent_thresh = 2, dof = 10, regularization_order = 2, reported_val = NULL, end_pad_size = 0, fisher_approx_cov = TRUE )
reported |
An integer vector of reported cases. |
delay_dist |
A positive vector that sums to one, which describes the delay distribution. |
lam_grid |
A vector of regularization strengths to scan. |
method |
Metric to choose "best" dof: 'aic', 'bic', 'val'. If method='val', reported_val must be non NULL and match reported size. |
percent_thresh |
If using validation likelihood to select best, the largest (strongest) lambda that is within 'percent_thresh' of the highest validation lambda will be selected. Default is 2. Must be greater than 0. |
dof |
Degrees of freedom for spline basis. |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
reported_val |
Validation time series of equal size to reported vector for use with 'val' method. Default is NULL. |
end_pad_size |
And integer number of steps the spline is defined beyond the final observation. |
fisher_approx_cov |
A flag to use either the Fisher Information (TRUE) or the Hessian (FALSE) to approx posterior covariance over parameters. |
List of outputs:
best_lam = best lambda
lam_resdf = data frame of fit statistics (lambda, dof, aic, bic, val_lls, train_lls)
Daily mortality data from 1918-09-01 through 1918-12-31 in Indiana, Kansas, and Philadelphia
spanish_flu
spanish_flu
A data frame with 122 entries for 3 locations
date
daily deaths for all of Indiana
daily deaths for all of Kansas
daily deaths for Philadelphia
Rogers SL (1920). Special Tables of Mortality from Influenza and Pneumonia, in Indiana, Kansas, and Philadelphia, PA (U.S. Dept Commerce, Washington, DC).
Daily death proportions.
spanish_flu_delay_dist
spanish_flu_delay_dist
A data frame with 31 entries and 3 columns.
number of days since infection
proportion of deaths that happen on that day
Goldstein E, et al. (2009). Reconstructing influenza incidence by deconvolution of daily mortality time series (PNAS). https://www.pnas.org/content/pnas/106/51/21825.full.pdf
This function fit models with selected hyperparameters on reported data and return a matrix of posterior Laplace samples.
train_and_validate( reported, delay_dist, lam, dof, beta0 = NULL, regularization_order = 2, reported_val = NULL, end_pad_size = 0, fisher_approx_cov = TRUE, num_samps_per_ar = 10 )
train_and_validate( reported, delay_dist, lam, dof, beta0 = NULL, regularization_order = 2, reported_val = NULL, end_pad_size = 0, fisher_approx_cov = TRUE, num_samps_per_ar = 10 )
reported |
An integer vector of reported cases. |
delay_dist |
A positive vector that sums to one, which describes the delay distribution. |
lam |
A fixed value for the beta parameter regularization strength. |
dof |
Degrees of freedom for spline basis. |
beta0 |
(optional) Initial setting of spline parameters (before optimization) |
regularization_order |
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty. |
reported_val |
Validation time series of equal size to reported vector for use with 'val' method. Default is NULL. |
end_pad_size |
And integer number of steps the spline is defined beyond the final observation. |
fisher_approx_cov |
A flag to use either the Fisher Information (TRUE) or the Hessian (FALSE) to approx posterior covariance over parameters. |
num_samps_per_ar |
An integer for the number of Laplace samples per AR fit. |
A list of results of train and validate, including:
train_ll = training log likelihood
val_ll = validation log likelihood (if 'reported_val' is not 'NULL')
Isamps = samples of the incidence curve from a Laplace approximation
Ihat = MAP estimate of the incidence curve
Chat = expected cases given MAP incidence curve
beta_hat = MAP estimate of spline parameters
beta_cov = covariance of spline parameters
beta_hess = Hessian of spline parameters
Split reported case integer time series into train and validate time series through thinning.
train_val_split(reported, frac_train = 0.75)
train_val_split(reported, frac_train = 0.75)
reported |
An integer vector of reported cases. |
frac_train |
A numeric between 0 and 1 for fraction of data used to train lambda validation. |
A list(reported_train, reported_val) where the elements reported_train and reported_val are both length, Tobs, and 'frac_train' of the counts fall in reported_train, the rest in reported_val.