This R Markdown document walks through the steps of using Bayesian inference to create more realistic estimates of time-dependent reproductive number, R(t) that can be helpful for surveillance and intervention planning of infectious diseases, like COVID-19.
There are two ways to use linelineBayes
: either you have
caseCount data, which are aggregated case counts by
day, or you have Line-list data means you have a single
row for each case, that has dates for: infection, symptom onset,
positive test, and when this was reported to public health agencies.
Step 1. Load data
data("sample_dates")
data("sample_location")
data("sample_cases")
head(sample_dates)
#> [1] "2020-01-08" "2020-01-09" "2020-01-10" "2020-01-11" "2020-01-12"
#> [6] "2020-01-13"
head(sample_cases)
#> [1] 3 2 8 5 14 16
head(sample_location)
#> [1] "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine"
Step 2. Creating case-counts
caseCounts <- create_caseCounts(date_vec = sample_dates,
location_vec = sample_location,
cases_vec = sample_cases)
head(caseCounts)
#> date cases location
#> 1 2020-01-08 3 Tatooine
#> 2 2020-01-09 2 Tatooine
#> 3 2020-01-10 8 Tatooine
#> 4 2020-01-11 5 Tatooine
#> 5 2020-01-12 14 Tatooine
#> 6 2020-01-13 16 Tatooine
Plot
Get the first wave only, just to speed processing
Step 3. Define the serial interval. The
si()
function creates a vector of length 14 with alpha and
beta as defined in Li and White, for COVID-19.
Step 4. Run the back-calculation algorithm. Metropolis-Hastings within Gibbs sampling is used. NB dispersion (also called ‘size’) is a measure of over-dispersion (smaller size means more over-dispersion, which means variance != mean). For more information see Zeileis.
NOTE: when you run backnow
on case Count data,
you assume internally some distribution of onset (and missingness). The
code will run if some (but not all) onset data are present. If either
all onset data are NA or none are NA, the code will not run (because you
have all the information you would be simulating). This is the reason
for the reportF_missP = 0.6
argument. The implied onset
distribution is rnbinom()
with size = 3
and
mu = 9
.
out_list_demo <- run_backnow(caseCounts,
MAX_ITER = as.integer(2000),
norm_sigma = 0.2,
sip = sip,
NB_maxdelay = as.integer(20),
NB_size = as.integer(6),
printProgress = 1,
reportF_missP = 0.6)
Plot outputs. The points are aggregated reported cases, and the red line (and shaded confidence interval) represent the back-calculated case onsets that lead to the reported data.
You can also plot the R(t)
curve over time. In this
case, the red line (and shaded confidence interval) represent the
time-varying r(t). See Li and White for description.
Same result if you use with line_list data”
create your caseCounts_line
object
caseCounts_line <- create_linelist(report_dates = sample_report_dates,
onset_dates = sample_onset_dates)
head(caseCounts_line)
#> report_date delay_int onset_date is_weekend report_int week_int
#> 1 2020-01-08 NA <NA> 0 1 1
#> 2 2020-01-08 14 2019-12-25 0 1 1
#> 3 2020-01-08 7 2020-01-01 0 1 1
#> 4 2020-01-09 NA <NA> 0 2 1
#> 5 2020-01-09 12 2019-12-28 0 2 1
#> 6 2020-01-10 NA <NA> 0 3 1
and run
You can specify the distribution for caseCounts_line in either
run_backnow
or convert_to_linelist
(this runs
internally depending on the class of input
).
reportF
is the distribution function, _args
lists the distribution params that are not x
, and
_missP
is the percent missing. This must be between 0 < x < 1. Both ‘caseCounts’
and ‘caseCounts_line’ objects can be fed into
run_backnow
.
my_linelist <- convert_to_linelist(caseCounts,
reportF = rnbinom,
reportF_args = list(size = 3, mu = 9),
reportF_missP = 0.6)
head(my_linelist)
#> report_date delay_int onset_date is_weekend report_int week_int
#> 1 2020-01-08 NA <NA> 0 1 1
#> 2 2020-01-08 NA <NA> 0 1 1
#> 3 2020-01-08 5 2020-01-03 0 1 1
#> 4 2020-01-09 10 2019-12-30 0 2 1
#> 5 2020-01-09 6 2020-01-03 0 2 1
#> 6 2020-01-10 NA <NA> 0 3 1