linelistBayes: an introduction

This R Markdown document walks through the steps of using Bayesian inference to create more realistic estimates of time-dependent reproductive number, R(t) that can be helpful for surveillance and intervention planning of infectious diseases, like COVID-19.

There are two ways to use linelineBayes: either you have caseCount data, which are aggregated case counts by day, or you have Line-list data means you have a single row for each case, that has dates for: infection, symptom onset, positive test, and when this was reported to public health agencies.

Example: Case Count data

library(linelistBayes)

Step 1. Load data

data("sample_dates")
data("sample_location")
data("sample_cases")

head(sample_dates)
#> [1] "2020-01-08" "2020-01-09" "2020-01-10" "2020-01-11" "2020-01-12"
#> [6] "2020-01-13"
head(sample_cases)
#> [1]  3  2  8  5 14 16
head(sample_location)
#> [1] "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine"

Step 2. Creating case-counts

caseCounts <- create_caseCounts(date_vec = sample_dates,
                                location_vec = sample_location,
                                cases_vec = sample_cases)
head(caseCounts)
#>         date cases location
#> 1 2020-01-08     3 Tatooine
#> 2 2020-01-09     2 Tatooine
#> 3 2020-01-10     8 Tatooine
#> 4 2020-01-11     5 Tatooine
#> 5 2020-01-12    14 Tatooine
#> 6 2020-01-13    16 Tatooine

Plot

plot(caseCounts)

Get the first wave only, just to speed processing

caseCounts <- caseCounts[1:80, ]
plot(caseCounts)

Step 3. Define the serial interval. The si() function creates a vector of length 14 with alpha and beta as defined in Li and White, for COVID-19.

sip <- si(14, 4.29, 1.18)
plot(sip, type = 'l')

Step 4. Run the back-calculation algorithm. Metropolis-Hastings within Gibbs sampling is used. NB dispersion (also called ‘size’) is a measure of over-dispersion (smaller size means more over-dispersion, which means variance != mean). For more information see Zeileis.

NOTE: when you run backnow on case Count data, you assume internally some distribution of onset (and missingness). The code will run if some (but not all) onset data are present. If either all onset data are NA or none are NA, the code will not run (because you have all the information you would be simulating). This is the reason for the reportF_missP = 0.6 argument. The implied onset distribution is rnbinom() with size = 3 and mu = 9.

out_list_demo <- run_backnow(caseCounts, 
                        MAX_ITER = as.integer(2000), 
                        norm_sigma = 0.2,
                        sip = sip,
                        NB_maxdelay = as.integer(20),
                        NB_size = as.integer(6),
                        printProgress = 1,
                        reportF_missP = 0.6)

Plot outputs. The points are aggregated reported cases, and the red line (and shaded confidence interval) represent the back-calculated case onsets that lead to the reported data.

plot(out_list_demo, "est")

You can also plot the R(t) curve over time. In this case, the red line (and shaded confidence interval) represent the time-varying r(t). See Li and White for description.

plot(out_list_demo, "rt")

Example: lineList data

Same result if you use with line_list data”

data("sample_report_dates")
data("sample_onset_dates")

create your caseCounts_line object

caseCounts_line <- create_linelist(report_dates = sample_report_dates,
                                onset_dates = sample_onset_dates)
head(caseCounts_line)
#>   report_date delay_int onset_date is_weekend report_int week_int
#> 1  2020-01-08        NA       <NA>          0          1        1
#> 2  2020-01-08        14 2019-12-25          0          1        1
#> 3  2020-01-08         7 2020-01-01          0          1        1
#> 4  2020-01-09        NA       <NA>          0          2        1
#> 5  2020-01-09        12 2019-12-28          0          2        1
#> 6  2020-01-10        NA       <NA>          0          3        1

and run

out_list_demo <- run_backnow(caseCounts_line, 
                        MAX_ITER = as.integer(2000), 
                        norm_sigma = 0.2,
                        sip = sip,
                        NB_maxdelay = as.integer(20),
                        NB_size = as.integer(6),
                        printProgress = 1)

Additional functionality

Specifying the distribution for caseCounts_line

You can specify the distribution for caseCounts_line in either run_backnow or convert_to_linelist (this runs internally depending on the class of input). reportF is the distribution function, _args lists the distribution params that are not x, and _missP is the percent missing. This must be between 0 < x < 1. Both ‘caseCounts’ and ‘caseCounts_line’ objects can be fed into run_backnow.

my_linelist <- convert_to_linelist(caseCounts, 
                                   reportF = rnbinom, 
                                   reportF_args = list(size = 3, mu = 9),
                                   reportF_missP = 0.6)
head(my_linelist)
#>   report_date delay_int onset_date is_weekend report_int week_int
#> 1  2020-01-08        NA       <NA>          0          1        1
#> 2  2020-01-08        NA       <NA>          0          1        1
#> 3  2020-01-08         5 2020-01-03          0          1        1
#> 4  2020-01-09        10 2019-12-30          0          2        1
#> 5  2020-01-09         6 2020-01-03          0          2        1
#> 6  2020-01-10        NA       <NA>          0          3        1