Statistical Modelling for Infectious Disease Management - Risk assessment group quarantine

Question:

A school class with 25 students has met on the 04.11.2021 and two students were infected with COVID-19. Several students were tested:

  • First group: 3 persons in the group, tested with PCR on the second day after the group event.
  • Second group: 5 persons in the group, tested with PCR on the fourth day after the group event and with Antigen on the sixth day after the group event.

How likely is it that the two cases did not transmit COVID-19 and that no further cases will occur?

Calculating the probability that nobody is infected given the negative test results using calculate_posterior_no_infections

Inputs

The function calculate_posterior_no_infections() calculates the probability that nobody is infected given the negative tests. To that end, the following input values are required:

The input negative_persons denotes the number of people without the infectious persons and infected_persons denotes the number of primary COVID-19 cases in the school class. Furthermore, the event type event describes the setting of the meeting. In the package the event types school (school class) and day_care_center (day care center group) were modeled.

In addition, the information about the conducted tests test_infos, the conducted test types test_types and the number of persons which were tested at each days subgroup_size are needed. Each row of test_infos, test_types and each entry of subgroup_size describes one group of persons, which were tested in the same way on the same dates. test_infos describes in the first column how many tests and the following columns on which days after the event they were conducted, and test_types contains in each column which kind of test was conducted at the corresponding date.

Beside this, distribution as well as info are optional inputs, which can be set to use an own, custom prior distribution and own custom tests. distribution should be a probability vector, containing the probabilities that one COVID-19 case infects 0, …, negative_persons persons. For info a data frame with one column containing the day-specific sensitivities of the considered test after infection has to be created and the column should have the name of the considered tests. The specificity of the test is not affecting the likelihood, since no positive tests results are assumed.

Methodology

The function calculate_posterior_no_infections() uses calculate_likelihood_negative_tests and calculate_prior_infections to calculate the probability. The whole bayesian statistical model behind this function is described in the paper Jäckle et al [1].

Output


negative_persons <- 23
infected_persons <- 2
event <- "school"

test_infos <- matrix(nrow = 2, ncol = 3)
test_infos[1, 1] <- 1
test_infos[1, 2] <- 2
test_infos[2, 1] <- 2
test_infos[2, 2] <- 4
test_infos[2, 3] <- 6

test_types <- matrix(nrow = 2, ncol = 2)
test_types[1, 1] <- "PCR"
test_types[2, 1] <- "PCR"
test_types[2, 2] <- "Antigen"


subgroup_size <- c(3, 5)

prob <- calculate_posterior_no_infections(negative_persons,
                                          infected_persons,
                                          event,
                                          test_infos,
                                          test_types,
                                          subgroup_size)

print(prob)
#> [1] 0.825621

The resulting probability for no further infections given the negative test results is around 82.6%.

Calculating the likelihood using calculate_likelihood_negative_tests

The function calculate_likelihood_negative_tests() is used to calculate the vector of probabilities (likelihood) that zero positive tests are observed, given different numbers of persons infected by the primary cases.

Inputs

The calculate_likelihood_negative_tests() uses some inputs of calculate_posterior_no_infections, namely negative_persons, test_infos, test_types and subgroup_size.

Beside this, also info is an optional input, which can be set for using own custom tests.

Methodology

The likelihood function as part of the bayesian statistical model is described in the paper Jäckle et al [1].

Output

test_infos <- matrix(nrow = 2, ncol = 3)
test_infos[1, 1] <- 1
test_infos[1, 2] <- 2
test_infos[2, 1] <- 2
test_infos[2, 2] <- 4
test_infos[2, 3] <- 6

test_types <- matrix(nrow = 2, ncol = 2)
test_types[1, 1] <- "PCR"
test_types[2, 1] <- "PCR"
test_types[2, 2] <- "Antigen"

negative_persons <- 23
subgroup_size <- c(3, 5)

likelihood <- calculate_likelihood_negative_tests(test_infos,
                                                  test_types,
                                                  negative_persons,
                                                  subgroup_size)

print(likelihood)
#>  [1] 1.0000000000 0.8338649049 0.6908141737 0.5683275826 0.4640775090
#>  [6] 0.3759211840 0.3018929462 0.2401964946 0.1891971417 0.1474140667
#> [11] 0.1135125689 0.0862963205 0.0646996202 0.0477796460 0.0347087085
#> [16] 0.0247665041 0.0173323682 0.0118775284 0.0079573574 0.0052036265
#> [21] 0.0033167587 0.0020580817 0.0012420813 0.0007286544

The output vector shows the likelihood probabilities for negative test results given 0, ..., 23 persons infected by the primary cases.

Calculating the priori probability distribution of further infections using calculate_prior_infections

The function calculate_prior_infections() calculates the a priori probability of how many people are infected in a specific event setting.

Inputs

The calculate_prior_infections() needs the same inputs as calculate_posterior_no_infections except the test_infos, test_types and subgroup_sizes variables. Currently, the event types school (school class) and day_care_center (day care center group) are modeled and can be chosen. Beside this, the optional inputs p_one and infect_average can be set in order to model an own, custom event scenario. p_one defines the probability that a COVID-19 case infects at least one of the persons attending the event, and infect_average the average number of infected persons, under the condition that COVID-19 was transmitted at the event.

Methodology

The priori probability distribution as part of the bayesian statistical model is described in the paper Jäckle et al [1].

Outputs

negative_persons <- 23
infected_persons <- 2
event <- "school"

prior <- calculate_prior_infections(negative_persons,
                                    infected_persons,
                                    event)
print(prior)
#>  [1] 7.743990e-01 1.266298e-01 5.061046e-02 2.384428e-02 1.194404e-02
#>  [6] 6.134207e-03 3.172921e-03 1.635224e-03 8.333112e-04 4.173381e-04
#> [11] 2.042826e-04 9.720703e-05 4.471282e-05 1.975681e-05 8.325528e-06
#> [16] 3.317071e-06 1.236190e-06 4.250407e-07 1.323857e-07 3.641152e-08
#> [21] 8.515680e-09 1.593207e-09 2.128086e-10 1.529138e-11

The output vector shows the prior probabilities that 0, …, 23 persons are infected by the primary cases.

Visualization example of all date inputs on a time scale

Source
date <- as.Date("2021-10-05")
tests_df <- data.frame(c("2021-10-06 PCR-test", "2021-10-07 antigen-test", "2021-10-07 antigen-test", "2021-10-08 antigen-test", "2021-10-10 antigen-test"), c(2, 2, 3, 5, 1), c(1, 2, 2, 3, 5))
Source
 # Convert test data frame into arrays
   test_dates <- c()
   test_colors <- c()
   test_positions <- c()
   test_text_positions <- c()
   test_labels <- c("event")
   test_label_positions_x <- c(date)
   test_label_positions_y <- c(0.6)
   test_label_colours <- c("1")
   if (nrow(tests_df > 0)) {
      for (i in 1:nrow(tests_df)) {
         test_list <- strsplit(tests_df[i, 1], ", ")[[1]]
         dates_group <- c()
         for (j in 1:length(test_list)) {
            test <- strsplit(test_list[j], " ")[[1]]
            test_dates <- c(test_dates, test[1])
            dates_group <- c(dates_group, as.Date(test[1]))
            test_colors <- c(test_colors, as.character(i + 1) )
            test_positions <- c(test_positions, i * 0.9 / nrow(tests_df))
            test_text_positions <- c(test_text_positions, i * 0.9 / nrow(tests_df) + 0.1)
         }
         test_label <- paste(as.character(tests_df[i, 2]), " persons")
         if(tests_df[i, 2] == 1)
         {
            test_label <- "1 person"
         }
         test_labels <- c(test_labels, test_label)
         test_label_positions_x <- c(test_label_positions_x, mean.Date(dates_group))
         test_label_positions_y <- c(test_label_positions_y, i*0.9/nrow(tests_df) + 0.1)
         test_label_colours <- c(test_label_colours, as.character(i+1) )
      }
   }
   
   df_label <- data.frame(Date = test_label_positions_x, 
                          Position = test_label_positions_y, 
                          Label = test_labels, 
                          col = test_label_colours)
   df <- data.frame(Date = c(date, test_dates), 
                    Position = c(0.5, test_positions),
                    Text_pos = c(0.6, test_text_positions),
                    col      = c("1", test_colors))
   df <- df %>% arrange(desc(Position))
   
   timeline_plot <- ggplot(df, aes(x = Date, y = Position, color = col)) +
       scale_color_brewer(palette = "Set1") +
       theme_classic() +
       geom_hline(yintercept = 0, color = "black", size = 0.3) +
       geom_segment(aes(y = Position, yend = 0, xend = Date, color = col), 
                    size = 0.2) +
       geom_point(aes(y = Position), size = 3) +
       theme(axis.line.y = element_blank(),
             axis.text.y = element_blank(),
             axis.title.x = element_blank(),
             axis.title.y = element_blank(),
             axis.ticks.y = element_blank(),
             # axis.text.x = element_blank(),
             # axis.ticks.x = element_blank(),
             axis.line.x = element_blank(),
             legend.position = "none",
             legend.title = element_blank(),
             text = element_text(size = 16),
             axis.text.x = element_text(angle = 30, hjust = 1)
       ) +
       geom_label(aes(x = Date, y = Position, label = Label), df_label, size = 5, label.size = NA) +
       scale_x_date(date_labels = "%d %b", date_breaks = "1 day", limits = c(min(df$Date) - 1, max(df$Date) + 1)) 
       

   timeline_plot

Literature

[1] Jäckle S, Röger E, Dicken V, Geisler B, Schumacher J, Westphal M. A Statistical Model to Assess Risk for Supporting COVID-19 Quarantine Decisions. International Journal of Environmental Research and Public Health. 2021; 18(17): 9166.