--- title: "Statistical Modelling for Infectious Disease Management - Contacts" output: rmarkdown::html_vignette: fig_width: 7 fig_height: 5 fontsize: 12pt geometry: margin=1in vignette: > %\VignetteIndexEntry{Statistical Modelling for Infectious Disease Management - Contacts} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE ) knitr::knit_hooks$set( source = function(x, options) { hook.r <- function(x, options) { fence <- "```" language <- tolower(options$engine) if (language == 'node') language <- 'javascript' if (!options$highlight) language <- 'text' if(!is.null(options$foldcode)) { paste0('\n\n', "
Source\n", fence, language, '\n', x, fence, '\n\n', "
\n") } else { paste0('\n\n', fence, language, '\n', x, fence, '\n\n') } } x <- knitr:::hilight_source(x, 'markdown', options) hook.r( paste(c( x, '' ), collapse = '\n'), options ) } ) Sys.setlocale("LC_TIME", "C") ``` ```{r setup, echo = FALSE, message = FALSE} library(smidm) library(ggplot2) library(dplyr) library(hdrcde) ``` ## Question One person has become infected with COVID-19 on 28.12.2021. When will the contact person become ill and show the first symptoms? ## Generating a data frame with dates and illness probability of contacts using `get_serial_interval_density` The function `get_serial_interval_density()` creates a dataframe containing the probability that a contact will start showing symptoms (serial interval) at a particular date/time. Therefore, only the symptom begin date of the infected person is needed. Furthermore, the probability when the infected contacts of infected contacts show symptoms can be calculated, known as the second and also third generation of contacts. ### Inputs Multiple arguments are needed for the function `get_serial_interval_density()`: First of all, the `symptom_begin_date` has to be specified, which is defined as the date when the person started to show symptoms. Furthermore, the `max_serial_interval_days` is needed, which defines the interval length of the distribution output. The remaining two inputs `shape_serial` and `rate_serial` are the parameters of the log-normal distribution, which models the serial interval. ```{r get_serial_interval_density_statement} symptom_begin_date <- as.Date("2021-12-28") max_serial_interval_days <- 20 shape_serial <- 2.154631545 rate_serial <- 0.377343528 serial_in_df_v1 <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days, shape_serial, rate_serial) ``` ### Methodology The default values of the parameters for the distribution, when the infected contacts will show symptoms, are from the paper Najafi et al. [1], in which a gamma distribution for the serial interval between symptom onsets of infected persons and the symptom onset of infected contacts was estimated. For the second generation of contacts the probability equals the summation of random variables because we assume that the infection from infected person to the first contact generation is independent from the first to the second generation. Thus, a convolution of two identical gamma distributions has to be conducted. For gamma distributions the convolutions of gamma distributions equals the summation of their first parameters [2]. Thus, the density for the second generation of infected persons showing symptoms is given by a gamma distribution with $2 \cdot$ `shape_serial` and the same `rate_serial`. In general, the serial interval for the $i$-th generation of contacts can be calculated with a gamma distribution with parameters $i \cdot$ `shape_serial` and `rate_serial`. This holds in an analogous way for the third generation. ### Output The function call returns the following data set: ```{r get_serial_interval_density_result, echo = FALSE, message = FALSE} knitr::kable(serial_in_df_v1[100:109, ], caption = "values 100 to 109 of resulting data frame") ``` The data frame shows for each hour beginning at `symptom_begin_date` until `max_serial_interval_days` the resulting density of the gamma distribution. This density can be used for calculating the most probable period for a contact person start showing symptoms. The same data frame is obtained for the second generation. Here, the time interval of the distribution is larger. ```{r get_serial_interval_density_v2} symptom_begin_date <- as.Date("2021-12-28") max_serial_interval_days <- 20 shape_serial <- 2 * 2.154631545 rate_serial <- 0.377343528 serial_in_df_v2 <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days, shape_serial, rate_serial) ``` ```{r get_serial_interval_density_v2_table, echo = FALSE, message = FALSE} knitr::kable(serial_in_df_v2[100:109, ], caption = "values 100 to 109 of resulting data frame") ``` ## Visualization example of the data frame of \newline `get_serial_interval_density` The following code generates a plot with the gamma distribution of the illness probability of the first contact generation and the 80% and 95% high density intervals. In addition, the illness probability for the second generation is plotted in violet. ```{r .calculate_qstart_qend, foldcode = TRUE} .calculate_qstart_qend <- function(probability, df) { hdr_df <- hdr(den = data.frame(x = 1:length(df$distribution), y = df$distribution), p = probability * 100)$hdr qstart <- (hdr_df[1, 1] - 1) / 24 qend <- (hdr_df[1, 2] - 1) / 24 return(list("qstart" = qstart, "qend" = qend)) } ``` ```{r .shade_curve, foldcode = TRUE} .shade_curve <- function(df, qstart, qend, fill = "red", alpha = 0.4) { subset_df <- df[floor(qstart * 24):ceiling(qend * 24), ] geom_area(data = subset_df, aes(x = x, y = y), fill = fill, color = NA, alpha = alpha) } ``` ```{r parameters for visualization of get_infection_date_density, foldcode = TRUE} symptom_begin_date <- as.Date("2021-12-28") df <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days = 20, shape_serial = 2.154631545, rate_serial = 0.377343528) period_80 <- .calculate_qstart_qend(0.8, df) period_95 <- .calculate_qstart_qend(0.95, df) df_2 <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days = 20, shape_serial = 2 * 2.154631545, rate_serial = 0.377343528) symp_date_posixct_start <- as.POSIXct(format(as.POSIXct(symptom_begin_date, tz = "CET"), "%Y-%m-%d")) symp_date_posixct_end <- as.POSIXct(format(as.POSIXct(symptom_begin_date + 1, tz = "CET"), "%Y-%m-%d")) symp_date_posixct_mid <- symp_date_posixct_start - as.numeric(difftime(symp_date_posixct_start, symp_date_posixct_end, units = "hours")) / 2 * 3600 ``` ```{r visualization of get_infection_date_density, foldcode = TRUE} g <- ggplot() + scale_x_datetime(breaks = scales::date_breaks("1 days"), labels = scales::date_format("%d %b")) + theme(axis.text.x = element_text(angle = 90)) + # scale_x_continuous(breaks = x_tick, # labels = x_label) + # theme(axis.ticks.x = element_line(color = c(rbind(rep("black", length(x_label) / 2), rep(NA, length(x_label) / 2))), linetype = 2, size = 1)) + geom_path(aes(x = df$dates, y = df$distribution), color = "red", size = 1) + geom_path(aes(x = df_2$dates, y = df_2$distribution), color = "purple", size = 1) + .shade_curve(df = data.frame(x = df$dates, y = df$distribution), period_80$qstart, period_80$qend) + .shade_curve(df = data.frame(x = df$dates, y = df$distribution), period_95$qstart, period_95$qend, alpha = 0.2) + geom_rect(data = data.frame(xmin = symp_date_posixct_start, xmax = symp_date_posixct_end, ymin = -Inf, ymax = Inf), aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = "brown", alpha = 0.3) + geom_label(aes(x = symp_date_posixct_mid, y = 0.9*max(df$distribution), label = "symptom\nonset"), colour = "brown", fill = "white", size = 5, label.size = NA) + ylab("probability") + xlab("timeline") + labs(color = 'Verteilung') + # ggtitle("Visualization of get_infection_date_density ") + theme(legend.position = "none", text = element_text(size = 16*5/5)) + theme(axis.text.x = element_text(colour = "black", face = "bold", angle = 30, hjust = 1)) + theme(axis.title.x = element_text(colour = "black", face = "bold")) + theme(axis.text.y = element_text(colour = "gray50")) + theme(axis.title.y = element_text(colour = "gray50")) g ``` ## Literature [1] Najafi F, Izadi N, Hashemi-Nazari S-S, Khosravi-Shadmani F, Nikbakht R, Shakiba E. Serial interval and time-varying reproduction number estimation for COVID-19 in western Iran. New Microbes and New Infections. 2020; 36: 100715. [2] https://en.wikipedia.org/wiki/Gamma_distribution