written by Oswaldo Gressani

Introduction

The EpiLPS (Epidemiological modeling with Laplacian-P-Splines) package can be used to estimate the incubation period of an infectious disease (the time elapsed between infection and symptom onset) following the methodology of Gressani et al. (2023). A challenge one typically encounters when attempting to model the incubation period is the imperfect information that accompanies infection times. The latter are rarely observed in practice and are usually only known to lie within a time window (often called exposure interval). Together with symptom onset times (which are more easily observed and thus available), exposure intervals can be used in a statistical modeling framework to construct an estimate of the incubation period.

EpiLPS proposes a Bayesian framework to estimate the distribution of the incubation period based on coarse data, i.e. data that has an interval-censored flavor as described above for the exposure window. A semi-parametric model involving Laplace approximations and penalized B-splines is used to estimate the incubation period distribution. The posterior distribution of the model parameters is sampled by means of a Metropolis-adjusted Langevin algorithm that is embedded in a Gibbsian scheme. Time consuming subroutines are coded in C++ and integrated in R via the Rcpp package so that computational time is relatively low. The semi-parametric incubation fit is then used to inform a set of distributions belonging to well-known parametric families that are often used to model incubation times and the Bayesian information criterion guides towards the ‘best’ fit. As such, the methodology provides an interesting alternative to classic Bayesian approaches that solely rely on purely parametric models. This preprint gives a detailed description of the methodology and explains how it is naturally nested within the EpiLPS framework.

The estimIncub() (estimation of incubation period) routine is the key function that allows the user to compute an estimate of the incubation density based on three main inputs:

  • x: A data frame containing the lower and upper bound of the incubation interval.
  • K: An integer specifying the number of B-splines to smooth the incubation density.
  • niter: The number of MCMC samples required.

The data frame x has two columns. The first column should correspond to the lower bound of the incubation interval and the second column to the upper bound. From (observed) symptom onset times \(t^{\mathcal{S}}\) and exposure intervals \(\mathcal{E}=[t^{E_L},t^{E_R}]\), incubation intervals \(\mathcal{I}=[t^{\mathcal{I}_L}, t^{\mathcal{I}_R}]\) can be easily constructed with lower bound \(t^{\mathcal{I}_L}=t^{\mathcal{S}}-t^{E_R}\) and upper bound \(t^{\mathcal{I}_R}=t^{\mathcal{S}}-t^{E_L}\). The number of B-splines basis functions is by default set to K=10 but more can be added if required. Finally, the number of MCMC iterations is by default set to niter=1000.

Simulated example

A simulated example is first used to illustrate the estimIncub() routine, where symptom onset times and infecting exposure windows are generated by means of the incubsim() routine. The latter function allows to generate a sample of symptom onset times and exposure windows for a desired sample of size \(n\) with a Data Generating Mechanism (DGM) that allows for different distributions of the incubation period, namely a LogNormal, Weibull, a two-component mixture of Weibulls and a Gamma distribution. Below is the code for generating a sample of size \(n=200\) with a Weibull incubation density. The option coarseness is set to two days (this will on average set the exposure interval width of the generated data to two days). With $Dobsincub the user is able to access the lower and upper bound of the incubation interval (which will be the x input in estimIncub()). Here, the generated incubation windows (in days) for the first five individuals are shown.

set.seed(123)
simdat <- incubsim(incubdist = "Weibull", n = 100, coarseness = 2)
knitr::kable(head(simdat$Dobsincub, 5))
tL tR
3.140144 4.854940
4.591608 6.388431
3.207375 4.898248
8.411519 11.502474
6.906292 9.660761

The true incubation distribution in the above DGM is a Weibull with a mean of 6.4 days and a standard deviation of 2.3 days (Backer et al. 2020). Detailed information on the DGM can be found by typing ?incubsim() in the R console. The estimIncub() routine is now applied on the above generated dataset.

fitsim <- estimIncub(x = simdat$Dobsincub, verbose = TRUE, tmax = 20)
## ----------------------------------------------------------------
## Time elapsed: 2.56 seconds.
## Fitted density is Weibull with shape=2.811 and scale=7.408.
## Mean incubation period (days): 6.597 with 95% CI: 6.372-6.836.
## 95th percentile (days): 10.945 with 95% CI: 10.62-11.465.
## ----------------------------------------------------------------

Typing fitsim$stats gives access to point estimates (and credible intervals) for crucial features of the incubation distribution (Mean, SD, quantiles).

knitr::kable(head(fitsim$stats))
PE CI90L CI90R CI95L CI95R
Mean 6.597295 6.412497 6.836487 6.371662 6.836487
SD 2.541933 2.418050 2.693500 2.401534 2.693500
q0.05 2.574895 2.385706 2.807828 2.360121 2.874251
q0.1 3.326458 3.125690 3.566497 3.093982 3.637859
q0.15 3.881047 3.674159 4.125784 3.643898 4.184041
q0.2 4.344429 4.135072 4.584816 4.100857 4.635539

The fitted incubation density (in red) can also be graphically compared with the target Weibull density (in black) by accessing the different elements of the fitsim object.

plot(fitsim$tg, fitsim$ftg, col = "royalblue", type = "l", lwd = 2, xlab = "Time (in days)",
     ylab = "Probability density function", ylim = c(0,0.16))
lines(simdat$tdom, dweibull(simdat$tdom, shape = 3.00060, scale = 7.170345), 
      type = "l", lwd = 2)
legend("topright", c("Target", "Estimation"), col = c("black","royalblue"), lwd = c(2,2), bty = "n")

The plot() function can also directly be applied to the fitsim object to obtain the fitted probability density function and (if required) a bar plot showing the incubation windows. The gray shaded area corresponds to the \(95\%\) credible interval.

gridExtra::grid.arrange(plot(fitsim, typ = "incubwin"), plot(fitsim, type = "pdf"),
                        nrow = 1, ncol = 2)

Real data example

As a real data application, we consider the dataset on MERS-CoV infections in South Korea downloaded from Virlogeux et al. (2016) https://wwwnc.cdc.gov/eid/article/22/3/15-1437_article. The dataset contains lower and upper bounds of the incubation period (in days) for 109 patients and information on the death status (0-alive; 1-dead). First, we read the data and transform the discrete daily values into continuous values by adding a uniformly distributed random noise.

# Read data
data <- read.csv(file = "data/Korea_MERS2015.csv", header = TRUE, skip = 1)
data_exposure <- data[data$missing_exposure == 0, ] # Skip missing exposure dates

set.seed(123)
Dobs <- data.frame(tL = data_exposure$IncP_min, tR = data_exposure$IncP_max)
Dobs <- Dobs[-which(Dobs$tR > 20),] # Remove (two) "outliers".
n <- nrow(Dobs)
for(i in 1:n) {
  u1 <- runif(1)
  Dobs$tL[i] <- Dobs$tL[i] + u1
  u2 <- runif(1)
  while (u2 <= u1) u2 <- runif(1)
  Dobs$tR[i] <- Dobs$tR[i] + u2
}

The lower and upper incubation bounds for the first five patients are shown below.

knitr::kable(head(Dobs, 5))
tL tR
0.2875775 15.788305
4.4089769 4.883017
9.9404673 9.956833
8.4533342 8.677571
7.5726334 9.899825

Next, we use the estimIncub() routine with K=15 B-splines and niter=5000 MCMC iterations and plot the estimated incubation distribution together with the incubation bounds. EpiLPS selects a Weibull fit to the data with a mean incubation period of 7.4 days (95% CI 7.1-7.7 days).

incubfit <- estimIncub(x = Dobs, K = 15, niter = 5000, verbose = TRUE, tmax = 20)
## ----------------------------------------------------------------
## Time elapsed: 15.39 seconds.
## Fitted density is Weibull with shape=2.727 and scale=8.308.
## Mean incubation period (days): 7.391 with 95% CI: 7.129-7.658.
## 95th percentile (days): 12.424 with 95% CI: 12.013-12.978.
## ----------------------------------------------------------------
gridExtra::grid.arrange(plot(incubfit, type = "incubwin"), plot(incubfit, type = "pdf"), nrow = 1, ncol = 2)

As we also have information on whether a patient is dead or alive, we can stratify our analysis by death status and compare the estimated incubation distribution of those who survived and those who died from MERS-CoV infection.

set.seed(123)
Dobs2 <- data.frame(tL = data_exposure$IncP_min, tR = data_exposure$IncP_max,
                   surv = data_exposure$Death_status)
Dobs2 <- Dobs2[-which(Dobs2$tR > 20),]
Dobs_surv <- Dobs2[Dobs2$surv == 0,][1:2]
nsurv <- nrow(Dobs_surv)
Dobs_dead <- Dobs2[Dobs2$surv == 1,][1:2]
ndead <- nrow(Dobs_dead)

set.seed(123)
for(i in 1:nsurv) {
  u1 <- runif(1)
  Dobs_surv$tL[i] <- Dobs_surv$tL[i] + u1
  u2 <- runif(1)
  while (u2 <= u1) u2 <- runif(1)
  Dobs_surv$tR[i] <- Dobs_surv$tR[i] + u2
}

for(i in 1:ndead) {
  u1 <- runif(1)
  Dobs_dead$tL[i] <- Dobs_dead$tL[i] + u1
  u2 <- runif(1)
  while (u2 <= u1) u2 <- runif(1)
  Dobs_dead$tR[i] <- Dobs_dead$tR[i] + u2
}
incub_surv <- estimIncub(x = Dobs_surv, K = 15, niter = 5000, verbose = TRUE, tmax = 20)
## ----------------------------------------------------------------
## Time elapsed: 17.94 seconds.
## Fitted density is Weibull with shape=2.856 and scale=8.575.
## Mean incubation period (days): 7.641 with 95% CI: 7.322-7.933.
## 95th percentile (days): 12.591 with 95% CI: 12.121-13.191.
## ----------------------------------------------------------------
incub_dead <- estimIncub(x = Dobs_dead, K = 15, niter = 5000, verbose = TRUE, tmax = 20)
## ----------------------------------------------------------------
## Time elapsed: 11.97 seconds.
## Fitted density is Gamma with shape=5.35 and rate=0.811.
## Mean incubation period (days): 6.598 with 95% CI: 6.013-7.127.
## 95th percentile (days): 11.881 with 95% CI: 10.976-13.011.
## ----------------------------------------------------------------
plot(incub_surv$tg, incub_surv$ftg, type = "l", col = "black", 
     ylab = "Probability density function", xlab = "Incubation period (days)", 
     lwd = 2, ylim = c(0,0.15), main = "MERS-CoV South Korea, 2015")
lines(incub_dead$tg, incub_dead$ftg, type ="l", lty = 2, lwd = 2)
legend("topright", c("Patients who survived", "Patients who died"),
       lty = c(1,2), lwd = c(2,2), bty = "n")

Funding

This work was supported by the ESCAPE project (101095619) and the VERDI project (101045989), funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Health and Digital Executive Agency (HADEA). Neither the European Union nor the granting authority can be held responsible for them.


References

Gressani, O., Wallinga, J., Althaus, C. L., Hens, N. and Faes, C. (2022). EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number. PLoS Comput Biol 18(10): e1010618. https://doi.org/10.1371/journal.pcbi.1010618

Gressani, O., Torneri, A., Hens, N. and Faes, C. (2023). Flexible Bayesian estimation of incubation times. MedRxiv preprint. 10.1101/2023.08.07.23293752

Backer, J. A., Klinkenberg, D., and Wallinga, J. (2020). Incubation period of 2019 novel coro- navirus (2019-nCoV) infections among travellers from Wuhan, China, 20–28 January 2020. Eurosurveillance, 25(5):2000062.

Virlogeux, V., Park, M., Wu, J. T., & Cowling, B. J. (2016). Association between Severity of MERS-CoV Infection and Incubation Period. Emerging Infectious Diseases, 22(3), 526-528. https://doi.org/10.3201/eid2203.151437.