In order to estimate the potential size of the ongoing COVID-19 outbreak, we adapt a model previously developed for the recent Ebola outbreak (Drake and Park 2016). The model assumes that local transmission is contained by increasing the rate of patient isolation over time (as experience, diagnostics and capacity are augmented) effectively shortening the infectious period such that transmission is halted. These dynamics may be modeled as a special case of the generalized birth-death model (Kendall 1948). Embedding this in a global model, via probabilistic sparking of new outbreaks elsewhere, the full model is able to generate a distribution of global epidemic sizes using relatively few, but estimable parameters, and is useful in providing best-case, worst-case and average-case scenarios when region-specific data to support bespoke models is still in collection/collation phase. The embedding of the local transmission model in a global probabilistic framework is done in exactly the same way as previously described (Drake and Park 2016). However, formerly the local outbreak dynamics in the Ebola model were assumed to be curtailed by reduced force of infection, whereas here it is by decreased time-to-isolation.

Local transmission and mean outbreak size

We consider a birth-death process where births (infections) occur at rate \(\lambda\) and deaths (death, recovery or isolation) occur at a rate \(\mu=a_1-(a_1-a_0)e^{-bt}\). Following pioneering work on the generalized birth-death model (Kendall 1948) the mean final size is given by

\[ M=1+\lambda\int_{0}^{\infty} e^{-\rho(t)} dt \]

where \[ \rho(t)=\int_{0}^t \mu(t)-\lambda dt \\ \]

which evaluates to \[ \rho(t)=\frac{(a_1-a_0)e^{-bt}+b(a_1-\lambda)t+(a_0-a_1)}{b} \]

Consequently \[ M=1+\lambda \int_{0}^{\infty} e^{\frac{1}{b}\left((a_0-a_1)e^{-bt}+b(\lambda-a_1)t+(a_1-a_0)\right)} dt \]

For ease of notation, let \(q(t)=-fe^{-bt}-gt+f\) where \(f=\frac{a_1-a_0}{b}\), and \(g=a_1-\lambda\), then \[ M=1+\lambda \int_{0}^{\infty} e^q dt = 1+\lambda \int_{0}^{\infty} e^{-fe^{-bt}-gt+f} dt \]

which can be re-written as \[ M=1+\lambda e^f \int_{0}^{\infty} e^{-fe^{-bt}}e^{-gt} dt \]

Let \(u=fe^{-bt}\), then \(du=-bfe^{-bt}dt\) and \(dt=\frac{-1}{bf}e^{bt}du\). Also, \(\frac{u}{f}=e^{-bt}\) such that \(\rm{ln}\left(\frac{u}{f}\right)=-bt\) and \(t=-\frac{1}{b}\rm{ln}\left(\frac{u}{f}\right)\). Transforming variables from \(t\) to \(u\) in the integral expression results in

\[ M=1-\frac{\lambda e^f}{b f} \int_{f}^{0} e^{-u}e^{\frac{g}{b}\rm{ln}\left(\frac{u}{f}\right)} e^{-\rm{ln}\left(\frac{u}{f}\right)}du \]

which simplifies to \[ M=1-\frac{\lambda e^f}{b f}\left(\frac{1}{f}\right)^{\frac{g}{b}-1} \int_{f}^{0} e^{u} u^{\frac{g}{b}-1}du \]

which further simplifies to \[ M=1+\frac{\lambda e^f}{b f}\left(\frac{1}{f}\right)^{\frac{g}{b}-1} \gamma \left( \frac{g}{b}, f \right) \]

where \(\gamma\) is the lower incomplete gamma function, and as a recap, \(f=\frac{a_1-a_0}{b}\) and \(g=a_1-\lambda\).

The learning rate, \(b\), then, describes how quickly local time-to-isolation of a patient improves. For this scenario, we assume improvement is from 7 days infectious in a community (an assumed natural infectious period), to a minimum of one day:

t <- 0:500
a1 <- 1 # best rate, equiv to infectious period = 1 day
a0 <- 1/7 # worst rate, equiv to infectious period = 7 days
b <- 0.001
mu <- a1-(a1-a0)*exp(-b*t)
slate <- rev(c("gray1","gray20","gray40","gray60","gray80"))
plot(t,1/mu,type="l",ylim=c(0,8),col=slate[1],ylab="Time-to-isolation, 1/mu",xlab="Time, days")
other_bs <- c(0.005,0.01,0.05,0.1)
for (i in 1:4){
  b <- other_bs[i]
  mu <- a1-(a1-a0)*exp(-b*t)

Figure 1: Example parameterizations of how patient time-to-isolation decreases during a local outbreak as a function of learning rate \(b\).

To satisfy ourselves that the result for mean local outbreak size is correct, we can measure the final size from a deterministic model and compare it to the analytical result.

eventfun <- function(t, y, parms){
  with(as.list(y), {
    y[y < 1e-6] <- 0
ken <- function(t,x,params){
  n <- x[1]
  r <- x[2]
    mu <- a1-(a1-a0)*exp(-b*t)
    dn <- lambda*n-mu*n
    dr <- mu*n
maxTime <- 500.0 
analyt <- NULL
results <- NULL
for (bs in seq(0.005,0.02,0.001)){
params<-c(a1=1,a0=1/7,b=bs,lambda=0.371)  # model parameters
xstart<-c(n=1,r=0)  # initial conditions
output<-as.data.frame(lsoda(xstart,times,ken,params,events=list(func=eventfun,time=0:maxTime))) # tells computer to solve (integrate) equations
results <- rbind(results,c(bs,tail(output)[6,3]))
M<- 1+(l*exp(f))/(b*f)*(1/f)^((g/b)-1)*Igamma(g/b,f)
plot(results[,2],analyt,xlab="Simulated outbreak size",ylab="Analytical outbreak size",log="xy",las=1)

Figure 2: Comparison of outbreak sizes from deterministic simulation of the birth-death model and the analytical result presented, with all comparisons lying on the the 1:1 line.

Further, we can see how local outbreak size drops as a function of rate of improving patient isolation (\(b\)), done here using other parameters \(a_1=1,a_0=1/7,\lambda=0.371\). Parameters \(a_0\) and \(\lambda\) are estimated from data (CEID and others). Together, parameters \(a_0\) and \(a_1\) describe a situation where patient isolation during local outbreaks is initially absent (leading to a recovery rate of 1/7 days\(^{-1}\)) and improves (to one day) with parameter \(b\) describing how fast a local area moves between these two isolation times. Parameters \(\lambda\) is in broad agreement with estimates. Parameters describing recovery rate and augmented patient isolation are plausible estimates.

plot(seq(0.005,0.02,0.001),analyt,type="l",ylab="Local outbreak size",xlab="Rate of isolation time improvement",las=1)

Figure 3: Local outbreak size as a function of learning rate \(b\), here varied between \(b=0.005\) and \(b=0.02\), with \(a_1=1,a_0=1/7,\lambda=0.371\).

Scaling from local to global case number estimates

To scale up from local outbreaks to epidemics we adopt a probabilistic model in which local outbreaks are connected by movement of infected individuals among communities. In general, we assume that the number of uninfected communities is large so that the chance that an infected individual sparks an outbreak in another community may be represented by a small constant \(0 < \varepsilon \ll 1\). Let \(p_x\) be the probability mass function for an outbreak of size \(x\). Since the probability that an individual doesnโ€™t spark a secondary outbreak is \(1-\varepsilon\), the probability that an outbreak of size \(x\) fails to spark a secondary outbreak will be \((1- \varepsilon)^x\) by an assumption of independence. The probability that there is an outbreak of size \(x\) and that it fails to spark any secondary outbreaks is therefore \(p_x(1-\varepsilon)^x\). By enumeration of all possible outbreak sizes, the probability that an outbreak of unknown size will spark at least one secondary outbreak is

\[ \label{eq:alpha} \alpha = 1- \sum_{x=1}^\infty p_x(1-\varepsilon)^x. \]

With \(\varepsilon \ll 1\), we assume that each outbreak sparks, at most, only one secondary outbreak.

Let \(j = 1, 2, 3, ... n\) index the local outbreaks so that \(j_n\) is the total number of local outbreaks. The probability that the first outbreak is also the last one is just \(p(j_n=1) = 1- \alpha\). By contrast, the probability that the first outbreak gives rise to a secondary outbreak (with probability \(\alpha\)) and that the second outbreak fails to give rise to a third (with probability \(1-\alpha\)) is \(p(j_n=2) = \alpha (1-\alpha)\). Proceeding to \(j=3\), the probability that both outbreaks one and two give rise to a secondary outbreak and that the third outbreak is the last yields \(p(j=3) = \alpha^2(1-\alpha)\). By induction, we see that the general rule is given by

\[ \label{eq:fm} f(m) = p(j_n=m) = \alpha^{m-1}(1-\alpha). \]

The next challenge is to ascertain the total number of cases in these \(m\) outbreaks. Let \(X_j\) be the random number of cases in the \(j^{th}\) outbreak. The total number of cases in the epidemic will be the sum of cases in the local outbreaks, i.e.,

\[ \label{eq:cases} Y_m = \sum_{j=1}^m X_j. \]

Since the \(X_j\) are independently and identically distributed according to distribution \(p_x\), it follows that the distribution of \(Y_m\) is just the \(m\)-fold convolution of \(p_x\), denoted \(p_x^{m\ast}\). The probability that there are exactly \(m\) outbreaks and that these give rise to \(Y\) cases is

\[ \label{eq:epidemic-size} p_y = p_x^{m\ast}f(m). \]

Using established notation (Johnson, Kotz, and Kemp 1992), we have the following re-parameterization for the distribution of outbreak sizes.

\[ M = (1-p)/p \rightarrow p = 1/(M+1), \]

\[ P = (1-p)/p = M, \]


\[ Q = 1/p = M+1. \]

If \(k\) outbreaks are summed, the result is negative binomially distributed with parameters \(k\) and \(P\). Let \(k\) be the number of non-primary outbreaks. Applying the same rationale as applied earlier, we obtain \(P(k=0) = 1-\alpha = a\) and in general \(P(k=n) = (1-a)^na\). So, the number of non-primary outbreaks is a geometric distribution with parameter \(p=a\).

The distribution formed by taking a negative binomial with \(k\) drawn from a geometric distribution with parameters \(Q'\) and \(P'\) is also a geometric distribution with parameter \(QQ'-P'\). Identifying parameters, we have \(Q'=1/(1-\alpha)\) and \(P'=\alpha Q'\) yielding \(Q=(M+1)(\frac{1}{1-\alpha}) - \frac{\alpha}{(1-\alpha)}\). Expanding to obtain the unconditional total epidemic size distribution (not including primary outbreak), we have

\[ P(Y=y) = \pi(1-\pi)^{y-1}, \]


\[ \pi=\left ( (M+1)(\frac{1}{1-\alpha}) - \frac{\alpha}{(1-\alpha)} \right )^{-1}. \]

This simplifies to

\[ \label{eq:final-size-distn} P(Y=y) = \frac{(1-\alpha)(M/(M+1-\alpha))^{y-1}}{M+1-\alpha} \]

with expected value

\[ \label{eq:mean} 1/\pi = (M+1)(\frac{1}{1-\alpha}) - \frac{\alpha}{(1-\alpha)}. \]

From this we can add the mean size for the primary outbreak to estimate the average global epidemic size as

\[ M+(M+1)(\frac{1}{1-\alpha}) - \frac{\alpha}{(1-\alpha)}. \]

This derivation of this equation relies on approximations for the probability of a secondary outbreak given an outbreak of unknown size and the distribution of outbreak sizes (assumed to be approximated by a geometric distribution), as well as the assumption that outbreak number and outbreak sizes are independent. By embedding the local mean outbreak size estimate in a model that assumes each infectious person has a probability of sparking a new outbreak, we arrive at a distribution of global outbreak sizes, as a function of just a few model parameters. From this distribution, we illusrate here the mean and worst case (99th percentile) for global outbreak size with \(\lambda=0.371,a_0=1/7,a_1=1\) as a function of sparking probability (\(\varepsilon\)) and learning rate (\(b\)).


# fixed parameters
a1 <- 1
a0 <- 1/7
b <- 0.005
l <- 0.371

# function and evaluation of M at different learning rates
getM <- function(b,l,f,g) Q<-1+(l*exp(f))/(b*f)*(1/f)^((g/b)-1)*Igamma(g/b,f)

# function and evaluation of alpha at different sparking rates
getAlpha <- Vectorize(function(epsilon,b,l,f,g){
  M <- getM(b,l,f,g)
  p <- 1/(M+1)
  q <- 1-p
  summand <-c()
  for(x in 1:100){
    px <- p*q^x
    summand[x] <- px*(1-epsilon)^x
  return (1-sum(summand))

# mean epidemic size
mean.size2 <- function(epsilon,b,a0=1/7,a1=1,l=0.371){
  ((getM(b,l=0.371,f,g)+1)*(1/(1-getAlpha(epsilon, b,l=0.371,f,g))) - getAlpha(epsilon, b,l=0.371,f,g)/(1-getAlpha(epsilon,b,l=0.371,f,g)))
bs <- 10^seq(-2.35, -1, length=50)
epsilons <- 10^seq(-5,-1, length=50)

# 99th quantile
pi <- function(epsilon, b) (mean.size2(epsilon,b,a0=1/7,a1=1,l=0.371))^-1
Qg <- function(epsilon, b, q=0.99) log(1-q)/log(1-pi(epsilon, b))
Qb <- function(epsilon, b, q=0.01) log(1-q)/log(1-pi(epsilon, b))

q <- expand_grid(bs,epsilons)
q %<>% mutate(avCases=M+mean.size2(epsilon = epsilons,b=bs),mostCases=M+Qg(epsilon=epsilons,b=bs),leastCases=M+Qb(epsilon=epsilons,b=bs))

#Heatmap plots
#q %>% ggplot(.,aes(x=epsilons,y=bs))+geom_tile(aes(fill=log10(avCases)))+geom_contour(aes(z=log10(avCases)),col="black")+scale_x_continuous(trans="log10")+scale_y_continuous(trans="log10")+scale_fill_gradientn(colours = rev(heat.colors(10)))+xlab(expression(paste("Sparking probability, ",epsilon)))+ylab("Learning rate, b")

#q %>% ggplot(.,aes(x=epsilons,y=bs))+geom_tile(aes(fill=log10(mostCases)))+geom_contour(aes(z=log10(mostCases)),col="black")+scale_x_continuous(trans="log10")+scale_y_continuous(trans="log10")+scale_fill_gradientn(colours = rev(heat.colors(10)))+xlab(expression(paste("Sparking probability, ",epsilon)))+ylab("Learning rate, b")

#2d plots
q %>% gather(contains("Cases"), key=outcome, value=size) %>%
  ggplot(.,aes(x=bs,y=size,col=outcome)) +
  geom_point(alpha = 0.25, size=1) +
  geom_smooth(size=.5) +
                     labels = c('1','10','100','1 k','10 k','100 k', '1 M' ,'10 M', '100 M'),
                     minor_breaks = NULL) +
                     minor_breaks = c(seq(.001,.01,.001),seq(.02,.1,.01))) +
  xlab("Learning rate, b") + ylab("Global outbreak size")