Reproducible Materials

All code and data needed to reproduce our results can be found in our public GitHub repository: https://github.com/CEIDatUGA/COVID-GA-model.


Introduction

The epidemiology of SARS-CoV-2 is poorly understood. Here we develop a model for the transmission of SARS-CoV-2 from March 2020 through May 2020 in Georgia, USA. This model is being used for inference, forecasting, and scenario analysis.

Key features of this model include:

  1. Stochastic transmission process. Stochastic models are models with a realistic level of random variation. Stochastic models are essential for proper modeling of systems that start with a small number of infections.
  2. Realistic interval distributions for presymptomatic and symptomatic periods.
  3. Transmission is allowed at different rates for asymptomatic, presymptomatic, and symptomatic individuals.
  4. Time varying rates of case detection, isolation, and case notification.
  5. Realistic intervention scenarios.
  6. Affect of human mobility on transmission (i.e., social distancing).
  7. A latent process that allows transmission to vary over time due to environmental factors and other behavioral measures that can reduce transmission but are difficult to include with data (e.g., wearing face masks).

This model contains both fixed and fitted parameter values. Fixed parameters were defined using clinical outcome reports. Fitted parameters values were found by calibration to incident case and death reports, as described in more detail below. The pandemic of SARS-CoV-2 is still evolving and information that was used in the construction of this model may be incomplete or contain errors. Accordingly, these results are preliminary, provisional, and subject to change. These results have not been peer-reviewed, but have been prepared to a professional standard with the intention of providing useful interpretation of a rapidly developing event.

Methods

Data

We fit the model to incident case and death reports for Georgia, as collated by The COVID Tracking Project.

In addition, we include a covariate that describes human mobility. These data come from Unacast. We smooth the raw data from Unacast using a spline fit, resulting in the trajectory of human movement shown below. This covariate is used to reduce baseline transmission.

The Model

The model comprises susceptible, pre-symptomatic, asymptomatic, symptomatic, diagnosed, hospitalized, deceased, and recovered persons. The following compartments are included:

  • \(\boldsymbol{S}\) - Uninfected and susceptible individuals. Susceptible individuals can become infected by individuals in the \(L\), \(I_a\), \(I_{su}\), \(I_{sd}\), \(C\), and \(H\) stages. Rates of transmission from these stages can be adjusted individually.
  • \(\boldsymbol{L}\) - Individuals with latent infections who do not yet show symptoms. Those individuals can be infectious. At the end of the \(L\) stage, a fraction moves into the \(I_a\) stage, another fraction moves into the \(I_{su}\) stage, and the remainder into the \(I_{sd}\) stage.
  • \(\boldsymbol{I_a}\) - Individuals who are infected and asymptomatic. Those individuals are likely infectious, but the model allows to adjust this.
  • \(\boldsymbol{I_{su}}\) - Individuals who are infected and symptomatic, but are undetected. Those individuals are likely infectious. Individuals in this compartment never get diagnosed, and are assumed to recover.
  • \(\boldsymbol{I_{sd}}\) - Individuals who are infected and symptomatic, and are detected. Those individuals are likely infectious. Individuals in this compartment will be diagnosed and move to \(C\).
  • \(\boldsymbol{C}\) - Individuals who have been diagnosed as cases. Those individuals are likely isolated and not infectious, but the model allows to adjust this. A fraction of individuals in the \(C\) stage will naturally recover, without the need for hospitalization. The remainder moves into the \(H\) stage.
  • \(\boldsymbol{H}\) - Individuals who have been hospitalized. Those individuals are likely isolated and not infectious, but the model allows to adjust this. A fraction of individuals in the \(H\) stage will recover, the remainder will die.
  • \(\boldsymbol{R}\) - Recovered/removed individuals. Those individuals have recovered and are immune.
  • \(\boldsymbol{D}\) - Individuals who died from the infection.

To allow more realistic distributions of movement through compartments, several of these compartments are internally split into multiple stages using the linear chain trick.5

  • \(\boldsymbol{L}\) - 4 compartments
  • \(\boldsymbol{I_a}\) - 4 compartments
  • \(\boldsymbol{I_{su}}\) - 4 compartments
  • \(\boldsymbol{I_{sd}}\) - 4 compartments
  • \(\boldsymbol{C}\) - 4 compartments
  • \(\boldsymbol{H}\) - 4 compartments

The flow diagram for this model shown below.

We implemented the above model using the R package pomp version 2.7.1.06. All transitions among the compartments in our model were modeled using the Euler multinomial approximation (reulermultinom in pomp syntax) given the size of the “donating” compartment and the specified or fitted rate of transition.

Note that tranmission rate at time \(t\) is \(\omega(t) = \frac{\beta}{N} \psi(t) \phi(t) \Gamma(t)\), resulting the force of infection \(f(t) = \omega(t) \left(I(t) \right)\), where, for simplicity, \(I\) stands for all infectious individuals at time \(t\). \(\phi(t)\) is the human mobiltiy metric shown in the figure above. \(\psi(t)\) is the result of a latent trend process that is modeled using a fitted spline:

\[ \text{logit}\left(\psi(t)\right) = \sum_{i=1}^K g_i \xi_{i_t}, \]

where \(K\) is the number of basis functions, \(\mathbf{g}\) is a vector of spline coefficients (to be fitted), and \(\mathbf{\xi}\) is a matrix basis functions. We define the number of basis functions (\(K\)) as the number of days in the data set divided by 10 (so, one function for every 10 days). Note the logit transformation to go from the linear scale to the 0 - 1 scale. We used cubic basis functions from the pomp function pomp::bspline_basis().

We also included process noise (\(\Gamma(t)\)) on the rate of tranmission (\(\omega(t)\)) at each time step. \(\Gamma(t)\) is modeled as gamma-distributed white noise (temporally uncorrelated noise) with mean 1 and variance \(\sigma^2\)7, which we fit.

Interventions enter our model in three ways. The first is through the augmentation of \(\beta\) by the mobility covariate (\(\phi\)) and the latent process (\(\psi\)). The second is through an increase in the fraction of exposed individuals that are detected and flow into the \(I_{sd}\) compartments. This fraction (\(q\)) starts at a low level of 0.1 and then, starting on March 30, slowly ramps up to 0.4. Third, the time to diagnosis (\(s\)) decreases over time, ramping up starting on March 30. The interventions on \(q\) and \(s\) are applied with a sigmoid function, rather than a linear change over time.

We assume that new, daily case reports arise from a negative binomial distribution whose central tendency is captured by the flow of individuals from \(I_{sd4}\) to \(C_1\) (see flow diagram above). Let this quantity be \(C_{\text{new}}\), which accumulates over the course of one day in the simulation model and resets to zero at the end of each day (the model is simulated at a time step of 1/20 days). Similarly, we assume that new, daily death resports arise from a negative binomial distribution whose central tendency is captured by the flow of individuals from \(C_4\) to \(D\). Let this quantity be \(D_{\text{new}}\), which accumulates over the course of one day in the simulation model and resets to zero at the end of each day. Then, for both new cases and deaths, we model the observation process as:

\[ \text{cases}(t) \sim \text{NB}\left(C_{\text{new}}(t), \theta_C \right), \text{and} \\ \text{deaths}(t) \sim \text{NB}\left(D_{\text{new}}(t), \theta_D \right), \]

where \(\theta_C\) and \(\theta_D\) are the negative binomial dispersion parameters for cases and deaths, respectively. Note that \(\text{cases}(t)\) and \(\text{deaths}(t)\) are the observed number of cases or deaths reported on day t, as shown in the data figure at the top of this document.

Model Fitting

We fit the model using Maximization by Iterated particle Filtering (MIF). Observations (daily case and death reports) were modeled as arising from a negative binomial reporting process (see above). We estimated five parameters: baseline transmission rate (\(\beta\)), the fraction of hospitalized cases that result in death (\(m\)), a parameter accounting for extra-demographic process noise (\(\sigma\)), and two negative binomial dispersion parameters (\(\theta_c\) and \(\theta_d\)). In addition, we estimated the basis function parameters (\(g_i\)), whose number depend on the length of the time series at the time of fitting.

All other parameters were fixed at the following values:

Parameter Description value
frac_trans_e Relative transmissibility of latent infections 1.20e-01
frac_trans_a Relative transmissibility of asymptomatic individuals 5.00e-01
frac_trans_c Relative transmissibility of detected symptomatic individuals post-reporting 2.70e-01
frac_trans_h Relative transmissibility of hospitalized individuals 0.00e+00
time_e Time spent in latent compartments (days) 4.00e+00
time_a Time spent in asymptomatic compartments (days) 3.50e+00
time_su Time spent in symptomatic, undetected compartments (days) 6.00e+00
time_sd Time spent in symptomatic, detected compartments (days) 3.00e+00
time_c Time spent in diagnosed cases compartments (days) 3.00e+00
time_h Time spent in hospitalized compartments (days) 6.00e+00
max_diag_factor Maximum for factor by which movement through Isd happens faster (quicker diagnosis) 2.00e+00
diag_rampup Rate at which faster diagnosis ramps up to max 1.10e+00
t_half_diag Time at which diagnosis is at 50% of max (in days since March 1) 2.50e+01
max_detect_frac Maximum fraction of cases that are detected 4.00e-01
detect_rampup Speed at which fraction detected ramps up 1.10e+00
t_half_detect Time at which infection detection fraction is at 50% of max (days since March 1) 2.50e+01
base_detect_frac Minimum fraction detected at March 1 1.00e-01
frac_asym Fraction of latent infections that move to aysmptomatic 1.80e-01
frac_hosp Fraction of detected cases that are hospitalized 1.20e-01
S_0 Initial number of susceptible individuals on March 1 1.06e+07
L1_0 Initial number of latent infectious individuals on March 1 4.00e+01
Ia1_0 Initial number of asymptomatic individuals on March 1 2.20e+01
Isu1_0 Initial number of symptomatic, undetected individuals on March 1 9.00e+01
Isd1_0 Initial number of symptomatic, detected individuals on March 1 1.40e+01
C1_0 Initial number of diagnosed cases on March 1 2.00e+00
H1_0 Initial number of hospitalized cases on March 1 2.00e+00
R_0 Initial number of recovered individuals on March 1 0.00e+00
D_0 Initial number of deaths after hospitalization on March 1 0.00e+00

MIF relies on particle filtering, which estimates the likelihood of fixed parameters by integrating state variables of a stochastic system. To identify the maximum likelihood estimates, MIF lets parameters take a random walk during the filtering process and selectively propagates forward parameter sets (i.e., particles) with the highest likelihood. The variance of the random walk decreases at each iteration of MIF, where a MIF iteration means one filtering pass through the time series. In theory, this procedure converges toward the maximimum likelihood estimates (MLEs).

We used the IF2 algorithm8 implemented in the R9 package pomp version 2.7.1.010 to conduct the MIF procedure. To initialize MIF, we generated 30 parameter sets with values normally distributed around our prior expectations based on a literature review and line list data. We then performed two rounds of MIF, each for 150 iterations with 2,000 particles and geometric cooling. For the first round of MIF we set cooling.factor = 0.9. For the second round of MIF, which simply continues from where the first round stopped, we set cooling.factor = 0.7. We then computed the log likelihood of the 30 final MIF parameter sets (i.e., parameter sets collected after 300 MIF iterations) as the log of the mean likelihoods of 10 replicate particle filters with 5,000 particles each. At this stage, we assume the parameter set with highest log likelihood is the MLE. We use all parameter sets with log likelihoods within 2 of the maximum when forecasting.

Model Assessment

We assess our model in two ways. First, we calculate the mean absolute error (MAE) of the model by comparing in-sample simulation trajectories to the observed data. To do so, we simulate 100 trajectories from the model from all parameter sets within 2 points of the maximum log likelihood. We then calculate the MAE for each data point across the replicate simulations and then take the average MAE over the time series. We do this independently for case and death data. We smoothed the observations using a 7-day moving average before calculating MAE.

Second, we evaluate the forecast performance of our model by comparing projections to new data. We make benchmark forecasts every Monday that are compared to new data as they emerge. This allows us to evaluate forecasts over a range of prediction horizons.

Projection Scenarios

We use the fitted model to make projections under three scenarios. Under all scenarios, we assume that the latent trend is equal to its average value over the last thirty days. This means we are assuming that preventative measures unrelated to human mobility remain constant at the levels observed over the last 30 days.

  1. Status quo: We assume that social distancing remains at the level of the last observation at \(t = T\), meaning that all future \(\phi\) equal \(\phi(T)\).
  2. Increased social distancing: We assume that social distancing increases such that \(\phi \rightarrow 0.3\) linearly over the course of seven days and then remains at 0.3.
  3. Return to normal: We assume that social distancing decreases such that \(\phi \rightarrow 1\) linearly over the course of seven days and then remains at 1.

Each scenario is simulated 100 times for each parameter set that is within 2 points of the maximum log likelihood parameter set. Initial conditions are determined by simulating 100 trajectories over the observation period and then extracting unobservable state values from the trajectory that most closely aligns with the final observations of case and death reports. Thus, each replicate within an MLE set starts from the same initial conditions, but different MLE sets start from unique initial conditions. We make projections six weeks into the future.

Results

Parameter Estimates

The log likelihood of the highest MLE parameter set is -1605.39. For comparsion, the log likelihood of a simple random walk model is -1730.08. We defined the random walk as \(x(t+1) = x(t) + \sigma\), where \(x\) is the variable of interest (reported cases or deaths) and \(\sigma\) is the standard deviation of the of the time series up to time \(t\). We calculated the log likelihood of the random walk model using the following R code:

loglik_naive <- function(y){
  res <- diff(y)
  sigma <- sd(res, na.rm = TRUE)
  densities <- dnorm(res, sd = sigma, log = TRUE)
  sum(densities)
}

We calculated the log likelihood of the random walk model for cases and deaths independently and then sum them to arrive a a joint log likelihood analogous to the one evaluated by the pomp particle filter for the SEIR model.

Our current estimate of baseline transmission rate is 0.65. Our current estimate of the fraction of hospitalized cases that result in death is 0.28. Transmission rate is not constant in our model, however. Below we plot the estimated latent trend (\(\psi\)), the mobility covariate (\(\phi\)), and the resulting trend in transmission over time (\(\beta(t) = \phi(t) \psi(t) \beta\)).

Model Fits

Here we show the fit our model to the data. We plot the trajectory used for determining initial conditions for projections from each MLE parameter set within 2 log likelihood points of the maximum. In the plots, dashed lines are the data, solid blue lines are the mean trajectories across MLE sets, and the lighter blue lines are the trajectories from individual MLE sets.

Overall, the MAE of our model for cases is 674.21. The MAE of our model for deaths is 27.24.

Our initial forecasts with this model anticipated an upward trajectory in cases and deaths that did not materialize (see plots below). The most recent forecast shows much improvement.

Model Projections

Cumulative infections over time

The following plot shows the estimated total number of infections for the three scenarios. Shaded regions represent the area within which 95% of the model simulations fall.

Daily cases and deaths over time (data, model fit, projections)

The following plots show the observed and projected daily number of recorded cases and deaths for the three scenarios. Dashed line represents the observed data. The green line is our model’s fit to the observed data, and the purple line is the mean projection. Shaded regions represent the area within which 95% of the model simulations fall.

Next steps

This model is currently being used primarily to make projections of the future transmission of COVID-19. It is hoped that it may help inform public health officials and individuals to make decisions about “re-opening.” In the near future, we hope to use the model to make more formal inference about key aspects of COVID-19 transmission such as the transmission rate over time and the fraction of asymptomatic cases.


  1. Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia

  2. Department of Epidemiology and Biostatistics and Center for the Ecology of Infectious Diseases, University of Georgia

  3. Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia

  4. Western EcoSystems Technology, Inc.

  5. Hurtado and Kirosingh 2019 (https://link.springer.com/article/10.1007/s00285-019-01412-w)

  6. King et al. 2016 (https://www.jstatsoft.org/article/view/v069i12) and King et al. 2020 (https://kingaa.github.io/pomp/)

  7. Bretó and Ionides 2011 (https://www.sciencedirect.com/science/article/pii/S0304414911001761)

  8. Ionides et al. 2015 (https://www.pnas.org/content/112/3/719)

  9. R Core Team 2019 (https://www.r-project.org/)

  10. King et al. 2016 (https://www.jstatsoft.org/article/view/v069i12) and King et al. 2020 (https://kingaa.github.io/pomp/)