All code and data needed to reproduce our results can be found in our public GitHub repository: https://github.com/CEIDatUGA/COVID-GA-model.
The epidemiology of SARS-CoV-2 is poorly understood. Here we develop a model for the transmission of SARS-CoV-2 from March 2020 through May 2020 in Georgia, USA. This model is being used for inference, forecasting, and scenario analysis.
Key features of this model include:
This model contains both fixed and fitted parameter values. Fixed parameters were defined using clinical outcome reports. Fitted parameters values were found by calibration to incident case and death reports, as described in more detail below. The pandemic of SARS-CoV-2 is still evolving and information that was used in the construction of this model may be incomplete or contain errors. Accordingly, these results are preliminary, provisional, and subject to change. These results have not been peer-reviewed, but have been prepared to a professional standard with the intention of providing useful interpretation of a rapidly developing event.
We fit the model to incident case and death reports for Georgia, as collated by The COVID Tracking Project.
In addition, we include a covariate that describes human mobility. These data come from Unacast. We smooth the raw data from Unacast using a spline fit, resulting in the trajectory of human movement shown below. This covariate is used to reduce baseline transmission.
The model comprises susceptible, pre-symptomatic, asymptomatic, symptomatic, diagnosed, hospitalized, deceased, and recovered persons. The following compartments are included:
To allow more realistic distributions of movement through compartments, several of these compartments are internally split into multiple stages using the linear chain trick.5
The flow diagram for this model shown below.
We implemented the above model using the R package pomp version 2.7.1.06. All transitions among the compartments in our model were modeled using the Euler multinomial approximation (reulermultinom
in pomp syntax) given the size of the “donating” compartment and the specified or fitted rate of transition.
Note that tranmission rate at time \(t\) is \(\omega(t) = \frac{\beta}{N} \psi(t) \phi(t) \Gamma(t)\), resulting the force of infection \(f(t) = \omega(t) \left(I(t) \right)\), where, for simplicity, \(I\) stands for all infectious individuals at time \(t\). \(\phi(t)\) is the human mobiltiy metric shown in the figure above. \(\psi(t)\) is the result of a latent trend process that is modeled using a fitted spline:
\[ \text{logit}\left(\psi(t)\right) = \sum_{i=1}^K g_i \xi_{i_t}, \]
where \(K\) is the number of basis functions, \(\mathbf{g}\) is a vector of spline coefficients (to be fitted), and \(\mathbf{\xi}\) is a matrix basis functions. We define the number of basis functions (\(K\)) as the number of days in the data set divided by 10 (so, one function for every 10 days). Note the logit transformation to go from the linear scale to the 0 - 1 scale. We used cubic basis functions from the pomp function pomp::bspline_basis()
.
We also included process noise (\(\Gamma(t)\)) on the rate of tranmission (\(\omega(t)\)) at each time step. \(\Gamma(t)\) is modeled as gamma-distributed white noise (temporally uncorrelated noise) with mean 1 and variance \(\sigma^2\)7, which we fit.
Interventions enter our model in three ways. The first is through the augmentation of \(\beta\) by the mobility covariate (\(\phi\)) and the latent process (\(\psi\)). The second is through an increase in the fraction of exposed individuals that are detected and flow into the \(I_{sd}\) compartments. This fraction (\(q\)) starts at a low level of 0.1 and then, starting on March 30, slowly ramps up to 0.4. Third, the time to diagnosis (\(s\)) decreases over time, ramping up starting on March 30. The interventions on \(q\) and \(s\) are applied with a sigmoid function, rather than a linear change over time.
We assume that new, daily case reports arise from a negative binomial distribution whose central tendency is captured by the flow of individuals from \(I_{sd4}\) to \(C_1\) (see flow diagram above). Let this quantity be \(C_{\text{new}}\), which accumulates over the course of one day in the simulation model and resets to zero at the end of each day (the model is simulated at a time step of 1/20 days). Similarly, we assume that new, daily death resports arise from a negative binomial distribution whose central tendency is captured by the flow of individuals from \(C_4\) to \(D\). Let this quantity be \(D_{\text{new}}\), which accumulates over the course of one day in the simulation model and resets to zero at the end of each day. Then, for both new cases and deaths, we model the observation process as:
\[ \text{cases}(t) \sim \text{NB}\left(C_{\text{new}}(t), \theta_C \right), \text{and} \\ \text{deaths}(t) \sim \text{NB}\left(D_{\text{new}}(t), \theta_D \right), \]
where \(\theta_C\) and \(\theta_D\) are the negative binomial dispersion parameters for cases and deaths, respectively. Note that \(\text{cases}(t)\) and \(\text{deaths}(t)\) are the observed number of cases or deaths reported on day t, as shown in the data figure at the top of this document.
We fit the model using Maximization by Iterated particle Filtering (MIF). Observations (daily case and death reports) were modeled as arising from a negative binomial reporting process (see above). We estimated five parameters: baseline transmission rate (\(\beta\)), the fraction of hospitalized cases that result in death (\(m\)), a parameter accounting for extra-demographic process noise (\(\sigma\)), and two negative binomial dispersion parameters (\(\theta_c\) and \(\theta_d\)). In addition, we estimated the basis function parameters (\(g_i\)), whose number depend on the length of the time series at the time of fitting.
All other parameters were fixed at the following values:
Parameter | Description | value |
---|---|---|
frac_trans_e | Relative transmissibility of latent infections | 1.20e-01 |
frac_trans_a | Relative transmissibility of asymptomatic individuals | 5.00e-01 |
frac_trans_c | Relative transmissibility of detected symptomatic individuals post-reporting | 2.70e-01 |
frac_trans_h | Relative transmissibility of hospitalized individuals | 0.00e+00 |
time_e | Time spent in latent compartments (days) | 4.00e+00 |
time_a | Time spent in asymptomatic compartments (days) | 3.50e+00 |
time_su | Time spent in symptomatic, undetected compartments (days) | 6.00e+00 |
time_sd | Time spent in symptomatic, detected compartments (days) | 3.00e+00 |
time_c | Time spent in diagnosed cases compartments (days) | 3.00e+00 |
time_h | Time spent in hospitalized compartments (days) | 6.00e+00 |
max_diag_factor | Maximum for factor by which movement through Isd happens faster (quicker diagnosis) | 2.00e+00 |
diag_rampup | Rate at which faster diagnosis ramps up to max | 1.10e+00 |
t_half_diag | Time at which diagnosis is at 50% of max (in days since March 1) | 2.50e+01 |
max_detect_frac | Maximum fraction of cases that are detected | 4.00e-01 |
detect_rampup | Speed at which fraction detected ramps up | 1.10e+00 |
t_half_detect | Time at which infection detection fraction is at 50% of max (days since March 1) | 2.50e+01 |
base_detect_frac | Minimum fraction detected at March 1 | 1.00e-01 |
frac_asym | Fraction of latent infections that move to aysmptomatic | 1.80e-01 |
frac_hosp | Fraction of detected cases that are hospitalized | 1.20e-01 |
S_0 | Initial number of susceptible individuals on March 1 | 1.06e+07 |
L1_0 | Initial number of latent infectious individuals on March 1 | 4.00e+01 |
Ia1_0 | Initial number of asymptomatic individuals on March 1 | 2.20e+01 |
Isu1_0 | Initial number of symptomatic, undetected individuals on March 1 | 9.00e+01 |
Isd1_0 | Initial number of symptomatic, detected individuals on March 1 | 1.40e+01 |
C1_0 | Initial number of diagnosed cases on March 1 | 2.00e+00 |
H1_0 | Initial number of hospitalized cases on March 1 | 2.00e+00 |
R_0 | Initial number of recovered individuals on March 1 | 0.00e+00 |
D_0 | Initial number of deaths after hospitalization on March 1 | 0.00e+00 |
MIF relies on particle filtering, which estimates the likelihood of fixed parameters by integrating state variables of a stochastic system. To identify the maximum likelihood estimates, MIF lets parameters take a random walk during the filtering process and selectively propagates forward parameter sets (i.e., particles) with the highest likelihood. The variance of the random walk decreases at each iteration of MIF, where a MIF iteration means one filtering pass through the time series. In theory, this procedure converges toward the maximimum likelihood estimates (MLEs).
We used the IF2 algorithm8 implemented in the R9 package pomp version 2.7.1.010 to conduct the MIF procedure. To initialize MIF, we generated 30 parameter sets with values normally distributed around our prior expectations based on a literature review and line list data. We then performed two rounds of MIF, each for 150 iterations with 2,000 particles and geometric cooling. For the first round of MIF we set cooling.factor = 0.9
. For the second round of MIF, which simply continues from where the first round stopped, we set cooling.factor = 0.7
. We then computed the log likelihood of the 30 final MIF parameter sets (i.e., parameter sets collected after 300 MIF iterations) as the log of the mean likelihoods of 10 replicate particle filters with 5,000 particles each. At this stage, we assume the parameter set with highest log likelihood is the MLE. We use all parameter sets with log likelihoods within 2 of the maximum when forecasting.
We assess our model in two ways. First, we calculate the mean absolute error (MAE) of the model by comparing in-sample simulation trajectories to the observed data. To do so, we simulate 100 trajectories from the model from all parameter sets within 2 points of the maximum log likelihood. We then calculate the MAE for each data point across the replicate simulations and then take the average MAE over the time series. We do this independently for case and death data. We smoothed the observations using a 7-day moving average before calculating MAE.
Second, we evaluate the forecast performance of our model by comparing projections to new data. We make benchmark forecasts every Monday that are compared to new data as they emerge. This allows us to evaluate forecasts over a range of prediction horizons.
We use the fitted model to make projections under three scenarios. Under all scenarios, we assume that the latent trend is equal to its average value over the last thirty days. This means we are assuming that preventative measures unrelated to human mobility remain constant at the levels observed over the last 30 days.
Each scenario is simulated 100 times for each parameter set that is within 2 points of the maximum log likelihood parameter set. Initial conditions are determined by simulating 100 trajectories over the observation period and then extracting unobservable state values from the trajectory that most closely aligns with the final observations of case and death reports. Thus, each replicate within an MLE set starts from the same initial conditions, but different MLE sets start from unique initial conditions. We make projections six weeks into the future.
The log likelihood of the highest MLE parameter set is -1605.39. For comparsion, the log likelihood of a simple random walk model is -1730.08. We defined the random walk as \(x(t+1) = x(t) + \sigma\), where \(x\) is the variable of interest (reported cases or deaths) and \(\sigma\) is the standard deviation of the of the time series up to time \(t\). We calculated the log likelihood of the random walk model using the following R code:
loglik_naive <- function(y){
res <- diff(y)
sigma <- sd(res, na.rm = TRUE)
densities <- dnorm(res, sd = sigma, log = TRUE)
sum(densities)
}
We calculated the log likelihood of the random walk model for cases and deaths independently and then sum them to arrive a a joint log likelihood analogous to the one evaluated by the pomp particle filter for the SEIR model.
Our current estimate of baseline transmission rate is 0.65. Our current estimate of the fraction of hospitalized cases that result in death is 0.28. Transmission rate is not constant in our model, however. Below we plot the estimated latent trend (\(\psi\)), the mobility covariate (\(\phi\)), and the resulting trend in transmission over time (\(\beta(t) = \phi(t) \psi(t) \beta\)).
Here we show the fit our model to the data. We plot the trajectory used for determining initial conditions for projections from each MLE parameter set within 2 log likelihood points of the maximum. In the plots, dashed lines are the data, solid blue lines are the mean trajectories across MLE sets, and the lighter blue lines are the trajectories from individual MLE sets.
Overall, the MAE of our model for cases is 674.21. The MAE of our model for deaths is 27.24.
Our initial forecasts with this model anticipated an upward trajectory in cases and deaths that did not materialize (see plots below). The most recent forecast shows much improvement.
The following plot shows the estimated total number of infections for the three scenarios. Shaded regions represent the area within which 95% of the model simulations fall.
The following plots show the observed and projected daily number of recorded cases and deaths for the three scenarios. Dashed line represents the observed data. The green line is our model’s fit to the observed data, and the purple line is the mean projection. Shaded regions represent the area within which 95% of the model simulations fall.
This model is currently being used primarily to make projections of the future transmission of COVID-19. It is hoped that it may help inform public health officials and individuals to make decisions about “re-opening.” In the near future, we hope to use the model to make more formal inference about key aspects of COVID-19 transmission such as the transmission rate over time and the fraction of asymptomatic cases.
Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia↩
Department of Epidemiology and Biostatistics and Center for the Ecology of Infectious Diseases, University of Georgia↩
Odum School of Ecology and Center for the Ecology of Infectious Diseases, University of Georgia↩
Western EcoSystems Technology, Inc.↩
Hurtado and Kirosingh 2019 (https://link.springer.com/article/10.1007/s00285-019-01412-w)↩
King et al. 2016 (https://www.jstatsoft.org/article/view/v069i12) and King et al. 2020 (https://kingaa.github.io/pomp/)↩
Bretó and Ionides 2011 (https://www.sciencedirect.com/science/article/pii/S0304414911001761)↩
Ionides et al. 2015 (https://www.pnas.org/content/112/3/719)↩
R Core Team 2019 (https://www.r-project.org/)↩
King et al. 2016 (https://www.jstatsoft.org/article/view/v069i12) and King et al. 2020 (https://kingaa.github.io/pomp/)↩