Overview

The goal of this study is to retrospectively determine the factors that influenced the spatiotemporal spread of COVID-19 between counties in the United States during the first wave of the pandemic. Specifically, we aim to explain the role of county-level attributes and county-county mobility patterns in the spread of COVID-19. Additionally, the model can aid in predicting future spatial spread in the United States in the event of regional containment.

Our approach involves fitting a stochastic model that estimates the probability of COVID-19 importation into new counties in the United States. The model is updated daily from March 1, 2020 to August 3, 2020. The observed date of first infection for each county is taken from county-level COVID-19 case data published by the New York Times, which are based on reports from state and local health agencies. At each time step, the probabilities of COVID-19 importation into all potential receiving counties, from all counties that have reported COVID-19 cases (sources), are defined by a generalized gravity model.

The results in this summary are from the best-fit model, which estimates the probability of COVID-19 importation based on:

  • county population sizes
  • distances between counties
  • total COVID-19 cases reported in the previous 10 days in the source county
  • the estimated number of commuters between counties based on the American Community Survey (ACS)
  • the estimated number of daily flight passengers traveling between counties from March 2020-July 2020
  • four non-pharmaceutical interventions in place in source counties:
    • bar closures
    • stay-at-home orders
    • mask requirements
    • gathering size restrictions

An additional covariate that we tested but did not include in the best-fit model is the county-county Facebook Social Connectedness Index (SCI).

Model details

The results in this summary are from the best-fit model, which estimates the probability of COVID-19 importation based on county population sizes (\(\text{mass}_{ij}=\text{pop}_i \times \text{pop}_j\)), distances between counties (\(\text{dist}_{ij}\)), total COVID-19 cases reported in the previous 10 days in the source county \(i\) (\(\text{cases}_i\)), the estimated number of commuters between counties \(i\) and \(j\) based on the American Community Survey (\(\text{commute}_{ij}\)), the estimated number of daily flight passengers traveling between counties \(i\) and \(j\) from March 2020-July 2020 based on Official Airline Guide (OAG) data (\(\text{flights}_{ij}\)), and four non-pharmaceutical interventions in place in source counties \(i\) (\(\text{bars}_i\), \(\text{sah}_i\), \(\text{mask}_i\), \(\text{gather}_i\)).

\[\begin{equation} p_{ij} = \left(1 + e^{\beta_0 + \frac{\text{mass}^{\beta_2}_{ij} \text{cases}^{\beta_3}_i}{\text{dist}^{\beta_1}_{ij}} + \beta_4\log(\text{commute}_{ij} + 1) + \beta_5\log(\text{flights}_{ij}) + \beta_6\text{bars}_i + \beta_7\text{sah}_i + \beta_8\text{mask}_i + \beta_9\text{gather}_i}\right)^{-1} \;\; \text{(Eq. 1)} \end{equation}\]
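As a minimal sketch of how Eq. 1 can be evaluated for a single source-destination pair, the Python function below follows the equation term by term. It is an illustration, not the code used in this study. The function name, the argument names, and the input values in the example call are hypothetical. The coefficient tuple is copied from Table 1 in the Parameter estimates section, under the assumption that the table columns follow the order \(\beta_0, \dots, \beta_9\) of Eq. 1. The units and scaling of the covariates are not stated in this summary, so the example inputs are placeholders only.

```python
import math

# Coefficients copied from Table 1, ASSUMING the column order
# (intercept, dist, mass, cases, commute, flights, bars, sah, mask, gather)
# corresponds to beta_0 ... beta_9 in Eq. 1.
BETA_TABLE1 = (8.03, -1.77, -1.0, -0.7, -0.47, -0.16, -1.15, 0.94, 1.2, 0.09)


def importation_probability(beta, mass, cases, dist, commute, flights,
                            bars, sah, mask, gather):
    """Evaluate Eq. 1 for one source county i and receiving county j.

    beta is a sequence (beta_0, ..., beta_9) ordered as in Eq. 1.
    bars, sah, mask, gather are treated here as 0/1 indicators for the
    source county (an assumption; the summary does not state the coding).
    """
    if mass <= 0 or cases <= 0 or dist <= 0 or flights <= 0:
        # Eq. 1 as written takes powers and log(flights), so these
        # quantities must be strictly positive.
        raise ValueError("mass, cases, dist and flights must be positive")
    if commute < 0:
        raise ValueError("commute must be non-negative")

    # Gravity term: mass^beta_2 * cases^beta_3 / dist^beta_1
    gravity = (mass ** beta[2]) * (cases ** beta[3]) / (dist ** beta[1])

    # Full exponent of Eq. 1
    eta = (beta[0] + gravity
           + beta[4] * math.log(commute + 1.0)
           + beta[5] * math.log(flights)
           + beta[6] * bars + beta[7] * sah
           + beta[8] * mask + beta[9] * gather)

    # p = (1 + e^eta)^(-1), written to avoid overflow for large eta
    if eta >= 0:
        z = math.exp(-eta)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(eta))


# Placeholder inputs (not data from the study)
example = dict(mass=1.0, cases=10.0, dist=1.0, commute=100.0, flights=50.0,
               bars=0, sah=0, mask=0, gather=0)
p_example = importation_probability(BETA_TABLE1, **example)
```

Because the exponent enters as \(e^{+\eta}\), a larger exponent means a smaller probability. Covariates with negative coefficients therefore raise \(p_{ij}\), and covariates with positive coefficients lower it.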

Models are fit using maximum likelihood estimation and the best model is selected using AIC.
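The model-selection step can be sketched with the standard definition of AIC, \(2k - 2\ln\hat{L}\), where \(k\) is the number of estimated parameters and \(\hat{L}\) is the maximized likelihood. The snippet below is illustrative only: the candidate names and log-likelihood values are made-up placeholders, not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L-hat). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Placeholder candidates: (maximized log-likelihood, number of parameters).
# These numbers are invented for illustration and are NOT study results.
candidates = {
    "model_a": (-1234.5, 10),
    "model_b": (-1250.0, 9),
}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(aic_scores, key=aic_scores.get)
```

An extra parameter is only worth keeping if it raises the maximized log-likelihood by more than one unit, since each parameter adds 2 to the AIC.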

Parameter estimates

Table 1 contains the parameter estimates for the model specified by Eq. (1), which estimates that infection probability increases with population size and decreases with distance between counties. Higher numbers of COVID-19 cases in the source county are also associated with higher infection probability. Counties with higher commuting and domestic flight passenger flows between them also have a higher risk of COVID-19 transmission. Stay-at-home orders, mask mandates, and gathering size restrictions in place in source counties are associated with a lower probability of COVID-19 spread from source counties to uninfected counties. Bar closures show the opposite trend: bar closures in source counties are associated with a higher risk of COVID-19 spread from those counties (Table 1, Figure 1).

Table 1. Parameter estimates for the best-fit model
model   intercept  dist   mass  cases  commute  flights  bars   sah   mask  gather
Model1  8.03       -1.77  -1    -0.7   -0.47    -0.16    -1.15  0.94  1.2   0.09

Comparing Interventions

We also use the fitted model to compare the effectiveness of four interventions (bar closures, stay-at-home orders, mask requirements, and gathering size limits) in reducing the probability of COVID-19 spread from source counties to receiving counties. Parameter estimates are depicted in Figure 1. Three of the four interventions (stay-at-home orders, mask requirements, and gathering size limits) are associated with reduced spatial spread. Bar closures appear to be associated with increased COVID-19 importation risk, but this is likely because many counties closed bars around the same time they reported their first case.

Figure 1: Parameter estimates for the four intervention types. Error bars show asymmetric likelihood-profile standard errors. Positive parameter values indicate that an intervention decreases the probability of spread in our model.
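The sign convention follows directly from Eq. 1. Writing the exponent as \(\eta\), we have \(p = (1 + e^{\eta})^{-1}\), so the odds are \(p/(1-p) = e^{-\eta}\). If an intervention enters as a 0/1 indicator with coefficient \(\beta\) (an assumption; the coding of the intervention covariates is not stated in this summary), switching it on multiplies the odds of importation by \(e^{-\beta}\). The short sketch below applies this to the intervention estimates in Table 1. The dictionary and function names are illustrative.

```python
import math

# Intervention point estimates copied from Table 1
INTERVENTION_COEFS = {"bars": -1.15, "sah": 0.94, "mask": 1.2, "gather": 0.09}


def odds_multiplier(beta):
    """Factor applied to the odds p / (1 - p) when a 0/1 covariate with
    coefficient beta is switched on, given p = 1 / (1 + exp(eta))."""
    return math.exp(-beta)


multipliers = {name: odds_multiplier(b)
               for name, b in INTERVENTION_COEFS.items()}

for name, factor in multipliers.items():
    print(f"{name}: odds multiplied by {factor:.2f}")
```

Under these assumptions the point estimates correspond to odds multipliers of roughly 0.39 for stay-at-home orders, 0.30 for mask requirements, 0.91 for gathering size restrictions, and 3.16 for bar closures. These figures ignore the uncertainty shown by the error bars in Figure 1.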

Limitations

The model is a closed system with US counties as the only potential sources of transmission; that is, our model cannot consider international importations. To account for this, we use data recorded after March 1, 2020. New cases after this date are thought to be predominantly a result of widespread local transmission [1]. Furthermore, testing criteria were expanded on March 4 to include individuals without international travel history [2]. Testing availability was still limited for some time, so the infection times we fit are likely to be biased later than the true infection times. We can be confident in our model estimates insofar as case ascertainment did not vary systematically across geographies.

Map: Outbreak probability by county

The following map shows the model-predicted probability of reporting the first case in the next period. Probabilities change over time as underlying conditions, such as the number of cases in neighboring counties, change. Use the slider to show probabilities for a different day. Counties turn gray once they report their first case.
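This summary does not state how the pairwise probabilities \(p_{ij}\) are combined into a single probability for each receiving county. One common choice, assumed here purely for illustration, is to treat importations from different source counties as independent, so that the probability of at least one importation into county \(j\) is \(1 - \prod_i (1 - p_{ij})\). The function name and example values below are hypothetical.

```python
import math


def prob_any_importation(pairwise_probs):
    """Probability of at least one importation into a county, ASSUMING
    independent importation events from each source county.

    pairwise_probs: iterable of p_ij values, one per source county i.
    """
    probs = list(pairwise_probs)
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if any(p == 1.0 for p in probs):
        return 1.0
    # Sum log(1 - p) for numerical stability when each p is very small
    log_escape = sum(math.log1p(-p) for p in probs)
    return max(0.0, -math.expm1(log_escape))


# Placeholder pairwise probabilities (not model output)
p_county = prob_any_importation([0.5, 0.5])  # 1 - 0.5 * 0.5 = 0.75
```

Under this assumption a county's probability rises as more of its neighbors report cases, which is consistent with the behavior described for the map.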

Map: County-county commuting flows

According to the best-fit model, the number of commuters between counties is positively associated with COVID-19 transmission. The commuting flows are based on estimates from the 2011-2015 ACS commuting survey from the US Census. These connections are predominantly short-distance commutes between cities and their surrounding suburbs, but they also include notable long-distance commuting flows.

Fig. 3: The lines connect the county pairs that account for the top 0.1% (n = 4944) of pairwise county-county commuting flows.

Map: County-county domestic flight passenger flows

The volume of county-county domestic flight passengers is also positively associated with the risk of COVID-19 spread. Data on domestic flight passenger volume are from the Official Airline Guide (OAG). Passengers were allocated to counties in the catchment areas surrounding airports in proportion to each county's population and distance from the airport: a lower proportion of passengers was allotted to counties farther from the airport and to counties with smaller populations.
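The exact allocation rule is not given in this summary. As a rough sketch of the idea only, the function below splits an airport's passengers across catchment counties using a hypothetical weight of population divided by one plus distance, which has the stated properties: the share falls as distance grows and as population shrinks. The weight formula, function name, county labels, and numbers are all assumptions for illustration.

```python
def allocate_passengers(total_passengers, counties):
    """Split an airport's passengers across catchment counties.

    counties: list of (name, population, distance_km) tuples.
    The weight population / (1 + distance_km) is a HYPOTHETICAL choice,
    not the rule used in the study.
    """
    weights = {}
    for name, population, distance_km in counties:
        if population < 0 or distance_km < 0:
            raise ValueError("population and distance must be non-negative")
        weights[name] = population / (1.0 + distance_km)

    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one county must have positive weight")

    return {name: total_passengers * w / total_weight
            for name, w in weights.items()}


# Placeholder catchment (not study data)
shares = allocate_passengers(
    1000.0,
    [("near_large", 500_000, 10.0),
     ("far_large", 500_000, 60.0),
     ("near_small", 50_000, 10.0)],
)
```

Normalizing by the total weight guarantees that the county shares sum to the airport's passenger count, whatever weight function is chosen.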

We originally fit the model to a static data set of mean 2019 flight passenger volume, which improved on a model without any flight data. However, a model fit with a time-varying data set of flight volume from the pandemic period (March 2020-July 2020) outperformed the model that used only the 2019 data. This provided evidence that relative county-county flight passenger flows varied throughout the pandemic months and that passenger volumes changed unequally across routes compared to baseline. This is evident in Fig. 4, with some paths returning close to the 2019 baseline quickly whereas others remained far below baseline even in July (e.g. flight volume to Hawaii counties, New York City counties, etc.). However, since data on passenger volume are not available in real time, we suggest that historic flight volume averages are still beneficial if the model is to be used for predicting future COVID-19 transmission risk.

Fig. 4: The lines connect the county pairs that account for the top 0.01% of pairwise connections from 2019 (n = 494), and are colored by each month's volume as a % of the 2019 baseline.


  1. Davis JT, Chinazzi M, Perra N, et al. Estimating the establishment of local transmission and the cryptic phase of the COVID-19 pandemic in the USA. Preprint. medRxiv. 2020;2020.07.06.20140285. Published 2020 Jul 7. doi:10.1101/2020.07.06.20140285

  2. CDC, “Updated Guidance on Evaluating and Testing Persons for Coronavirus Disease 2019 (COVID-19)”; https://emergency.cdc.gov/han/2020/han00429.asp.