COVID-19 Forecasts Using Internet Search Information in The United States
A Preprint
November 4, 2021
Abstract
As COVID-19 ravages the globe, accurate forecasts of the disease spread are
crucial for situational awareness, resource allocation, and public health decision-making.
As an alternative to the traditional disease surveillance data collected by the United States (US)
Centers for Disease Control and Prevention (CDC), big data from the Internet, such as online
search volumes, have previously been shown to contain valuable information for tracking
infectious disease dynamics such as influenza epidemics. In this study, we evaluate the
feasibility of using the Internet search volume of relevant queries to track and predict the
COVID-19 pandemic. We find a strong association between the COVID-19 death trend and the search
volume of symptom-related queries such as “loss of taste”. We then further develop a
previously proposed influenza-tracking model, ARGO (AutoRegression with GOogle search
data), to predict COVID-19 deaths over the next 4 weeks at the US national level, by combining
search volume information with COVID-19 time series information. Encouraged by the 20%
average error reduction of ARGO at the national level compared to the baseline time series
model, we additionally build state-level COVID-19 death models. We introduce variants of
ARGOX (Augmented Regression with GOogle data CROSS space), leveraging the cross-state
cross-resolution spatial temporal framework that pools information from search volume and
COVID-19 reports across states, regions and the nation. These variants of ARGOX are then
aggregated in a winner-takes-all ensemble fashion to produce the final state-level 4-week
forecasts. Numerical experiments demonstrate that our method steadily outperforms time
series baseline models, and achieves the state-of-the-art performance among the publicly
available benchmark models. Overall, we show that disease dynamics and relevant public
search behaviors co-evolve during the COVID-19 pandemic, and capturing their dependencies
while leveraging historical cases/deaths as well as spatial-temporal cross-region information
will enable stable and accurate US national and state-level forecasts.
Keywords Infectious disease prediction · COVID-19 · spatial-temporal model · internet search data ·
statistical modeling
Author Summary
Big data from the Internet has great potential to track infectious diseases at multiple geographical levels,
such as estimating influenza activity from online search volume data. With the current COVID-19 pandemic,
accurate forecasts of the disease dynamics can help resource allocation and public health decision-making. In
this work, we further develop a previously-proposed influenza-tracking model for the short-term COVID-19
death forecasting in the United States at both national and state levels. Our model efficiently combines
publicly available Internet search data at multiple resolutions (national, regional, and state-level) with the
surveillance data from the Centers for Disease Control and Prevention (CDC), accounting for the
spatial-temporal structure in the disease spread and online search pattern. Our method, across all states, performs
competitively with the current state-of-the-art benchmark models, demonstrating great potential in assisting
CDC’s current ensemble predictions. Our model is robust and easy to implement, with the flexibility to
incorporate additional information from other sources and resolutions, making it generally applicable to
tracking other social, economic or public health events at the state or local level.
∗ H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, USA
1 Introduction
COVID-19, an acute respiratory disease caused by the novel coronavirus SARS-CoV-2, has spread to
more than 200 countries worldwide, leading to more than 224 million confirmed cases and 4.98 million deaths
as of Oct 25, 2021 [1]. Understanding how the disease spread dynamics progress over time is much needed,
given the fluid situation and the potential for rapid growth of COVID-19 infections. Both the implementation of
efficient intervention policies and the allocation of emergency resources depend on accurate forecasts
of the disease situation [2]. Currently, machine learning methods [3, 4, 5] and compartmental models
[6, 7, 8, 9, 10] are the prevailing approaches among the publicly available COVID-19 spread
forecasts, according to the weekly forecast reports compiled by the Centers for Disease Control and Prevention
(CDC) [11]. In contrast, statistical models that use internet search behavior for COVID-19 predictions
have not attracted much attention.
In the last decade, numerous studies have shown that Internet-based big data can be a valuable
complementary data source for monitoring the prevalence of infectious diseases and providing near real-time disease
estimates [12, 13, 14, 15], as an alternative to the traditional surveillance approach. For example, Yang et al. [12]
provided a robust way to estimate the real-time influenza situation in the United States using Google search data;
Ning et al. [16] and Yang et al. [17] further extended the method to regional-level and state-level
influenza estimation using Google search data at a finer resolution; Yang et al. [18] demonstrated
the success of search data in tracking dengue fever in five tropical countries. Other types of internet-based
data include cloud-based electronic health records [19] and social media messages [20]. During the
ongoing COVID-19 pandemic, a large volume of pandemic-related online searches has been generated, reflecting
actual infections and general concerns, which might contain useful information for estimating and predicting the
disease spread.
However, building a COVID-19 prediction model with online search data is challenging. First,
the COVID-19 pandemic is a novel and rapidly developing disease outbreak, which creates difficulties in
identifying relevant keyword queries in the search data. Even with expert-curated queries, the online
search frequency data can contain a high level of noise with many unusual spikes, due to general searches
driven by non-disease factors such as media coverage or public concern. In addition, the ground truth data
compiled by CDC can also be noisy, with frequent retrospective revisions of daily/weekly cases to correct
mistakes in data collection and reporting.
Related literature
The correlation between search engine data (such as Google Trends [21], Baidu [22], Twitter, and YouTube
searches [23]) and the COVID-19 situation has been well documented for multiple countries [24, 25, 26],
including specific studies in China [27, 28], Europe [29, 30], India [31], Iran [30, 32], the U.S. [30, 33, 34, 35],
and Spain [36]. However, these articles mostly focus on pure correlation exercises, including correlation
computation [26, 36, 35, 24], rank analysis [34], and cross-correlation for the time delay between search peaks
and COVID-19 cases/deaths [27, 28, 31]. None of them examines the relative importance of the search queries,
considers the spatial-temporal structure of the data, or attempts to make weeks-ahead predictions.
The relevant search queries in the existing literature include general COVID-19 terms such as “coronavirus” or
“COVID” [27], public safety precautions such as “handwashing” [28], and symptom-related queries such as “loss
of smell” [35, 30]. However, most of the existing articles focus on a handful of query terms under a specific
class, without a large-scale data-driven query identification process.
While most of the existing articles argue for the potential of online search data for COVID-19 forecasts [23,
27, 28, 29, 30, 33, 31, 24, 35], only a few actually build prediction models [34, 26, 32, 25]. Specifically,
Mavragani and Gkillas [34] demonstrate the predictive power of search data via quantile regression in U.S.
states. Prasanth et al. [26] use the search data from selected queries in a long short-term memory (LSTM)
framework for the U.S., U.K., and India. Ayyoubzadeh et al. [32] take it further by combining linear regression
with an LSTM model to provide short-term COVID-19 case forecasts in Iran. Lampos et al. [25] conduct a
prediction study in several European countries, using transfer learning and Gaussian processes.
However, none of the articles above fully utilizes the predictive power of internet search data by accounting
for the spatial-temporal structure, including the time series information of COVID-19 or internet searches
in nearby regions/areas. Other qualitative analyses, ad hoc correlation exercises, and off-the-shelf model
applications are even further from real impact. The only internet-search-based model, to the best of
our knowledge, that accounts for spatial structure is ARGONet [22], which uses a clustering and L1-penalized
data augmentation technique for 2-day-ahead COVID-19 case forecasts in China. So far, none of the existing
Internet-search-based methods provides robust weeks-ahead forecasts for different geographical areas in the
United States that account for spatial correlations.
Our contribution
In this paper, we propose a simple framework with Google search queries for United States national and
state level 4-week-ahead COVID-19 death forecasts. In particular, we identify relevant queries through
a large-scale cross-correlation exercise, and de-noise the search frequency data from unusual spikes using an
inter-quantile-range method. We detect a strong correlation between lagged Google search data and COVID-19
deaths for symptom-related terms, and select important Google search queries among the large batch of
related terms to increase the robustness and interpretability of our forecasts. We then utilize the detected
predictive power and combine the selected Google search data and lagged COVID-19 time series information
to produce national level forecasts. We further incorporate a multi-resolution spatial-temporal framework for
state-level forecasts, leveraging cross-state, cross-region COVID-19 and Google search information, accounting
for the geographical proximity and correlations in infections. Unlike the clustering technique in ARGONet
[22], our state level method exhibits stronger internal spatial structure and better model interpretability.
Lastly, we incorporate a winner-takes-all mechanism to generate more coherent final United States national
and state level predictions. Numerical comparisons show that our method performs competitively with other
publicly available COVID-19 forecasts. The success of our method demonstrates that previously developed
models for influenza predictions [12, 17] using online search data can be re-purposed for accurate and robust
forecasts of COVID-19, further emphasizing the general applicability of our method and the power of big
data for disease detection.
We use reported COVID-19 confirmed cases and deaths in the United States from the New York Times (NYT) [37] as
features in our model. This data is collected from January 21, 2020 to Oct 9, 2021.
When comparing against other Centers for Disease Control and Prevention (CDC) official predictions, we use
COVID-19 confirmed cases and deaths from the JHU CSSE COVID-19 dataset [38] as the ground truth. This
is the curated dataset used by the CDC on their official website, collected from January 22, 2020 to Oct 10, 2021.
We do not use the JHU COVID-19 dataset as input features in our model because it
retrospectively corrects past confirmed cases and deaths due to reporting errors and federal and state policy
changes, while the NYT dataset does not revise past data, which gives more realistic forecasts.
The online search data used in this paper is obtained from Google Trends [21], where one can obtain the
search frequencies of a term of interest in a specific region and time frame by typing in the search query on
the website. With the Google Trends API, we are able to obtain a daily time series of the search frequencies for
the term of interest, including all searches that contain all of its words (un-normalized). The search frequency
time series from Google Trends is based on a sampling approach, which uses a sample of search queries
representative of all raw Google searches [21].
This paper uses 256 top searched COVID-19 related Google search queries, including commonly searched
COVID-related terms, COVID-related symptoms, implemented COVID pandemic policies, allocated COVID-related
resources, etc. Example terms include “Coronavirus”, “COVID 19”, “COVID Vaccine”, “loss of
taste”, “loss of smell”, “cough”, and “fever”. We obtain all the Google search queries’ daily frequencies at the
national and state level. Regional level Google search queries are obtained by simply summing up the state
level Google search query volumes of the states in the region. Note that some of the search terms might seem
identical, e.g. “coronavirus vaccine” and “covid 19 vaccine”, but we treat them separately in our model due
to linguistic heterogeneity: terms with similar meaning but different phrasing have different
search frequencies because of the public’s different linguistic preferences.
Google Trends also truncates data to 0 if the search volume for the query is too low. Consequently, for a
given query and state, the zeros in Google Trends data indicate missing data due to a low volume of searches
for the specified query and state, which is very common in practice. We account for the high level of
sparsity in the state-level data by borrowing information from the regional level to “enrich” the sparse state-level data
through a weighted average of state-level search frequency (2/3 weight) and regional-level search frequency
(1/3 weight) [17].
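The regional enrichment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `enrich_state_series` is hypothetical, and the sketch assumes the state and regional series are aligned day by day and already on comparable scales.

```python
def enrich_state_series(state, region, w_state=2 / 3, w_region=1 / 3):
    """Blend state and regional daily search frequencies element-wise.

    Assumes both lists cover the same days in the same order and are on
    comparable scales (an assumption of this sketch).
    """
    if len(state) != len(region):
        raise ValueError("state and region series must have the same length")
    return [w_state * s + w_region * r for s, r in zip(state, region)]


# Zeros in the state series stand for days truncated by Google Trends.
state = [0.0, 0.0, 3.0, 6.0]
region = [3.0, 6.0, 9.0, 12.0]
enriched = enrich_state_series(state, region)
```

Days that are zero at the state level inherit one third of the regional signal, so the enriched series is no longer identically zero on truncated days.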
Instability and sudden spikes/drops in Google search volume data can be due to natural noise in Google
Trends’ sampling approach. Meanwhile, sparsity might still exist in some states’ search queries after the regional
enrichment of state-level Google search data. Such instability and sparsity severely reduce the prediction
accuracy at the national, regional and state level. Therefore, we introduce an Inter-Quantile Range (IQR)
based data filtering mechanism to reduce the noise in the data.
We first drop the Google search terms that have frequencies lower than the median frequency of all other Google
search queries. Then, we identify the outliers of each Google search query. The large-valued outliers
are those above the 99.9 percent quantile and also three standard deviations above the past-week rolling average.
The small-valued outliers are those below the 1 percent quantile. We overwrite the outlier values with the past
three-day average.
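The outlier rules above can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: quantiles are taken over the whole series with linear interpolation, the standard deviation is that of the previous seven days, the first seven days are left untouched, and the replacement averages the three preceding (already cleaned) values. The helper names `quantile` and `filter_outliers` are hypothetical.

```python
from statistics import mean, pstdev


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def filter_outliers(series, hi_q=0.999, lo_q=0.01, n_sd=3.0):
    """Overwrite spikes and drops with the average of the previous three days."""
    hi_cut = quantile(series, hi_q)
    lo_cut = quantile(series, lo_q)
    cleaned = list(series)
    for t in range(7, len(cleaned)):
        week = cleaned[t - 7:t]  # past-week window of cleaned values
        # Large outlier: both conditions must hold (the "looser" rule).
        spike = series[t] > hi_cut and series[t] > mean(week) + n_sd * pstdev(week)
        # Small outlier: a single quantile condition (the "stricter" rule).
        drop = series[t] < lo_cut
        if spike or drop:
            cleaned[t] = mean(cleaned[t - 3:t])
    return cleaned


raw = [10, 11, 10, 12, 11, 10, 11, 12, 11, 200, 10, 11, 0, 12, 11]
cleaned = filter_outliers(raw)
```

In this toy series the value 200 is treated as a spike and the value 0 as a drop; both are replaced by the mean of the three preceding cleaned values.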
The reason behind the different data processing approaches for large- and small-valued outliers lies in the hypothesis
that a sudden increase in search volume is more likely to be genuine than a drop in the search frequency of a
Google search query, which is possibly due to inadequate search intensity. For instance, sudden increases in
search frequencies of COVID-19 related terms occurred when COVID-19 first hit the U.S. in mid-March 2020,
while decreases/sparsity in search query volumes result from low search volume and missing data,
especially in state-level search queries. Therefore, the IQR filter is “looser” on large-valued outliers and “more
strict” on small-valued outliers. As a result, we consider a large-valued outlier to be an “unreasonable”
spike only if it is significantly larger than the search frequencies from the other days in the same week.
This IQR-inspired filtering mechanism further accounts for the sparsity in Google search queries and
removes unusual spikes, which improves our model’s forecast accuracy.
It is typical to see the peak of COVID-19 search volume ahead of the peak in reported cases or deaths; see
Fig 1 for an illustration with the query “loss of taste”. One hypothesis is that people in the early stage of infection
search for COVID-19 related information online before arriving at a clinic or testing positive.
As such, using delayed Google search frequencies is essential for our predictions. One simple way
is to enumerate all possible lags of Google search frequencies as exogenous feature variables. Yet,
this would significantly increase the number of exogenous variables, which could
result in over-fitting and hurt prediction accuracy. Thus, we derive the optimal lag for each Google search query and only consider those
optimal lags in the forecasting model.
We use the period from April 1st 2020 to June 30th 2020 to find the optimal lags, and the period after
July 1st 2020 for comparing forecast accuracy. Because media-driven or information-seeking searches were very
common at the beginning of the pandemic [39, 40], we exclude the period prior to April 1st 2020 from the analysis,
so that the query terms identified are more likely to be driven by actual infection. For each query, we fit a
linear regression of COVID-19 daily deaths against lagged Google search frequency, considering a range of lags
(4 to 35 days). We select the lag that yields the lowest mean squared error (MSE)
as the optimal lag for that query. Table S3 in supplementary materials shows all the optimal lags for the
selected important Google search terms, ranked by their optimal lags. For national, regional and state level
Google search information, we use the same optimal lags displayed in table S3 throughout this study.
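The lag search described above can be illustrated with a small pure-Python sketch. The function names `ols_mse` and `optimal_lag` are hypothetical, the data are synthetic, and evaluating every lag on the same set of days is an assumption of this sketch rather than a detail given in the text.

```python
def ols_mse(x, y):
    """Mean squared error of the least-squares line y ~ a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / n


def optimal_lag(search, deaths, lags=range(4, 36)):
    """Return the lag (in days) whose lagged search series gives the lowest MSE."""
    max_lag = max(lags)
    days = range(max_lag, len(deaths))  # same evaluation days for every lag
    y = [deaths[t] for t in days]
    best_lag, best_mse = None, float("inf")
    for lag in lags:
        x = [search[t - lag] for t in days]  # search frequency `lag` days earlier
        mse = ols_mse(x, y)
        if mse < best_mse:
            best_lag, best_mse = lag, mse
    return best_lag


# Synthetic example: deaths follow the search series with a 10-day delay.
search = [(41 * t) % 101 for t in range(120)]
deaths = [0] * 10 + [2 * search[t - 10] + 1 for t in range(10, 120)]
print(optimal_lag(search, deaths))
```

Because only one lag per query is kept, each query contributes a single exogenous feature to the forecasting model.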
Figure 1: Google search query “loss of taste” and COVID-19 weekly incremental death. Illustration
of the delay in peak between Google search query frequencies (“loss of taste” in red) and COVID-19 national
level weekly incremental death (blue). Y-axes are adjusted accordingly.
Though we removed low frequency queries and sparsity in the remaining queries through the IQR filter, some
remaining queries might still exhibit high variability and lack a clear trend relative to COVID-19
deaths. To retain the most useful terms for predicting COVID-19 deaths and reduce our model
complexity, we computed the Pearson correlation coefficient between each optimally lagged search term and
COVID-19 daily deaths during the period from April 1st 2020 to June 30th 2020, where the detailed derivation
of the optimal delay is shown in section 2.2.2. Table S2 in supplementary materials lists all the Google search
query terms and their Pearson correlation coefficients against COVID-19 daily deaths, in which only terms with positive
Pearson correlation are displayed.
We select the 23 terms that have Pearson correlations above 0.5 as “important terms” and only use them in
our forecast model, shown in table S3.
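A minimal sketch of this selection step is given below. It assumes each search series has already been shifted by its optimal lag so that it aligns day by day with the death series; `pearson`, `select_terms`, and the toy series are hypothetical illustrations, not the study's data.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no trend information
    return sxy / (sxx * syy) ** 0.5


def select_terms(lagged_terms, deaths, threshold=0.5):
    """Keep the terms whose correlation with deaths exceeds the threshold."""
    scores = {term: pearson(series, deaths) for term, series in lagged_terms.items()}
    return {term: r for term, r in scores.items() if r > threshold}


deaths = [1, 2, 3, 4, 5]
lagged_terms = {
    "loss of taste": [2, 4, 6, 8, 10],  # moves with deaths
    "unrelated term": [5, 1, 4, 2, 3],  # no clear trend
}
important = select_terms(lagged_terms, deaths)
```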
3 Methods
Let $X_{i,t,m}$ be the Google Trends data of search term $i$ on day $t$ of area $m$; $y_{t,m}$ be the New York Times COVID-19
death increment at day $t$ of area $m$; $c_{t,m}$ be the New York Times COVID-19 confirmed case increment at
day $t$ of area $m$, where the area $m$ can refer to the entire nation, one specific HHS region (such as New
England), or one specific state (such as Georgia). Let $O_k$ be the optimal lag for the $k$th Google search term,
which is the same for all areas $m$. Let $\mathbb{I}_{\{t,r\}}$ be the weekday $r$ indicator for $t$ (i.e., $\mathbb{I}_{\{t,1\}}$ indicates day $t$ being
Monday, and $\mathbb{I}_{\{t,6\}}$ indicates day $t$ being Saturday), which accounts for the weekday seasonality in COVID-19
incremental death time series.
Inspired by the ARGO method [12], with information available as of time $T$, to estimate $y_{T+l,m}$ for $l > 0$, the
incremental COVID-19 death on day $T + l$ of area $m$, an $L_1$ regularized linear estimator is used:
\[
\hat{y}_{T+l,m} = \hat{\mu}_{y,m} + \sum_{i=0}^{I} \hat{\alpha}_{i,m}\, y_{T-i,m} + \sum_{j \in J} \hat{\beta}_{j,m}\, c_{T+l-j,m} + \sum_{k=1}^{K} \hat{\delta}_{k,m}\, X_{k,T+l-\hat{O}_k,m} + \sum_{r=1}^{6} \hat{\gamma}_{r,m}\, \mathbb{I}_{\{T+l,r\}} \tag{1}
\]
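where we use lagged death, lagged confirmed cases and optimal lagged Google search terms for death
prediction. For $l$th day ahead prediction at area $m$, the coefficients $\{\mu_{y,m}, \alpha = (\alpha_{1,m}, \ldots, \alpha_{I,m}), \beta =
(\beta_{1,m}, \ldots, \beta_{|J|,m}), \delta = (\delta_{1,m}, \ldots, \delta_{K,m}), \gamma = (\gamma_{1,m}, \ldots, \gamma_{6,m})\}$ are obtained via
\[
\begin{aligned}
\operatorname*{argmin}_{\mu_{y,m},\alpha,\beta,\delta,\gamma,\lambda}\ & \sum_{t=T-M-l+1}^{T-l} \Biggl( y_{t+l,m} - \mu_{y,m} - \sum_{i=0}^{6} \alpha_{i,m}\, y_{t-i,m} - \sum_{j \in J} \beta_{j,m}\, c_{t+l-j,m} \\
& \qquad - \sum_{k=1}^{23} \delta_{k,m}\, X_{k,t+l-\hat{O}_k,m} - \sum_{r=1}^{6} \gamma_{r,m}\, \mathbb{I}_{\{t+l,r\}} \Biggr)^{2} \\
& + \lambda_\alpha \|\alpha\|_1 + \lambda_\beta \|\beta\|_1 + \lambda_\delta \|\delta\|_1 + \lambda_\gamma \|\gamma\|_1
\end{aligned}
\tag{2}
\]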
We set $M = 56$, i.e. 56 days as training period; $I = 6$ considering consecutive 1 week lagged death;
$J = \max(\{7, 14, 21, 28\}, l)$ considering weekly lagged confirmed cases; $K = 23$ highly correlated Google
search terms; $\hat{O}_k = \max(O_k, l)$ be the adjusted optimal lag of the $k$th Google search term subject to $l$th day
ahead prediction. We set hyperparameters $\lambda = (\lambda_\alpha, \lambda_\beta, \lambda_\delta, \lambda_\gamma)$ through cross-validation. For simplicity, we
constrain $\lambda_\alpha = \lambda_\beta = \lambda_\delta = \lambda_\gamma$.
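The lag adjustments above ensure that every feature used for an l-day-ahead forecast is already observed at time T. The sketch below illustrates one reading of those definitions, in which the maximum is applied to each weekly case lag separately and duplicates are merged; that element-wise reading, the function name `adjusted_lags`, and the example lag values are assumptions made for illustration.

```python
def adjusted_lags(horizon, optimal_lags, case_lags=(7, 14, 21, 28)):
    """Return (J, O_hat) for an l-day-ahead forecast, where l = horizon.

    J holds the lags of confirmed cases and O_hat the adjusted search lags.
    Both are floored at the horizon so that no feature lies in the future.
    """
    j_set = sorted({max(j, horizon) for j in case_lags})
    o_hat = {term: max(lag, horizon) for term, lag in optimal_lags.items()}
    return j_set, o_hat


# Hypothetical optimal lags (in days) for two search terms.
optimal_lags = {"loss of taste": 21, "fever": 6}
j_set, o_hat = adjusted_lags(10, optimal_lags)
```

For a 10-day-ahead forecast the 7-day case lag is pushed back to 10 days and the 6-day search lag to 10 days, while lags that already exceed the horizon are unchanged.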
To further impose smoothness into our predictions, we use the three-day moving average of the coefficients
for predicting day T + l, which slightly boosts our prediction accuracy.
Using the above formulation, we forecast future 4 weeks of daily incremental COVID-19 death of area
$m$, i.e. $\{\hat{y}_{T+1,m}, \ldots, \hat{y}_{T+28,m}\}$, and aggregate them into weekly prediction. In other words,
$\hat{y}_{T+1:T+7,m} = \sum_{i=1}^{7} \hat{y}_{T+i,m}$ is the first week, $\hat{y}_{T+8:T+14,m} = \sum_{i=8}^{14} \hat{y}_{T+i,m}$
is the second week, $\hat{y}_{T+15:T+21,m} = \sum_{i=15}^{21} \hat{y}_{T+i,m}$ is
the third week, and $\hat{y}_{T+22:T+28,m} = \sum_{i=22}^{28} \hat{y}_{T+i,m}$ is the fourth week ARGO incremental death prediction.
We denote this method as “ARGO Inspired Prediction”.
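The daily-to-weekly aggregation can be written compactly. The sketch below is illustrative only; `weekly_totals` is a hypothetical helper name and the input values are toy numbers.

```python
def weekly_totals(daily_forecasts):
    """Sum 28 daily forecasts (days T+1 to T+28) into four weekly totals."""
    if len(daily_forecasts) != 28:
        raise ValueError("expected exactly 28 daily forecasts")
    return [sum(daily_forecasts[7 * w:7 * (w + 1)]) for w in range(4)]


daily = list(range(1, 29))      # toy daily forecasts for days T+1, ..., T+28
weekly = weekly_totals(daily)   # week 1 covers days 1-7, week 2 days 8-14, ...
```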
death predictions does not equal the national level ARGO prediction. Furthermore, our ARGO inspired
national level predictions (section 3.1) are more accurate for the national level groundtruth than the sum of
ARGOX 51 states’ predictions. Therefore, we propose a constrained second step to ARGOX inspired state
level prediction, by restricting the sum of state-level ARGOX predictions to be close to the ARGO inspired
national prediction.
After the first step in section 3.2, instead of separating the 51 US states into “joint” and “alone” states and
estimating state-level COVID-19 deaths separately, we treat them as a whole (except HI and VT) as we are
constraining the sum of all states’ death estimations. We separate out HI and VT, since these two states have
the lowest incremental COVID-19 deaths, and such sparsity causes instability when deriving the covariance
matrices. We estimate HI and VT COVID-19 weekly incremental death using the ARGO inspired state-level
estimates through equation (2).
For the 49 states (except HI and VT), our raw estimates for the state-level weekly COVID-19 incremental death
$y_\tau = (y_{\tau,1}, \ldots, y_{\tau,49})^\top$ are $\hat{y}_\tau^{GT} = (\hat{y}_{\tau,1}^{GT}, \ldots, \hat{y}_{\tau,49}^{GT})^\top$,
$\hat{y}_\tau^{reg} = (\hat{y}_{\tau,r_1}^{reg}, \ldots, \hat{y}_{\tau,r_{49}}^{reg})^\top$ and $\hat{y}_\tau^{nat} = (\hat{y}_\tau^{nat}, \ldots, \hat{y}_\tau^{nat})^\top$,
where $r_m$ is the region number for state $m$. Here, the superscripts GT and reg denote the state-level and regional estimates
obtained with internet search information only, where the detailed derivations can be found in SI, and nat denotes the
national estimates from equation (1). Similar to the second step in section 3.2 and in ARGOX [17], we
denote the state-level death increment at week $\tau$ as $Z_\tau = \Delta y_\tau = y_\tau - y_{\tau-1}$, and it has four predictors: (i)
$Z_{\tau-1} = \Delta y_{\tau-1}$, (ii) $\hat{y}_\tau^{GT} - y_{\tau-1}$, (iii) $\hat{y}_\tau^{reg} - y_{\tau-1}$, and (iv) $\hat{y}_\tau^{nat} - y_{\tau-1}$. Let $W_\tau$ denote the collection of
these four vectors, $W_\tau = (Z_{\tau-1}^\top, (\hat{y}_\tau^{GT} - y_{\tau-1})^\top, (\hat{y}_\tau^{reg} - y_{\tau-1})^\top, (\hat{y}_\tau^{nat} - y_{\tau-1})^\top)^\top$.
We denote $\hat{y}_{\tau,nat}^{*ARGO}$ as the week $\tau$ national level ARGO-inspired COVID-19 incremental death estimation
excluding HI and VT. To predict the week $\tau$ state-level COVID-19 death increment, we solve the constrained
optimization problem below that minimizes the variance between groundtruth and our predictor, subject to
the constraint that the sum of our predictor ought to be close to national level COVID-19 incremental death
estimations.
\[
\begin{aligned}
\min_{A}\quad & \operatorname{Tr}\bigl(\operatorname{Var}(Z_\tau - A W_\tau)\bigr) \\
\text{s.t.}\quad & \mathbf{1}^\top (\mu_Z + A W_\tau) = \hat{y}_{\tau,nat}^{*ARGO} - \mathbf{1}^\top y_{\tau-1}
\end{aligned}
\tag{3}
\]
where $\mathbf{1} = (1, \ldots, 1)^\top$ is a length 49 vector of 1s, and $\mu_Z$ is the mean of $Z_\tau$. In particular, if the above
optimization problem is unconstrained, the solution will be the best linear predictor (no ridge penalty) in
ARGOX [17]. The detailed derivation of the optimization problem in equation (3) as well as the final closed-form
solution can be found in the supplementary materials.
To further boost the state-level COVID-19 death prediction accuracy, we incorporate an ensemble framework
that combines our previous estimations and selects the best predictor for each week. For all 51 U.S. states,
we denote the ARGO-inspired state-level prediction for week τ as “ARGO” (section 3.1), ARGOX-inspired
joint-alone state prediction as “ARGOX-2step” (section 3.2), and ARGOX-inspired national constrained
prediction as “ARGOX-NatConstraint” (section 3.3). For a training period of 15 weeks, we evaluate each
predictor with mean squared error (MSE) and select the one with lowest MSE as the ensemble predictor for
week τ + 1, τ + 2, τ + 3, and τ + 4. Such a winner-takes-all approach has been previously shown to be effective
for influenza estimation [14].
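The winner-takes-all rule can be sketched as below. This is a hedged illustration for a single state: `pick_winner` is a hypothetical helper, the numbers are toy values, and the sketch assumes each method's past weekly predictions are aligned with the observed weekly deaths.

```python
def mse(pred, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth)


def pick_winner(past_predictions, past_truth, window=15):
    """Return the method with the lowest MSE over the most recent `window` weeks."""
    recent_truth = past_truth[-window:]
    scores = {
        name: mse(preds[-window:], recent_truth)
        for name, preds in past_predictions.items()
    }
    return min(scores, key=scores.get), scores


past_truth = [10, 12, 14, 16]
past_predictions = {
    "ARGO": [11, 13, 15, 17],
    "ARGOX-2step": [10, 12, 20, 16],
    "ARGOX-NatConstraint": [9, 12, 14, 18],
}
winner, scores = pick_winner(past_predictions, past_truth)
```

The selected method's forecasts are then used as the ensemble output for the following four weeks, and the selection is repeated as new data arrive.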
4 Retrospective Evaluation
We use three metrics to evaluate the accuracy of an estimate of COVID-19 death against the actual COVID-19
death published by Johns Hopkins University (JHU): the root mean squared error (RMSE), the mean absolute
error (MAE), and the Pearson correlation (Correlation). RMSE between an estimate $\hat{y}_t$ and the true value $y_t$
over period $t = 1, \ldots, T$ is $\sqrt{\frac{1}{T} \sum_{t=1}^{T} (\hat{y}_t - y_t)^2}$. MAE between an estimate $\hat{y}_t$ and the true value $y_t$ over period
$t = 1, \ldots, T$ is $\frac{1}{T} \sum_{t=1}^{T} |\hat{y}_t - y_t|$. Correlation is the Pearson correlation coefficient between $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_T)$
and $y = (y_1, \ldots, y_T)$.
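For concreteness, the three metrics can be computed as in the following sketch. The function names and the toy inputs are illustrative assumptions.

```python
import math


def rmse(pred, truth):
    """Root mean squared error."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth))


def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - y) for p, y in zip(pred, truth)) / len(truth)


def correlation(pred, truth):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(truth)
    mean_p, mean_y = sum(pred) / n, sum(truth) / n
    spy = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, truth))
    spp = sum((p - mean_p) ** 2 for p in pred)
    syy = sum((y - mean_y) ** 2 for y in truth)
    return spy / math.sqrt(spp * syy)


pred, truth = [2, 4, 6], [1, 2, 3]
metrics = (rmse(pred, truth), mae(pred, truth), correlation(pred, truth))
```

In this toy case the prediction is perfectly correlated with the truth yet has nonzero RMSE and MAE, which is why the three metrics are reported together.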
Table 1 summarizes the accuracy metrics for all estimation methods for the period from July 4, 2020 to Oct
9th, 2021 at the national level, which shows that ARGO’s estimates outperform all other simplified models
in every accuracy metric for the whole time period. Fig 2 displays the estimates against the observed
COVID-19 weekly incremental death.
ARGO outperforms the simplified models in terms of RMSE, MAE, and Pearson correlation throughout 1 to
4 weeks ahead predictions. AR-Only predictions, on the other hand, have higher MSE than
Naive predictions for 2 and 4 weeks ahead, suggesting the importance of Google search terms in the method.
COVID-19 death’s trend is highly correlated with the optimally lagged important terms we selected, which
boosts the model’s accuracy. However, using only Google search information is not sufficient either:
GT-Only predictions barely beat Naive predictions for 1 to 4 weeks ahead in terms of MAE and fall
behind in terms of RMSE. This indicates that autoregressive information (lagged COVID-19 cases and deaths) can
help predictions based solely on Google search terms by correcting their trend so that they neither overshoot nor underestimate,
as shown in the period between December 2020 and March 2021 in Fig 2. Delaying behavior exists in all
methods for 1 to 4 weeks ahead predictions, because the latest information available for training is lagged,
especially when the forecast horizon extends to 3 and 4 weeks ahead. Yet, by utilizing people’s search behavior to
foresee future trends, ARGO is able to overcome such delay effects in almost all the weeks for 1 week ahead
predictions and in the majority of the weeks in 2020 for 2 to 4 weeks ahead predictions. Also, ARGO is the only
method that captures the COVID-19 death peak in the week of January 16, 2021, for all 1 to 4 weeks ahead
predictions. The integration of time series information and Google search terms leads to a trend-capturing
estimation curve without undesired spikes in 1 to 3 weeks ahead forecasts, and robust recovery of spikes in
4 weeks ahead forecasts compared to other benchmark methods.
Figure 2: 1 to 4 weeks ahead national level COVID-19 weekly incremental death predictions, compared
weekly from 2020-07-04 to 2021-10-09. Panels: (a) 1 week ahead (top left), (b) 2 weeks ahead (top right),
(c) 3 weeks ahead (bottom left), and (d) 4 weeks ahead (bottom right) national level predictions. The methods
included are AR-Only, ARGO, GT-Only, Naive (persistence), and truth. ARGO estimates (thick red) are contrasted with the true
COVID-19 death from the JHU dataset (thick black) as well as the estimates from AR-Only (gold), GT-Only
(green), and Naive (blue).
The results presented above demonstrate ARGO’s accuracy and robustness. The optimal lagged 23 important
Google search terms appear to be key factors in the enhanced accuracy of ARGO, as well as the past week’s
COVID-19 deaths, as shown in Fig S3 and S4 in supplementary materials, which reflect a strong temporal
autocorrelation after the feature selection of L1 penalty.
Table 2 summarizes the overall results of ARGO prediction, ARGOX-2step prediction, ARGOX-NatConstraint,
Winner-takes-all Ensemble and Naive, averaging over the 51 states for the whole period of July 4th 2020 to
Oct 9th 2021. Our winner-takes-all ensemble method gives the leading performance uniformly in all metrics as
shown in table 2, which achieves around 18% error reduction in RMSE, around 20% error reduction in MAE
and around 8% increase in Pearson correlation compared to the best alternative in the whole period on average
across 1 to 4 weeks ahead predictions. Among all the methods we compare in this section, winner-takes-all
ensemble is the only method that uniformly outperforms the naive predictions. The robustness and accuracy
are further illustrated in Fig 3, which shows the 51 states’ RMSE, MAE and Pearson correlation in violin
charts for 1 to 4 weeks ahead predictions. The winner-takes-all ensemble approach outperforms all other
approaches in the three metrics, in terms of mean and standard deviation range over all 51 states.
Detailed numerical results for each state are reported in tables S5–S55 and fig S14–S64 in supplementary
material, where the ensemble approach demonstrates its accuracy in the majority of the states. Note that
the JHU ground-truth data contains jumps and spikes in some states due to retrospective revisions. Such noisy
ground-truth data is a key challenge for forecasting tasks. Our ensemble framework, on the other hand, shows
robustness to geographical variability and extracts a strong combination from the ARGO and ARGOX
approaches.
Figure 3: The distribution of values of each metric for each model, over the 51 states, for 1 to 4 weeks (from
left to right) ahead predictions during the period from 2020-07-04 to 2021-10-09. The embedded black dot
and vertical line indicate the mean and the 1 standard deviation range. The averages of each error metric
across the 51 states are reported in table 2.
To show a detailed breakdown of the methods contributing to the ensemble approach, table 3 displays the
ensemble approach's selection proportions among ARGO, ARGOX-2Step and ARGOX-NatConstraint for 1 to 4
weeks ahead predictions for all 51 U.S. states from 2020-07-04 to 2021-10-09. Throughout the period and
across all 51 states, the ensemble approach selects state-level ARGO predictions the most (around 40%), and
ARGOX-NatConstraint the least (around 25%), at all 1 to 4 weeks ahead horizons. This is consistent with
table 2, which indicates that state-level ARGO performs competitively against naive estimations for all 1 to 4
weeks ahead predictions, on average across all the states, for almost all three error metrics. Yet, state-level
ARGO cannot beat naive predictions uniformly across all the states, as shown in supplementary materials
tables S5–S55 and fig S14–S64. On the other hand, although ARGOX-NatConstraint and ARGOX-2Step seem to
perform poorly against naive predictions in table 2, they contribute to the ensemble approach and boost its
accuracy substantially. Detailed state-level selections of the ensemble approach for 1 to 4 weeks ahead
estimations are displayed in fig S5a and S5b in supplementary materials. It seems that state-level ARGO,
ARGOX-NatConstraint and ARGOX-2Step all have different strengths and weaknesses depending on the particular
state and time. Through adaptive selection of the best performer for each state, the winner-takes-all ensemble
approach takes the best part of each of the three models, and thus achieves the most robust prediction
performance.
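The adaptive selection described above can be illustrated with a minimal sketch. This section does not state the selection rule, so the sketch assumes one: for each state, pick the candidate with the lowest mean squared error over a short trailing window of past weeks. The function name `winner_takes_all`, the `window` parameter and the numbers are hypothetical.

```python
from statistics import mean


def winner_takes_all(history, truth, current, window=4):
    """Select one candidate forecast for a single state.

    history: dict mapping method name -> list of past forecasts (aligned with truth)
    truth:   list of observed values for those past weeks
    current: dict mapping method name -> forecast for the target week
    window:  number of most recent weeks used for scoring (assumed, not from the paper)
    """
    def recent_mse(preds):
        pairs = list(zip(preds, truth))[-window:]
        return mean((p - y) ** 2 for p, y in pairs)

    # Sorting the names makes ties resolve deterministically.
    best = min(sorted(history), key=lambda name: recent_mse(history[name]))
    return best, current[best]


# Illustrative numbers only (not taken from the paper).
truth = [100, 110, 120, 130]
history = {
    "ARGO": [98, 112, 119, 133],
    "ARGOX-2Step": [90, 100, 125, 140],
    "ARGOX-NatConstraint": [105, 118, 110, 120],
}
current = {"ARGO": 141.0, "ARGOX-2Step": 150.0, "ARGOX-NatConstraint": 128.0}
print(winner_takes_all(history, truth, current))
```

Running the selection separately for every state and horizon yields one chosen method per state-week, and tallying those choices gives selection proportions of the kind summarized in table 3.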
In addition to the weekly estimates, ARGOX Ensemble also provides confidence intervals, by simply taking the
confidence intervals of the selected method. Table S4 (in supplementary materials) shows the coverage of the
confidence intervals for all 51 states. The nominal 95% confidence interval has an actual coverage of 88.2%
on average for 1 week ahead predictions, suggesting that our confidence intervals reasonably measure the
accuracy of our weekly estimates, although they are somewhat over-confident.
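As a minimal sketch of how such a coverage figure can be computed, the snippet below counts how often the observed value falls inside its interval. The function name `empirical_coverage` and the numbers are illustrative assumptions, not the paper's code or data.

```python
def empirical_coverage(lower, upper, truth):
    """Fraction of observations that fall inside their [lower, upper] interval."""
    if not (len(lower) == len(upper) == len(truth)) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lower, upper, truth))
    return hits / len(truth)


# Illustrative numbers only: three of the four observations are covered.
lower = [90, 100, 110, 120]
upper = [110, 120, 130, 140]
truth = [100, 125, 115, 139]
print(empirical_coverage(lower, upper, truth))  # 0.75
```

A coverage below the nominal level, as with 88.2% against 95%, means the intervals are on average narrower than they should be.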
methods' top performances among the 30 methods submitted to the CDC, over the comparison period
from July 4th, 2020 to Oct 9th, 2021. We use both the ARGOX Ensemble method and the persistence (Naive)
method for this comparison period (29 prediction submission records). We only consider the top 6 CDC
published teams for the prediction comparison, after filtering out teams with missing values in their reporting
over the period and states we are considering. Again, the ground truth is the COVID-19 weekly incremental
death count from the JHU COVID-19 dataset, shown in section 2.1, and the naive approach uses this week's
deaths published by NYT as the 1 to 4 weeks ahead predictions. We summarize the national-level comparison
results in table 4 and the state-level comparison results in table 5, where we compare the RMSE, MAE and
correlation of the 1 to 4 weeks ahead national- and state-level COVID-19 death predictions. We rank the teams
according to the average of each error metric. We further show the distribution of the state-level comparison
results across all three error metrics in the Fig S7 violin charts (in supplementary material), where the mean
and standard deviation of each method are displayed. Detailed state-by-state error metrics are shown in
fig S8 to S13.
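For concreteness, the three error metrics and the persistence baseline can be sketched as follows. These are the standard textbook definitions, assumed here because this section does not restate the formulas; the function names and the toy series are hypothetical.

```python
import math


def rmse(pred, truth):
    """Root mean squared error."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth))


def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - y) for p, y in zip(pred, truth)) / len(truth)


def pearson(pred, truth):
    """Pearson correlation (undefined when either series is constant)."""
    n = len(truth)
    mp, my = sum(pred) / n, sum(truth) / n
    cov = sum((p - mp) * (y - my) for p, y in zip(pred, truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sy = math.sqrt(sum((y - my) ** 2 for y in truth))
    return cov / (sp * sy)


def naive_forecasts(series, horizon):
    """Persistence baseline: the value at week t is the forecast for week t + horizon.

    Returns (predictions, targets) as aligned lists; horizon must be >= 1.
    """
    return series[:-horizon], series[horizon:]


# Toy weekly death series (illustrative only).
weekly_deaths = [10, 20, 30, 40, 50]
preds, targets = naive_forecasts(weekly_deaths, horizon=1)
print(rmse(preds, targets), mae(preds, targets), pearson(preds, targets))
```

On a steadily rising series the persistence forecast is perfectly correlated with the truth yet always lags it, which is why correlation is reported alongside RMSE and MAE rather than alone.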
From table 4 and table 5, we observe that our ARGOX-Ensemble model produces competitive accuracy
for 1 to 4 weeks ahead COVID-19 predictions, and is among the top 8 models in terms of all three error
metrics. Fig S7 (in supplementary materials) further displays ARGOX-Ensemble's accuracy and robustness
in terms of the metrics' mean and standard deviation range over all 51 states. Overall, all the error metric
comparisons demonstrate ARGOX-Ensemble's competitiveness during the comparison period.
5 Discussion
While our ARGOX-Ensemble approach shows strong results, its accuracy and robustness depend on the
reliability of its inputs. One limitation of our method is that Google search query volumes are sensitive
to media coverage, and such instability could propagate into our COVID-19 death predictions. Fortunately,
media-driven searches die down as the pandemic progresses. In addition, our model mitigates such instability
via adaptive training.
We use the summer period to identify the optimal lag and the 23 highly correlated queries. The idea of an
optimal lag captures the intuition that people tend to search before clinic visits. It is interesting to observe
different indications of epidemiological plausibility and infections from the optimal delays of the queries (table
S3). The COVID-19 incremental death trend has the longest delay from queries related to COVID-19 cases and
tests (more than 4 weeks), followed by queries related to mild symptoms (3-4 weeks) and severe symptoms
(2-3 weeks), indicating that the interval from symptoms to death is of moderate length and that the optimal
lag decreases as symptom severity increases. On the other hand, there are still some Google search queries
affected by media coverage and general public fear. Namely, vaccination-related queries have the shortest
delays (table S3); they act as a short-term signal of general concern, intensified by news media coverage and
spikes in case or death trends, since vaccines were not yet available in summer 2020. Nonetheless, our model is
able to robustly capture the COVID-19 death trend, by determining the optimal lags during the summer period.
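One way to picture the optimal-lag search is a scan over candidate lags that keeps the one maximizing the Pearson correlation between the lagged query volume and the death series. The sketch below assumes that criterion and uses made-up toy series; the names `optimal_lag` and `max_lag` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation (undefined when either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def optimal_lag(search, deaths, max_lag):
    """Return (lag, correlation) maximizing corr(search[t - lag], deaths[t]).

    Both series must have the same length and cover the same dates.
    """
    best = None
    for lag in range(max_lag + 1):
        x = search[: len(search) - lag]  # search volume, shifted back by `lag`
        y = deaths[lag:]                 # deaths observed `lag` steps later
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Toy example: deaths repeat the search pattern three steps later.
search = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
deaths = [0, 0, 0, 0, 1, 4, 9, 4, 1, 0]
lag, corr = optimal_lag(search, deaths, max_lag=5)
print(lag, round(corr, 3))
```

A larger optimal lag means the query moves further ahead of the death curve, which matches the ordering reported above for case-related, mild-symptom and severe-symptom queries.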
Information in Google search data deteriorates as the forecast horizon expands, which could potentially impact
the robustness and accuracy of our 4 weeks ahead predictions (Fig 2 and Tab 4). Nevertheless, the L1 penalty
and the dynamic training are able to capture the most relevant search terms and time series information for
COVID-19 national-level death estimation, and our model is still better than the GT-only or AR-only models
(Table 1) at all forecasting horizons. Meanwhile, ARGOX-Ensemble is able to robustly select accurate 1-4
weeks ahead state-level predictions from the three ARGOX alternative methods, even though the JHU dataset
is only a noisy ground truth. Models that further alleviate the bias in Internet search data and capture
long-term COVID-19 trends could be an interesting future direction.
For national-level COVID-19 deaths, the last week's COVID-19 deaths and all the Google search terms have
significant effects on the 1 to 4 weeks ahead COVID-19 growth, as shown in fig S3 in supplementary materials,
which reflects strong temporal auto-correlation and dependence on people's search behavior. Smoothing
the penalized linear regression's coefficients with the past three days' coefficients further leads to a smooth
and continuous estimation curve and prevents undesired spikes, as shown in Fig 2. ARGO also allows us to
transparently understand how Google search information and historical COVID-19 information complement
one another. For instance, the past week's COVID-19 deaths contribute positively to the current national
COVID-19 death trend predictions, as shown in fig S3, which indicates that the current trend is likely to
follow the past week's growth or drop. Fig S3 also indicates the time-varying relationships between the
COVID-19 death trend and people's search behavior for COVID-19 related terms (general and symptom-related
searches). In the national COVID-19 1 to 4 weeks ahead predictions, time series models tend to respond to
sudden changes with a delay and are easily carried away by those changes, as shown in Fig 2 from December
2020 to March 2021. Google search information, on the other hand, reacts better to sudden changes, but is
also sensitive to the public's overreaction embedded in the search frequencies. Fortunately, adaptive training
helps ARGO achieve fast self-correction in the subsequent week.
At the state level, besides producing ARGO state-level predictions using the same framework as the national-
level ARGO, we combine state-, regional- and national-level publicly available data from Google
searches and delayed COVID-19 cases and deaths to produce ARGOX-2Step state-level estimations. ARGOX-
NatConstraint improves upon ARGOX-2Step by restricting the sum of the state-level death predictions to be
similar to the national-level death predictions, as the ARGO national-level predictions have already shown
their strength. Both ARGOX-2Step and ARGOX-NatConstraint incorporate the geographical and temporal
correlation of COVID-19 deaths to provide accurate, reliable 1 to 4 weeks ahead estimations. To further
improve accuracy and robustness, we combine all three methods and produce a winner-takes-all ensemble
forecast for 1 to 4 weeks ahead state-level deaths. ARGO and ARGOX-2Step are unified frameworks adapted
directly from influenza prediction with minimal changes, which demonstrates their robustness and general
applicability, while reducing the possibility of over-fitting. Furthermore, the winner-takes-all ensemble
approach efficiently combines all three frameworks and is able to outperform the constituent models for all
states in all 1 to 4 weeks ahead predictions. Our national and state-level performances are competitive with
other state-of-the-art models from the CDC. Thus, we have shown that adapting the ARGOX framework to
COVID-19 can achieve accurate and robust results, and our model could serve as a valuable input for the
CDC's current ensemble forecast.
Concluding Remarks
In this paper, we demonstrate that methods for influenza prediction using online search data
[12, 16, 17] can be re-purposed for COVID-19 prediction. Specifically, by incorporating Google search
information and autoregressive information, we achieve strong performance on national-level death
predictions, while aggregating Google search information and cross-state-regional-national data achieves
competitive performance on state-level death predictions, for 1 to 4 weeks ahead COVID-19 death forecasts,
compared with other existing COVID-19 methods submitted to the CDC. The combination of COVID-19 cases
and deaths with optimally delayed Google search information, as well as the utilization of geographical
structure, appear to be key factors in the enhanced accuracy of ARGO in national- and state-level predictions,
providing additional insights which could assist and complement current CDC forecasts.
References
[1] Atul Sharma, Swapnil Tiwari, Manas Kanti Deb, and Jean Louis Marty. Severe acute respiratory
syndrome coronavirus-2 (sars-cov-2): a global pandemic and treatment strategies. International Journal
of Antimicrobial Agents, 56(2):106054, 2020.
[2] Gitanjali R Shinde, Asmita B Kalamkar, Parikshit N Mahalle, Nilanjan Dey, Jyotismita Chaki, and
Aboul Ella Hassanien. Forecasting models for coronavirus disease (covid-19): a survey of the state-of-
the-art. SN Computer Science, 1(4):1–15, 2020.
[3] Alexander Rodriguez, Anika Tabassum, Jiaming Cui, Jiajia Xie, Javen Ho, Pulak Agarwal, Bijaya
Adhikari, and B. Aditya Prakash. Deepcovid: An operational deep learning-driven framework for
explainable real-time covid-19 forecasting. medRxiv, 2020.
[4] Xiaoyong Jin, Yu-Xiang Wang, and Xifeng Yan. Inter-series attention model for covid-19 forecasting,
2020.
[5] Maria Jahja, David Farrow, Roni Rosenfeld, and Ryan J Tibshirani. Kalman filter, sensor fusion,
and constrained regression: Equivalences and insights. In H. Wallach, H. Larochelle, A. Beygelzimer,
F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems,
volume 32. Curran Associates, Inc., 2019.
[6] J. Chhatwal, O. Dalgic, P. Mueller, M. Adee, Y. Xiao, M.A. Ladd, B.P. Linas, and T. Ayer. Pin68
covid-19 simulator: An interactive tool to inform covid-19 intervention policy decisions in the united
states. Value in health: the journal of the International Society for Pharmacoeconomics and Outcomes
Research, 23(12):S556–S556, December 2020.
[7] Difan Zou, Lingxiao Wang, Pan Xu, Jinghui Chen, Weitong Zhang, and Quanquan Gu. Epidemic model
guided machine learning for covid-19 forecasts in the united states. medRxiv, 2020.
[8] S Abbott, J Hellewell, RN Thompson, K Sherratt, HP Gibbs, NI Bosse, JD Munday, S Meakin,
EL Doughty, JY Chun, YWD Chan, F Finger, P Campbell, A Endo, CAB Pearson, A Gimma, T Russell,
S Flasche, AJ Kucharski, RM Eggo, and S Funk. Estimating the time-varying reproduction
number of sars-cov-2 using national and subnational case counts [version 1; peer review: awaiting peer
review]. Wellcome Open Research, 5(112), 2020.
[9] Sercan O Arik, Chun-Liang Li, Jinsung Yoon, Rajarishi Sinha, Arkady Epshteyn, Long T Le, Vikas
Menon, Shashank Singh, Leyou Zhang, Nate Yoder, et al. Interpretable sequence learning for covid-19
forecasting. arXiv preprint arXiv:2008.00646, 2020.
[10] Wan Yang, Sasikiran Kandula, Mary Huynh, Sharon K Greene, Gretchen Van Wye, Wenhui Li, Hiu Tai
Chan, Emily McGibbon, Alice Yeung, Don Olson, et al. Estimating the infection-fatality risk of sars-cov-2
in new york city during the spring 2020 pandemic wave: a model-based analysis. The Lancet Infectious
Diseases, 21(2):203–212, 2021.
[11] Evan L Ray, Nutcha Wattanachit, Jarad Niemi, Abdul Hannan Kanji, Katie House, Estee Y Cramer,
Johannes Bracher, Andrew Zheng, Teresa K Yamana, Xinyue Xiong, et al. Ensemble forecasts of
coronavirus disease 2019 (covid-19) in the us. MedRXiv, 2020.
[12] Shihao Yang, Mauricio Santillana, and S. C. Kou. Accurate estimation of influenza epidemics using
google search data via argo. Proceedings of the National Academy of Sciences, 112(47):14473–14478,
2015.
[13] Mauricio Santillana, Andre Nguyen, Mark Dredze, Michael Paul, and John Brownstein. Combining
search, social media, and traditional data sources to improve influenza surveillance. PLoS computational
biology, 11, 08 2015.
[14] Fred Lu, Mohammad Hattab, Cesar Clemente, Matthew Biggerstaff, and Mauricio Santillana. Improved
state-level influenza nowcasting in the united states leveraging internet-based data and network approaches.
Nature Communications, 10, 01 2019.
[15] Jeremy Ginsberg, Matthew Mohebbi, Rajan Patel, Lynnette Brammer, Mark Smolinski, and Larry
Brilliant. Detecting influenza epidemics using search engine query data. Nature, 457:1012–4, 12 2008.
[16] Shaoyang Ning and Shihao Yang. Accurate regional influenza epidemics tracking using internet search
data. Scientific Reports, 9:5238, 03 2019.
[17] Shihao Yang, Shaoyang Ning, and S. C. Kou. Use internet search data to accurately track state level
influenza epidemics. Sci Rep, 11(4023), 2021.
[18] Shihao Yang, Samuel C Kou, Fred Lu, John S Brownstein, Nicholas Brooke, and Mauricio Santillana.
Advances in using internet searches to track dengue. PLoS computational biology, 13(7):e1005607, 2017.
[19] Shihao Yang, Mauricio Santillana, John S Brownstein, Josh Gray, Stewart Richardson, and SC Kou.
Using electronic health records and internet search information for accurate influenza forecasting. BMC
infectious diseases, 17(1):1–9, 2017.
[20] Mauricio Santillana, André T Nguyen, Mark Dredze, Michael J Paul, Elaine O Nsoesie, and John S
Brownstein. Combining search, social media, and traditional data sources to improve influenza surveillance.
PLoS Comput Biol, 11(10):e1004513, 2015.
[21] Faq about google trends data. https://support.google.com/trends/answer/4365533?hl=en&ref_topic=6248052. Accessed: 2021-04-03.
[22] Dianbo Liu, Leonardo Clemente, Canelle Poirier, Xiyu Ding, Matteo Chinazzi, Jessica Davis, Alessandro
Vespignani, and Mauricio Santillana. Real-time forecasting of the covid-19 outbreak in chinese provinces:
Machine learning approach using novel digital data and estimates from mechanistic models. J Med
Internet Res, 22(8):e20285, Aug 2020.
[23] Sohaib R Rufai and Catey Bunce. World leaders’ usage of twitter in response to the covid-19 pandemic:
a content analysis. Journal of Public Health, 42(3):510–516, 2020.
[24] Maria Effenberger, Andreas Kronbichler, Jae Il Shin, Gert Mayer, Herbert Tilg, and Paul Perco.
Association of the covid-19 pandemic with internet search volumes: a google trends™ analysis. International
Journal of Infectious Diseases, 95:192–197, 2020.
[25] Vasileios Lampos, Maimuna S Majumder, Elad Yom-Tov, Michael Edelstein, Simon Moura, Yohhei
Hamada, Molebogeng X Rangaka, Rachel A McKendry, and Ingemar J Cox. Tracking covid-19 using
online search. NPJ digital medicine, 4(1):1–11, 2021.
[26] Sikakollu Prasanth, Uttam Singh, Arun Kumar, Vinay Anand Tikkiwal, and Peter HJ Chong. Forecasting
spread of covid-19 using google trends: A hybrid gwo-deep learning approach. Chaos, Solitons & Fractals,
142:110336, 2021.
[27] Cuilian Li, Li Jia Chen, Xueyu Chen, Mingzhi Zhang, Chi Pui Pang, and Haoyu Chen. Retrospective
analysis of the possibility of predicting the covid-19 outbreak from internet searches and social media
data, china, 2020. Eurosurveillance, 25(10):2000199, 2020.
[28] Atina Husnayain, Anis Fuad, and Emily Chia-Yu Su. Applications of google search trends for risk
communication in infectious disease management: A case study of the covid-19 outbreak in taiwan.
International Journal of Infectious Diseases, 95:221–223, 2020.
[29] Amaryllis Mavragani. Tracking covid-19 in europe: infodemiology approach. JMIR public health and
surveillance, 6(2):e18941, 2020.
[30] Abigail Walker, Claire Hopkins, and Pavol Surda. The use of google trends to investigate the loss of
smell related searches during covid-19 outbreak. International Forum of Allergy Rhinol, 10(7):839–847,
04 2020.
[31] U Venkatesh and Periyasamy Aravind Gandhi. Prediction of covid-19 outbreaks using google trends in
india: A retrospective analysis. Healthcare informatics research, 26(3):175–184, 2020.
[32] Seyed Mohammad Ayyoubzadeh, Seyed Mehdi Ayyoubzadeh, Hoda Zahedi, Mahnaz Ahmadi, and
Sharareh R Niakan Kalhori. Predicting covid-19 incidence through analysis of google trends data in iran:
data mining and deep learning pilot study. JMIR public health and surveillance, 6(2):e18828, 2020.
[33] Young-Rock Hong, John Lawrence, Dunc Williams Jr, and Arch Mainous III. Population-level interest
and telehealth capacity of us hospitals in response to covid-19: cross-sectional analysis of google search
and national hospital survey data. JMIR Public Health and Surveillance, 6(2):e18961, 2020.
[34] Amaryllis Mavragani and Konstantinos Gkillas. Covid-19 predictability in the united states using google
trends time series. Scientific reports, 10(1):1–12, 2020.
[35] Shyam J Kurian, Mohammed Ali Alvi, Henry H Ting, Curtis Storlie, Patrick M Wilson, Nilay D Shah,
Hongfang Liu, Mohamad Bydon, et al. Correlations between covid-19 cases and google trends data in
the united states: A state-by-state analysis. Mayo Clinic Proceedings, 95(11):2370–2381, 2020.
[36] Alberto Jimenez Jimenez, Rosa M Estevez-Reboredo, Miguel A Santed, and Victoria Ramos. Covid-19
symptom-related google searches and local covid-19 incidence in spain: Correlational study. Journal of
medical Internet research, 22(12):e23518, 2020.
[37] The New York Times. Coronavirus (covid-19) data in the united states, 2021.
https://github.com/nytimes/COVID-19-data, Last accessed on 2021-04-03.
[38] Ensheng Dong, Hongru Du, and Lauren Gardner. An interactive web-based dashboard to track covid-19
in real time. Lancet Infect Dis, 20(5), 2020.
[39] Sherry Towers, Shehzad Afzal, Gilbert Bernal, Nadya Bliss, Shala Brown, Baltazar Espinoza, Jasmine
Jackson, Julia Judson-Garcia, Maryam Khan, Michael Lin, Robert Mamada, Victor Moreno, Fereshteh
Nazari, Kamaldeen Okuneye, Mary Ross, Claudia Rodriguez, Jan Medlock, David Ebert, and Carlos
Castillo-Chávez. Mass media and the contagion of fear: The case of ebola in america. PLOS ONE,
10:e0129179, 06 2015.
[40] Yla Tausczik, Kate Faasse, James Pennebaker, and Keith Petrie. Public anxiety and information seeking
following the h1n1 outbreak: Blogs, newspaper articles, and wikipedia visits. Health communication,
27:179–85, 08 2011.
[41] Dan Sheldon and Casey Gibson. Bayesian seird model, 2020. Accessed: 2021-04-03.
[42] Rebecca K Borchering, Cécile Viboud, Emily Howerton, Claire P Smith, Shaun Truelove, Michael C
Runge, Nicholas G Reich, Lucie Contamin, John Levander, Jessica Salerno, et al. Modeling of future
covid-19 cases, hospitalizations, and deaths, by vaccination rates and nonpharmaceutical intervention
scenarios—united states, april–september 2021. Morbidity and Mortality Weekly Report, 70(19):719,
2021.
[43] L Castro, G Fairchild, I Michaud, and D Osthus. Coffee: Covid-19 forecasts using fast evaluations and
estimation, 2020.
[44] Joceline Lega. Parameter estimation from icc curves. arXiv preprint arXiv:2005.08134, 2020.
[45] Sam Abbott, Joel Hellewell, Robin N Thompson, Katharine Sherratt, Hamish P Gibbs, Nikos I Bosse,
James D Munday, Sophie Meakin, Emma L Doughty, June Young Chun, et al. Estimating the time-
varying reproduction number of sars-cov-2 using national and subnational case counts. Wellcome Open
Research, 5(112):112, 2020.
• Supplementary Text
• Supplementary Figs. S1a to S64
• Supplementary Tables S1 to S55
In the first step, we use LASSO to aggregate the search volume information in the corresponding area. In the
second step, we take a dichotomous approach for the 51 US states/districts, setting apart six states: AK,
HI, DE, KY, VT and ME. We first set apart AK and HI, since they are geographically separated from the
contiguous US. Then, we determine the rest by computing the multiple correlation of each state's COVID-19
incremental death counts with the COVID-19 incremental death counts of the entire nation, of the other
regions (excluding the region that the state belongs to) and of the other states. DE, KY, VT and ME are the
4 states that have the lowest multiple correlations. A relatively low multiple correlation implies that the
state's COVID-19 death growth trend is not well aligned with those of other states, other regions or the whole
nation, indicating that information across the other states or regions might not help the death prediction
for the 6 stand-alone states. Therefore, we apply the dichotomous approach from ARGOX [17] to the 45
“joint” states and 6 “alone” states.
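The multiple correlation used for this split can be sketched as the correlation between a state's series and its least-squares fit on the other series. The snippet below uses NumPy; the function name `multiple_correlation` and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def multiple_correlation(y, X):
    """Multiple correlation coefficient between y and the columns of X.

    Computed as the Pearson correlation between y and its ordinary
    least-squares fit on X (with an intercept).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return float(np.corrcoef(y, fitted)[0, 1])


# Toy data: the "state" series is an exact linear function of two other series,
# so its multiple correlation is 1 up to rounding.
others = np.array([[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]], dtype=float)
state = 1.0 + 2.0 * others[:, 0] + 3.0 * others[:, 1]
print(round(multiple_correlation(state, others), 6))
```

States whose value is low relative to the rest would be candidates for the stand-alone group, since the other series explain little of their trend.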
First Step
For the first step, using the same notation as in section 3.1, we extract region/state-level internet search
information in region/state $m$ for day $T + l$ by estimating $\hat{y}_{T+l,m}$ using Google search terms
with equation (4), for $l > 0$.
$$
\hat{y}_{T+l,m} = \hat{\mu}_{y,m} + \sum_{k=1}^{K} \hat{\delta}_{k,m} X_{k,T+l-\hat{O}_k,m} + \sum_{r=1}^{6} \hat{\gamma}_{r,m} I_{\{T+l,r\}} \tag{4}
$$
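where $X_{k,t,m}$ is the Google Trends data of search term $k$ on day $t$ in region/state $m$, and
$I_{\{T+l,r\}}$ is a weekday-$r$ indicator for the forecast date $T + l$. The coefficients
$\{\mu_{y,m}, \delta = (\delta_{1,m}, \ldots, \delta_{K,m}), \gamma = (\gamma_{1,m}, \ldots, \gamma_{6,m})\}$ are
obtained via
$$
\underset{\mu_{y,m},\,\delta,\,\gamma}{\operatorname{argmin}} \sum_{t=T-M-l+1}^{T-l} \left( y_{t+l,m} - \mu_{y,m} - \sum_{k=1}^{K} \delta_{k,m} X_{k,t+l-\hat{O}_k,m} - \sum_{r=1}^{6} \gamma_{r,m} I_{\{t+l,r\}} \right)^2 + \lambda_\delta \|\delta\|_1 + \lambda_\gamma \|\gamma\|_1 \tag{5}
$$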
We set $M = 56$ days for training, and select $\lambda = \{\lambda_\delta, \lambda_\gamma\}$ through
cross-validation, where we let $\lambda_\delta = \lambda_\gamma$ for simplicity. Additionally, we use $K = 23$
highly correlated Google search terms, and let $\hat{O}_k = \max(O_k, l)$ be the adjusted optimal lag of the
$k$th Google search term for the $l$-day-ahead prediction. Denote the regional estimates obtained as
$(\hat{y}^{reg}_{T,1}, \ldots, \hat{y}^{reg}_{T,10})$ and the state estimates obtained as
$(\hat{y}^{GT}_{T,1}, \ldots, \hat{y}^{GT}_{T,51})$.
Lastly, we obtain the national COVID-19 death estimate $\hat{y}^{nat}_T$ using equation (1). Since all the
estimates obtained above are daily COVID-19 incremental deaths, we aggregate them into 1 to 4 weeks' total
incremental deaths and work on the 1 to 4 weeks ahead death forecasts separately using the following steps.
We use index $\tau$ for weekly indexing and $t$ for daily indexing.
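The first step can be sketched in code under simplifying assumptions: the weekday indicators are omitted, a single penalty weight `lam` is used (matching the choice of equal penalties), and the L1-penalized least squares is solved with a plain coordinate-descent routine rather than any particular library. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize sum((y - mu - X @ beta)**2) + lam * ||beta||_1 by coordinate descent.

    The intercept mu is not penalized; it is handled by centering X and y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.zeros(X.shape[1])
    col_sq = (Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            partial_resid = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ partial_resid
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    mu = y_mean - x_mean @ beta
    return mu, beta


def lagged_design(trends, optimal_lags, horizon, target_days):
    """Build the predictor matrix with adjusted lags max(O_k, horizon).

    trends:      array of shape (K, n_days) with one row per search term
    target_days: day indices s of the response; row s uses trends[k][s - lag_k]
    """
    lags = [max(o, horizon) for o in optimal_lags]
    return np.array([[trends[k][s - lags[k]] for k in range(len(lags))]
                     for s in target_days])


# Synthetic check: the response depends only on term 0 at its adjusted lag of 7 days.
rng = np.random.default_rng(0)
trends = rng.normal(size=(2, 80))
days = list(range(10, 80))
X = lagged_design(trends, optimal_lags=[3, 10], horizon=7, target_days=days)
y = 5.0 + 2.0 * X[:, 0]
mu, beta = lasso_cd(X, y, lam=1e-8)
print(round(float(mu), 4), np.round(beta, 4))
```

Taking the maximum of the optimal lag and the horizon guarantees that every predictor used for day $T + l$ is already observed by day $T$.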
Second Step
For the 45 joint states, we gather the raw estimates for state/regional/national-level weekly COVID-19
incremental deaths from the first step to obtain the best linear predictor with ridge-regression-inspired
shrinkage for the state-level COVID-19 incremental death estimate for week $\tau$ via ARGOX [17], using four
predictors. Specifically, we use our raw estimates for the state-level weekly COVID-19 incremental deaths,
$\hat{y}^{GT}_\tau = (\hat{y}^{GT}_{\tau,1}, \ldots, \hat{y}^{GT}_{\tau,45})^\top$; the expanded national- and
regional-level COVID-19 death estimates, $\hat{y}^{nat}_\tau = (\hat{y}^{nat}_\tau, \ldots, \hat{y}^{nat}_\tau)^\top$
and $\hat{y}^{reg}_\tau = (\hat{y}^{reg}_{\tau,r_1}, \ldots, \hat{y}^{reg}_{\tau,r_{45}})^\top$, where $r_m$ is
the region number for state $m$; and the previous week's state-level ground truth, with a 30-week training
window for parameter estimation.
For the 6 alone states, we take a stand-alone modeling approach [17], focusing on estimating the individual
state's COVID-19 1 to 4 weeks ahead incremental deaths by integrating the within-state and national
information in the second step. Specifically, we use 3 predictors: the previous week's state-level ground
truth, and the state-level and national-level COVID-19 estimates, $\hat{y}^{GT}_{\tau,m}$ and
$\hat{y}^{nat}_\tau$, for $m \in \{\text{AK, HI, DE, KY, VT, ME}\}$, where the regional terms are dropped.
Similarly, we use the best linear predictor with ridge-regression-inspired shrinkage to get the final
estimates [17], with a 30-week training period.
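$$
\nabla_\lambda f(A, \lambda) = \tilde{y} - \mathbf{1}^\top A W
$$
After setting them to 0 for the optimality condition and solving for $A$ and $\lambda$, we have:
$$
\begin{cases}
A = \left( \Sigma_{ZW} + \dfrac{\lambda}{2} \mathbf{1} W^\top \right) \Sigma_{WW}^{-1} \\[2ex]
\lambda = \dfrac{2}{n W^\top \Sigma_{WW}^{-1} W} \left( \tilde{y} - \mathbf{1}^\top \Sigma_{ZW} \Sigma_{WW}^{-1} W \right)
\end{cases} \tag{8}
$$
Thus, our final prediction for the state-level COVID-19 week $\tau$ incremental death is
$$
\hat{y}_\tau = \hat{y}_{\tau-1} + \mu_Z + \left( \Sigma_{ZW} + \frac{1}{n \widetilde{W}_\tau^\top \Sigma_{WW}^{-1} \widetilde{W}_\tau} \left( \tilde{y} - \mathbf{1}^\top \Sigma_{ZW} \Sigma_{WW}^{-1} \widetilde{W}_\tau \right) \mathbf{1} \widetilde{W}_\tau^\top \right) \Sigma_{WW}^{-1} \widetilde{W}_\tau \tag{9}
$$
Moreover, we use the ridge-regression-inspired shrinkage to modify the estimate, by replacing $\Sigma_{ZW}$
with $\frac{1}{2}\Sigma_{ZW}$ and $\Sigma_{WW}$ with $\left(\frac{1}{2}\Sigma_{WW} + \frac{1}{2}D_{WW}\right)$,
where $D_{WW}$ is the diagonal of the empirical covariance of $W_\tau$:
$$
\hat{Z}_\tau = \mu_Z + \left( \frac{1}{2}\Sigma_{ZW} + \frac{ \tilde{y} - \mathbf{1}^\top \frac{1}{2}\Sigma_{ZW} \left( \frac{1}{2}\Sigma_{WW} + \frac{1}{2}D_{WW} \right)^{-1} \widetilde{W}_\tau }{ n \widetilde{W}_\tau^\top \left( \frac{1}{2}\Sigma_{WW} + \frac{1}{2}D_{WW} \right)^{-1} \widetilde{W}_\tau } \mathbf{1} \widetilde{W}_\tau^\top \right) \left( \frac{1}{2}\Sigma_{WW} + \frac{1}{2}D_{WW} \right)^{-1} \widetilde{W}_\tau
$$
Therefore, our final prediction for the state-level COVID-19 week $\tau$ incremental death with
ridge-inspired shrinkage is:
$$
\hat{y}_\tau = \hat{y}_{\tau-1} + \mu_Z + \left( \Sigma_{ZW} + \frac{ \tilde{y} - \mathbf{1}^\top \Sigma_{ZW} \left( \Sigma_{WW} + D_{WW} \right)^{-1} \widetilde{W}_\tau }{ n \widetilde{W}_\tau^\top \left( \Sigma_{WW} + D_{WW} \right)^{-1} \widetilde{W}_\tau } \mathbf{1} \widetilde{W}_\tau^\top \right) \left( \Sigma_{WW} + D_{WW} \right)^{-1} \widetilde{W}_\tau \tag{10}
$$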
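The constrained predictor in equation (10) can be checked numerically with a short NumPy sketch. It treats $\tilde{y}$ as the value that the summed state-level correction terms must equal, and takes $D_{WW}$ to be the diagonal of the supplied $\Sigma_{WW}$; both are assumptions made for illustration, as are the function name and the toy covariances.

```python
import numpy as np


def constrained_increment(sigma_zw, sigma_ww, w_tilde, y_tilde, shrink=True):
    """Correction term added to y_hat_{tau-1} + mu_Z in equations (9) and (10).

    sigma_zw: (n, p) cross-covariance between state increments Z and predictors W
    sigma_ww: (p, p) covariance of the predictors W
    w_tilde:  (p,)   predictor vector for the target week
    y_tilde:  scalar that the n returned values are constrained to sum to
    shrink:   if True, use Sigma_WW + D_WW as in (10); otherwise Sigma_WW as in (9)
    """
    sigma_zw = np.asarray(sigma_zw, dtype=float)
    sigma_ww = np.asarray(sigma_ww, dtype=float)
    w_tilde = np.asarray(w_tilde, dtype=float)
    n = sigma_zw.shape[0]
    ones = np.ones(n)
    if shrink:
        m = sigma_ww + np.diag(np.diag(sigma_ww))  # assumed form of D_WW
    else:
        m = sigma_ww
    m_inv_w = np.linalg.solve(m, w_tilde)          # M^{-1} W
    unconstrained = sigma_zw @ m_inv_w             # Sigma_ZW M^{-1} W
    quad = w_tilde @ m_inv_w                       # W^T M^{-1} W
    scale = (y_tilde - ones @ unconstrained) / (n * quad)
    return unconstrained + scale * ones * quad


# Toy inputs (illustrative only): 3 states, 2 predictors.
sigma_zw = [[1.0, 0.2], [0.3, 0.4], [0.5, 0.1]]
sigma_ww = [[2.0, 0.5], [0.5, 1.0]]
w_tilde = [1.0, -2.0]
out = constrained_increment(sigma_zw, sigma_ww, w_tilde, y_tilde=3.0)
print(np.round(out, 4), round(float(out.sum()), 6))
```

Algebraically, the quadratic form cancels in the last line, so the constraint term spreads the gap between $\tilde{y}$ and the summed unconstrained predictions equally across the $n$ states. This makes the sum constraint hold exactly with or without the shrinkage.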
Table S2: Optimal Lagged Google Query and COVID-19 Death Pearson Correlation from 2020-04-01 to
2020-06-30
Google Query Pearson Correlation Google Query Pearson Correlation
loss of taste 0.909 covid 19 how long 0.223
loss of smell 0.877 normal body 0.223
how long contagious 0.864 body temperature 0.222
covid 19 vaccine 0.815 cold vs coronavirus 0.205
rapid covid 19 0.782 coronavirus vs cold 0.205
pneumonia 0.761 expectorant 0.203
robitussin 0.738 acute bronchitis 0.186
bronchitis 0.724 covid 19 hospital 0.178
sinus 0.711 high fever 0.157
cough 0.699 covid 19 relief 0.153
covid 19 0.649 human temperature 0.153
fever 0.649 is coronavirus contagious 0.147
symptoms of the covid 19 0.64 normal body temperature 0.14
how long covid 19 0.636 signs of the coronavirus 0.14
sore throat 0.636 contagious coronavirus 0.129
coronavirus test 0.628 coronavirus contagious 0.129
coronavirus cases 0.622 shortness of breath 0.125
strep throat 0.605 coronavirus vitamin c 0.089
coronavirus exposure 0.573 oseltamivir 0.082
coronavirus vaccine. 0.549 coronavirus test kit 0.079
exposed to coronavirus 0.523 covid 19 treatment 0.073
rapid coronavirus 0.513 cold and coronavirus 0.061
upper respiratory 0.51 coronavirus and cold 0.061
headache 0.507 how long does the coronavirus last 0.054
the covid 19 0.506 how long does coronavirus last 0.052
covid 19 cases 0.506 symptoms of covid 19 0.049
nausea 0.502 coronavirus medication 0.048
tessalon 0.491 coronavirus family 0.039
symptoms of pneumonia 0.482 taking temperature 0.037
oscillococcinum 0.476 do i have the coronavirus 0.036
strep 0.473 respiratory coronavirus 0.035
nasal congestion. 0.468 covid 19 what to do 0.034
common cold 0.446 coronavirus hospital 0.03
chest cold 0.434 i have the coronavirus 0.018
walking pneumonia 0.397 coronavirus care 0.013
coronavirus relief 0.357 covid 19 symptoms 0.011
covid 19 test 0.336 ear thermometer 0.01
cough fever 0.331 coronavirus how long 0.01
fever cough 0.331 how long coronavirus 0.01
covid 19 care 0.24 coronavirus recovery 0.005
sinus infections 0.23 how to treat coronavirus 0.004
reduce fever 0.223 coronavirus cough 0.001
ARGOX-Nat-Constrained motivation
Fig S1a, S1b, S2a, and S2b show the inconsistency between the aggregated state-level ARGOX predictions
and the ARGO national-level predictions for 1 to 4 weeks ahead.
Fig S6a and S6b show the 3 weeks and 4 weeks ahead winner-takes-all state-level forecast selections among
the three other methods: ARGO, ARGOX-2Step and ARGOX-NatConstraint.
Figure S7: Comparison among different models’ 1 to 4 weeks (from left to right) ahead U.S. states level weekly
incremental death predictions (from 2020-07-04 to 2021-10-10). The RMSE, MAE and Pearson correlation
for each method across all states are reported in the violin plot. The methods (x-axis) are sorted based on
their RMSE.
Figure S8: State Level 1 week (left) and 2 weeks (right) ahead all teams RMSE comparison heatmap. States
(y-axis) are sorted based on the best performing method’s RMSE, in this case ARGOX-Ensemble. RMSE
values greater than 200 are capped at 200.
Figure S9: State Level 3 weeks (left) and 4 weeks (right) ahead all teams RMSE comparison heatmap. States
(y-axis) are sorted based on the best performing method’s RMSE, in this case ARGOX-Ensemble. RMSE
values greater than 200 are capped at 200.
Figure S10: State Level 1 week (left) and 2 weeks (right) ahead all teams MAE comparison heatmap. States
(y-axis) are sorted based on the best performing method’s MAE, in this case ARGOX-Ensemble. MAE
values greater than 200 are capped at 200.
Figure S11: State Level 3 weeks (left) and 4 weeks (right) ahead all teams MAE comparison heatmap. States
(y-axis) are sorted based on the best performing method’s MAE, in this case ARGOX-Ensemble. MAE
values greater than 200 are capped at 200.
Figure S12: State Level 1 week (left) and 2 weeks (right) ahead all teams Pearson correlation against
JHU groundtruth comparison heatmap. States (y-axis) are sorted based on the best performing method’s
correlation, in this case ARGOX-Ensemble
Figure S13: State Level 3 weeks (left) and 4 weeks (right) ahead all teams Pearson correlation against
JHU groundtruth comparison heatmap. States (y-axis) are sorted based on the best performing method’s
correlation, in this case ARGOX-Ensemble
[Plot: US-AK; recovered panel titles: 1 Week Ahead, 2 Weeks Ahead; y-axis: Death; legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth]
Figure S14: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Alaska (AK).
[Plot: US-AL; recovered panel titles: 1 Week Ahead, 2 Weeks Ahead; y-axis: Death; legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth]
Figure S15: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Alabama (AL).
[Plot: US-AR; recovered panel titles: 1 Week Ahead, 2 Weeks Ahead; y-axis: Death; legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth]
Figure S16: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Arkansas (AR).
[Plot: US-AZ; recovered panel titles: 1 Week Ahead, 2 Weeks Ahead; y-axis: Death; legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth]
Figure S17: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Arizona (AZ).
[Plot: US-CA; recovered panel titles: 1 Week Ahead, 2 Weeks Ahead; y-axis: Death; legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth]
Figure S18: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for California (CA).
[Figure panels for US-CO. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S19: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Colorado (CO).
[Figure panels for US-CT. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S20: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Connecticut (CT).
[Figure panels for US-DC. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Truth.]
Figure S21: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for District of Columbia (DC).
[Figure panels for US-DE. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S22: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Delaware (DE).
[Figure panels for US-FL. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S23: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Florida (FL).
[Figure panels for US-GA. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S24: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Georgia (GA).
[Figure panels for US-HI. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S25: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Hawaii (HI).
[Figure panels for US-IA. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S26: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Iowa (IA).
[Figure panels for US-ID. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S27: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Idaho (ID).
[Figure panels for US-IL. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S28: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Illinois (IL).
[Figure panels for US-IN. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S29: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Indiana (IN).
[Figure panels for US-KS. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S30: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Kansas (KS).
[Figure panels for US-KY. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S31: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Kentucky (KY).
[Figure panels for US-LA. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Truth.]
Figure S32: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Louisiana (LA).
[Figure panels for US-MA. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S33: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Massachusetts (MA).
[Figure panels for US-MD. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S34: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Maryland (MD).
[Figure panels for US-ME. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Truth.]
Figure S35: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Maine (ME).
[Figure panels for US-MI. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S36: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Michigan (MI).
[Figure panels for US-MN. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Truth.]
Figure S37: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Minnesota (MN).
[Figure panels for US-MO. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S38: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Missouri (MO).
[Figure panels for US-MS. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S39: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Mississippi (MS).
[Figure panels for US-MT. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S40: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Montana (MT).
[Figure panels for US-NC. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S41: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for North Carolina (NC).
[Figure panels for US-ND. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S42: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for North Dakota (ND).
[Figure panels for US-NE. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S43: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Nebraska (NE).
[Figure panels for US-NH. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S44: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for New Hampshire (NH).
[Figure panels for US-NJ. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S45: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for New Jersey (NJ).
[Figure panels for US-NM. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S46: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for New Mexico (NM).
[Figure panels for US-NV. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S47: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Nevada (NV).
[Figure panels for US-NY. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S48: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for New York (NY).
[Figure panels for US-OH. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S49: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Ohio (OH).
[Figure panels for US-OK. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S50: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Oklahoma (OK).
[Figure panels for US-OR. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Truth.]
Figure S51: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Oregon (OR).
[Figure panels for US-PA. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S52: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Pennsylvania (PA).
[Figure panels for US-RI. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S53: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Rhode Island (RI).
[Figure panels for US-SC. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S54: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for South Carolina (SC).
[Figure panels for US-SD. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S55: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for South Dakota (SD).
[Figure panels for US-TN. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S56: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Tennessee (TN).
[Figure panels for US-TX. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S57: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Texas (TX).
[Figure panels for US-UT. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S58: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Utah (UT).
[Figure panels for US-VA. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S59: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Virginia (VA).
[Figure panels for US-VT. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S60: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Vermont (VT).
[Figure panels for US-WA. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Truth.]
Figure S61: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Washington (WA).
[Figure panels for US-WI. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S62: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Wisconsin (WI).
[Figure panels for US-WV. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S63: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for West Virginia (WV).
[Figure panels for US-WY. Panel titles: 1 Week Ahead, 2 Weeks Ahead. y-axis: Death. Legend (Methods): ARGO, ARGOX.2Step, ARGOX.NatConstraint, Naive, Truth.]
Figure S64: Plots of the COVID-19 1 week (top left), 2 weeks (top right), 3 weeks (bottom left), and 4 weeks
(bottom right) ahead estimates for Wyoming (WY).