Multiscale Influenza Forecasting

Dave Osthus∗a and Kelly R. Morana,b

a Los Alamos National Laboratory, Statistical Sciences Group
b Duke University, Department of Statistical Science
Abstract
Influenza forecasting in the United States (US) is complex and challenging for reasons
including substantial spatial and temporal variability, nested geographic scales of forecast
interest, and heterogeneous surveillance participation. Here we present a flexible influenza
forecasting model called Dante, a multiscale flu forecasting model that learns rather than
prescribes spatial, temporal, and surveillance data structure. Forecasts at the Health and
Human Services (HHS) regional and national scales are generated as linear combinations
of state forecasts with weights proportional to US Census population estimates, resulting in
coherent forecasts across nested geographic scales. We retrospectively compare Dante’s short-
term and seasonal forecasts at the state, regional, and national scales for the 2012 through 2017
flu seasons in the US to the Dynamic Bayesian Model (DBM), a leading flu forecasting model.
Dante outperformed DBM for nearly all spatial units, flu seasons, geographic scales, and
forecasting targets. The improved performance is due to Dante making forecasts, especially
short-term forecasts, more confidently and accurately than DBM, suggesting Dante’s improved
forecast scores will also translate to more useful forecasts for the public health sector. Dante
participated in the prospective 2018/19 FluSight challenge hosted by the Centers for Disease
Control and Prevention and placed 1st in both the national and regional competition and
the state competition. The methodology underpinning Dante can be used in other disease
forecasting contexts where nested geographic scales of interest exist.
Introduction
Influenza represents a significant burden to public health with an estimated 9 to 49 million cases
each year in the United States (US) [1]. Influenza (flu) related activity is monitored in the US
by the Centers for Disease Control and Prevention (CDC) through numerous surveillance efforts.
∗ Email correspondence can be directed to dosthus@lanl.gov
One such effort is the Outpatient Influenza-like Illness Surveillance Network (ILINet). ILINet
collects weekly data on influenza-like illness (ILI) from over 2000 healthcare providers from all
50 states, Puerto Rico, the US Virgin Islands, and the District of Columbia. ILI is defined as
a temperature greater than or equal to 100 degrees Fahrenheit, a cough or sore throat, and no other
known cause, representing symptoms consistent with influenza. ILINet constitutes a significant
and necessary effort toward understanding the spread and prevalence of flu-like illness in the US in
near real-time.
With mature ILI surveillance infrastructure in place in the US, attention has turned in recent
years to ILI prediction. The ability to predict the spread of ILI poses a substantial public health
opportunity if able to be done accurately, confidently, and with actionable lead times at geographic
and temporal scales amenable to public health responsiveness. Since 2013, the CDC has hosted an
influenza forecasting challenge called the FluSight challenge to gauge the feasibility of forecasting
targets of public health interest in real-time, to galvanize the flu forecasting community around
common goals, and to foster innovation and improvement through collaboration and competition
[2, 3, 4]. The FluSight challenge has been a leading driver of recent model development and flu
forecasting advancements [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21].
Up until the 2016 flu season (i.e., the flu season starting in the fall of 2016 and ending in the
spring of 2017), the FluSight challenge’s scope encompassed forecasting short-term (one-to-four
week ahead) and seasonal (season onset, peak timing, and peak intensity) targets at two geographic
scales: nationally and regionally, where regions correspond to Health and Human Services (HHS)
regions. National and regional forecasts give a high-level view of flu activity across the US.
Those forecasts provide value to national and regional public health officials, but offer only coarse
information for state and local public health practitioners. Thus, starting with the 2017 flu season,
the FluSight challenge expanded to a third geographic scale: states and territories (referred to as
states). This expansion to a finer geographic scale presents an opportunity to move forecasting to
geographic scales better aligned with public health response infrastructure and decision making. It
also presents an opportunity to develop and advance methodological forecasting frameworks that
can share information across geographic locations, flu seasons, and geographic scales coherently
in ways that geographically isolated forecasting models cannot.
Figure 1 (and SI Figure S.2) illustrates the spatial structure in average ILI across states; states bordering the Gulf of Mexico, for example, tend to have higher ILI than the national average. There are, however, exceptions to these spatial trends, with some higher-than-average ILI states surrounded exclusively by below-average ILI states (e.g., California and New Jersey). Attempts to model the spatial
relationships of flu and flu-like illnesses include using network models and US commuter data [5],
network models based on Euclidean distance [19], and empirically derived network relationships
[21, 22]. Though these approaches consider spatial relationships differently, they all support the
conclusion that sharing information across geographies can improve forecasting.
Figure 2 (and SI Figure S.3) shows season-to-season variability, illustrating the common directional
effect a flu season can have on nearly all states. 2015, for instance, was a mild flu season in the US
with 42 out of 53 states experiencing ILI activity below their state-specific averages (the 53 states
are all 50 states, minus Florida with no available data, plus Puerto Rico, the US Virgin Islands,
New York City, and the District of Columbia). In contrast, 2017 was an intense flu season with
47 out of 53 states experiencing ILI activity above their state-specific averages. Similar to the
findings that sharing information across geographies can improve flu forecasting, previous work
has found that sharing information across seasons can also improve flu forecasting [13].
Figure 3 shows the average standardized week-to-week volatility across geographic scales (see
SI Section S.3 for details). Standardized volatility measures how much ILI (states) and wILI
(regions and nationally) varies from week-to-week, where wILI is weighted ILI — a state-population
weighted version of ILI used to characterize ILI regionally and nationally. High volatility poses a
challenge to forecasting as increased volatility can swamp the signal in the (w)ILI data. Figure 3
makes clear that extending forecasts down to the state scale, a more actionable scale for public
health officials, comes at the cost of increased volatility. ILI is an estimate of the proportion of
patients seen by ILINet with symptoms consistent with the flu relative to the total number of
patients seen by ILINet for any reason. The level of noise in the estimates of those proportions is
largely driven by the number of patients seen weekly. As illustrated in Figure 3, the fewer patients
seen weekly, the noisier the ILI estimates and the larger the volatility can be. Thus, volatility
will naturally increase with finer geographic scales, unless counterbalanced with increased ILINet
participation. Developing multiscale flu forecasting models that account for decreasing volatility
with coarsening geographic scales will be crucial. Some multiscale forecasting models have been
developed in the context of norovirus gastroenteritis prediction [23]. In that work, [23] showed
that modeling at the finest available data scale and aggregating up to coarser scales generally had
better predictive performance than models directly operating at the aggregated scales. To our
knowledge, such models have not been developed and operationalized for ILI forecasting.
Figure 1: (Top) Average state influenza-like illness (ILI) relative to average national weighted ILI
(wILI). States bordering the Gulf of Mexico tend to have higher ILI than the national average.
The geographical sizes of Alaska (AK), Hawaii (HI), Puerto Rico (PR), the US Virgin Islands (VI),
New York City (NYC), and the District of Columbia (DC) are not to scale. Data for Florida is
unavailable. Averages are based on 2010 through 2017 data. (Bottom) ILI by season (colored lines)
for select states. Black line is national average wILI for reference. Appreciable season-to-season
and state-to-state ILI variability exists.
Figure 2: (Top) Dark green states denote states with ILI less than their state-specific averages
while pink states are states with ILI above their state-specific averages. 2015 was a mild flu season
for the majority of states relative to their state-specific average ILI, while 2017 was an intense
flu season for the majority of states, indicating that season-to-season effects can affect most of
the country. Data unavailable for Florida. States displayed outside of the contiguous US are
geographically not to scale. (Bottom) State detrended ILI for the 2015 and 2017 flu seasons,
where state detrended ILI is ILI for a state/season minus ILI for that state averaged over all
seasons. Positive/negative state detrended ILI means ILI for that season was above/below the
state-specific average, respectively.
Figure 3: (Left) Average standardized week-to-week influenza-like illness (ILI – states) and
weighted ILI (wILI – HHS regions and nationally) volatility for three geographic scales. Volatility
decreases as the scales coarsen. (Right) Average standardized week-to-week (w)ILI volatility ver-
sus the average number of weekly patients on a log scale for each state, HHS region, and nationally.
Volatility decreases as the number of weekly patients seen increases, suggesting that volatility is
in part a product of ILINet participation.
While efforts have been made to address each of these challenges (spatial dependence, shared season effects, and scale-dependent volatility) in isolation, no one has yet tackled all of them simultaneously in the context of influenza forecasting. Jointly addressing all these challenges is the main contribution of this paper.
Dante is a probabilistic, Bayesian flu forecasting model that is decomposed into two submodels: the
fine-scale model (i.e., the state model) and the aggregation model (i.e., the regional and national
model).
Dante’s fine-scale model is itself described in two parts: the data model and the process model.
Dante’s Data Model
Let yrst ∈ (0, 1) be ILI/100 for state r = 1, 2, . . . , R in flu season s = 1, 2, . . . , S, for epidemic
week t = 1, 2, . . . , T = 35, where t = 1 corresponds to epidemic week 40, roughly the first week of
October and T = 35 most often corresponds to late May. Dante models the observed proportion
yrst with a Beta distribution as follows:

yrst | θrst, λr ∼ Beta(λr θrst, λr (1 − θrst)),

so that E(yrst | θrst, λr) = θrst and Var(yrst | θrst, λr) = θrst (1 − θrst) / (λr + 1).
In Dante, θrst is the unobserved true proportion of visits for ILI in state r for season s during
week t and λr > 0 is a state-specific parameter that captures the level of noise in the ILINet
surveillance system and thus the level of volatility in the ILI time series. In Dante, yrst is modeled
as unbiased for the latent state θrst . The observation yrst , however, is not equal to θrst due to
variability in the measurement surveillance process (i.e., the true proportion of ILI in state r for
season s during week t is not going to be perfectly captured by ILINet surveillance). Motivated
by Figure 3, λr is likely to be related to ILINet participation as measured by the total number of
patients seen weekly in state r. As λr increases, the variance of yrst decreases and observations
will tend to be closer to θrst . Because we do not know the relationship between patient count
and λr a priori, we model λr hierarchically, allowing them to be learned from data (details in SI
Section S.1.1).
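To make the role of λr concrete, the following minimal R sketch (not Dante itself; the seasonal curve and λ values are invented for illustration) simulates from the Beta data model and shows that observation noise around θ shrinks as λ grows:

set.seed(1)
t_week <- 1:35
theta  <- plogis(-4 + 1.5 * exp(-(t_week - 18)^2 / 40))  # assumed stylized seasonal curve, on the ILI/100 scale
for (lambda in c(1e3, 1e4, 1e5)) {
  ## data model: y ~ Beta(lambda * theta, lambda * (1 - theta)), so E(y) = theta
  y <- rbeta(length(t_week), lambda * theta, lambda * (1 - theta))
  cat(sprintf("lambda = %6.0f: sd of (y - theta) = %.5f\n", lambda, sd(y - theta)))
}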
Figure 4 shows the posterior mean of λr versus the average number of patients seen weekly by
each state. A clear linear relationship is observed on a log-log scale, where the variance of yrst
goes down (i.e., λr increases) as the total weekly seen patients increases. What is particularly
striking about Figure 4 is that Dante has no knowledge of the number of patients seen each week
as it is not an input to Dante, illustrating how structure can be learned rather than prescribed
with a flexible, hierarchical model.
Dante’s process model represents the unobserved true proportion of ILI, θrst ∈ (0, 1), as a function of four components:

logit(θrst) = πrst, (4)

πrst = µ^all_t + µ^state_rt + µ^season_st + µ^interaction_rst. (5)
Figure 4: The posterior mean for λr versus the average number of patients seen weekly by each
state. Both axes are on a log scale. A clear linear relationship is observed. Dante learns this
relationship, as it has no explicit knowledge of the average number of patients.
The four terms in Equation 5 are modeled as random or reverse-random walks, allowing patterns
in the process model to be flexibly learned while capturing week-to-week correlation (details in SI
Section S.1.2).
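As a rough illustration of these building blocks (the variances below are arbitrary choices, not Dante's learned values), the R sketch simulates a forward random walk anchored at week 1, as used for µ^all_t, and a reverse random walk anchored at the final week T, as used for the season-indexed terms:

set.seed(2)
T_weeks <- 35
## forward random walk: prior on week 1, then step forward in time
mu_all <- cumsum(c(rnorm(1, 0, 1), rnorm(T_weeks - 1, 0, 0.2)))
## reverse random walk: prior on week T, then step backward in time
mu_season <- numeric(T_weeks)
mu_season[T_weeks] <- rnorm(1, 0, 1)
for (t in (T_weeks - 1):1) mu_season[t] <- rnorm(1, mu_season[t + 1], 0.2)

Anchoring the reverse walk at week T is what keeps season-indexed terms well constrained at the end of the season when forecasting.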
Figure 5 illustrates the fits for all model components for Alabama and Iowa for the 2015 and 2017 flu seasons. The component µ^all_t is common to every state and season and acts as the anchor for the decomposition in Equation 5, capturing the profile for a typical state and season. The component µ^state_rt captures the state-specific deviation from µ^all_t and is common to every season for a given state, but is distinct for each state. As can be seen in Figure 1, Alabama typically sees ILI above the national average, hence why µ^state_rt for Alabama is learned to be greater than zero. Iowa, however, typically sees ILI below the national average, explaining why µ^state_rt is learned to be less than zero for Iowa. The component µ^season_st captures the season-specific deviation from µ^all_t and is common to every state for a given season, but is distinct for each season. This component captures the fact that seasons can have effects that are shared by nearly all states, as illustrated in Figure 2. The shape of µ^season_st for 2015 and 2017 resembles the average residuals for 2015 and 2017, respectively, in Figure 2. Finally, µ^interaction_rst captures the remaining signal in πrst that cannot be accounted for by µ^all_t, µ^state_rt, and µ^season_st. The term µ^interaction_rst is distinct for each state and season.

Dante’s process model is purposely over-specified. If our interest were purely to fit ILI data, the term µ^interaction_rst alone would suffice. However, µ^interaction_rst alone does not provide enough structure to forecast effectively. On the other hand, the non-interaction terms in the decomposition of πrst (µ^all_t, µ^state_rt, and µ^season_st) provide structure for forecasting but not enough flexibility to capture all the signal in the ILI data. Thus, the µ^interaction_rst term provides the flexibility needed to fit the data, but is specified so that it plays as minimal a role as possible, leaving the signal to be captured by the non-interaction terms, which drive the shape of forecasts.
Figure 5: Posterior summaries for select components, seasons, and states of Dante, fit to seasons 2010 through 2017. Rows, from top to bottom, correspond to Alabama in 2015, Iowa in 2015, Alabama in 2017, and Iowa in 2017. Columns, from left to right, correspond to µ^all, µ^state, µ^season, µ^interaction, π (all from Equation 5), and θ (Equation 4). The π column is the sum of the µ^all, µ^state, µ^season, and µ^interaction columns, accounting for posterior covariances. The θ column is the inverse logit of the π column and is back on the scale of the data. The µ^all component is the most structured component, as it is common to all states and seasons (i.e., the same for all rows). The components µ^state and µ^season are the next most structured components. They describe the state-specific and season-specific deviations from µ^all, respectively, and are common to all seasons within a state (µ^state) and all states within a season (µ^season). The component µ^interaction is the least structured component of Dante, as it is specific to each season/state pair (i.e., it is different for each row). Solid lines are posterior means. Ribbons are 95% posterior intervals. In the θ column, points are data, y.
9
(MCMC), resulting in a sample of M draws that summarize the posterior distribution (details in
SI Section S.2). We use the software JAGS (Just Another Gibbs Sampler ) [24], as called by the
R package rjags [25] within the programming language R [26] to perform the MCMC sampling.
We denote each MCMC draw by the index m. Notationally, we denote the mth sample for a
yet-to-be-observed yrst as yrstm .
Dante’s regional and national forecasts are computed as linear combinations of state forecasts, where weights are proportional to 2010 US Census population estimates (SI Figure S.4). Let wr ∈ [0, 1] be the population of state r divided by the US population, such that Σ_{r=1}^{R} wr = 1. For each MCMC draw m, we compute the ILI forecast for aggregated region ρ (indexing all ten HHS regions and nationally) as:

yρstm = Σ_{r=1}^{R} wrρ yrstm. (6)

For ρ = region X (e.g., ρ = HHS Region 1), wrρ = 0 if state r is not a member of region X; otherwise

wrρ = wr I(r ∈ region X) / Σ_{r=1}^{R} wr I(r ∈ region X), (7)

where I(r ∈ region X) is an indicator function equal to 1 if state r is in region X and 0 otherwise. By construction, Σ_{r=1}^{R} wrρ = 1 for any ρ. The aggregation model constitutes a bottom-up forecasting procedure and ensures forecasts are consistent across scales.
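The aggregation step amounts to a weighted sum applied draw-by-draw. A minimal R sketch, with made-up inputs standing in for the Census populations and region membership:

## y_state: M x R matrix of state-level posterior predictive draws for a fixed (s, t)
## pop: length-R vector of 2010 Census populations
## members: logical length-R vector, TRUE if state r belongs to region rho
aggregate_draws <- function(y_state, pop, members) {
  w_rho <- (pop * members) / sum(pop * members)  # Equation (7): zero outside the region, sums to 1
  as.vector(y_state %*% w_rho)                   # Equation (6), applied to every MCMC draw m
}
## national draws: aggregate_draws(y_state, pop, rep(TRUE, ncol(y_state)))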
Results
Dante is compared to a leading flu forecasting model, the Dynamic Bayesian Model (DBM). DBM
combines a susceptible-infectious-recovered (SIR) compartmental model with a flexible statistical
discrepancy function to forecast ILI (see [13] for more details). DBM is fit to each geographic
unit separately, thus does not share information across geographic units or geographic scales, but
does share information across flu seasons. In contrast, Dante shares information across geographic
units, geographic scales, and flu seasons. DBM was the 4th place model and a component model
in the 2nd place ensemble model [27] in the prospective national and regional 2017/18 FluSight
challenge out of 29 participating models. Dante was the 1st place model in the 2018/19 national
and regional FluSight challenge out of 33 participating models. Dante also came in 1st out of 14
competing models in the 2018/19 state challenge.
We compare Dante and DBM using forecast skill following the scoring rules of the CDC’s FluSight
challenge (details in SI Section S.7), noting that this scoring rule is improper [18, 28, 29]. Forecast
skill ranges between 0 and 1, with 1 being the best possible forecast skill. Conceptually, skill is a
function of both accuracy (a measure of point-estimates) and confidence (a measure of distribu-
tional sharpness). We will show how Dante compares to DBM broadly in terms of skill, and also
in terms of its component pieces. Both models were fit in a leave-one-season-out fashion, where the data for all seasons not being forecasted, along with the data for the season being forecasted up to the forecast date, were used for training.
Table 1 shows that Dante outperformed DBM in forecast skill at all geographic scales. For both
models, forecast skill improves as geographic scales coarsen, suggesting that forecast skill degrades
as we move to finer scale geographies where volatility is greater.
Table 1: Dante and DBM average forecast skill comparisons across geographic scales. Forecast
skill improves for both models as the geographic scale coarsens. Dante outperformed DBM at all
geographic scales.
Model States HHS Regions National
Dante 0.372 0.413 0.439
DBM 0.337 0.383 0.426
Table 2 shows that Dante outperformed DBM in terms of accuracy, as measured by mean squared
error (MSE) of point predictions, at all geographic scales. For both models, average MSE decreases
as geographic scales coarsen, suggesting that accuracy degrades as we move to finer scale geogra-
phies where volatility is greater. See SI Section S.10 for further details and figures comparing the
MSE of Dante and DBM.
Table 2: Dante and DBM average mean squared error (MSE) comparisons across geographic scales.
MSE improves for both models as the geographic scale coarsens. Dante outperformed DBM at all
geographic scales.
Model States HHS Regions National
Dante 3.164 2.441 1.921
DBM 3.390 2.674 2.244
Figure 6 shows the difference in forecast skill between Dante and DBM for each state, region, and
nationally. Dante outperformed DBM for the majority of geographic regions, with the exception
of HHS Region 7 and the states Wyoming, Puerto Rico, and Kentucky.
Figure 7 shows forecast skill broken down by targets (left) and flu seasons (right) for each ge-
ographic scale. Dante outperformed DBM for all scales and targets, except for peak intensity
regionally and onset nationally. Improvement over DBM is largest for the 1-week ahead forecast
target. Dante also outperformed DBM for all scales and flu seasons, except for 2017 nation-
ally. While forecast skill for DBM and Dante is close for all seasons nationally (sans 2016),
Dante consistently and appreciably outperformed DBM for all seasons at the regional and state
scales.
Figure 8 provides context as to how Dante is outperforming DBM. Figure 8 displays the differ-
ence in forecast skill at each scale for all short-term forecasts against the difference in the 90%
highest posterior density (HPD) widths for each of the short-term target’s posterior predictive
distributions. HPDs are similar to confidence intervals as they capture the range of probability
concentration, but are more appropriate than confidence intervals for distributions that are not
unimodal. Figure 8 shows that for all scales and short-term forecasts, Dante has smaller HPD
interval widths, indicating that Dante’s forecasts are more concentrated than DBM and thus, more
confident. Dante’s increased forecast confidence resulted in higher forecast skill than DBM. This
is a promising finding, as more confident forecasts, if accurate, provide more information to public
health decision makers.
Figure 9 shows that Dante’s forecasts are more confident than DBM’s for all short-term targets
at all geographic scales. For each short-term target, forecasts for both DBM and Dante become
more confident as the geographic scale coarsens. DBM makes more confident short-term forecasts at coarser scales because the (w)ILI it models becomes less volatile as geographic scales coarsen. Dante makes more confident short-term forecasts at coarser geographic scales as a result of the aggregation model. Dante’s 3-week-ahead 90% HPD interval widths nationally and regionally are 1.4% and 2.1%, respectively, about the same as Dante’s 1-week-ahead 90% HPD interval widths regionally and at the state level, respectively. Said another way, Dante loses about 2 weeks of confidence in its short-term forecasts with each disaggregating geographic scale.
Discussion
Comparing forecast skill (Figure 7) with the relationship between patient volume and volatility (Figure 3) suggests that Dante (and DBM) forecast a geographic unit better when its (w)ILI estimates are based on more data. This result suggests that expanded ILI surveillance participation, and not just improvements to the models themselves, plays a role in improving forecasts. This idea is not surprising: disease forecasting has been compared to weather forecasting [30], a field that has continued to make consistent progress through parallel efforts of improved modeling and data collection.
Figure 6: Difference in forecast skill between Dante and DBM, for all states, regions, and nationally.
Dante had higher forecast skill for all geographic regions except for HHS Region 7, Kentucky,
Wyoming, and Puerto Rico.
Figure 7: (Left) Average forecast skill by scales and targets. PI and PT stand for peak intensity and peak timing, respectively. Dante outperformed DBM for all scales and targets, except for onset nationally and for peak intensity (PI) regionally. (Right) Average forecast skill by scales and flu seasons. Dante outperformed DBM for all scales and seasons, except for 2017 nationally.
Figure 8: Difference in average forecast skill versus difference in 90% highest posterior density
(HPD) interval widths for short-term forecasting targets. For all short-term forecasting targets
and geographic scales, Dante produced more confident (i.e., smaller 90% HPD interval widths)
and higher scoring forecasts than DBM.
Figure 9: Average 90% highest posterior density (HPD) interval widths for the short-term fore-
casting targets. Both Dante and DBM produce more confident (i.e., smaller 90% HPD interval
widths) forecasts for coarser geographic resolutions. For all short-term forecasting targets and
geographic scales, Dante produced more confident forecasts than DBM.
We found that Dante made more confident (i.e., sharper) forecasts, as measured by smaller 90% HPD interval widths, than DBM, a model that fits aggregate data directly. Similar findings were noted by [23] when comparing models fit to norovirus data in Berlin stratified by regions and age groups: models fit to the finest scales and subsequently upscaled had sharper predictions than models fit to the aggregated scales directly. This suggests that further stratification of ILI, such as partitioning state-level ILI by age group, by flu strain type, or by county, may further sharpen forecasts at aggregate scales.
Dante’s first place finish in the 2018/19 FluSight challenge may come as a surprise given that
it is a purely statistical model and uses only ILINet data, while many of its competitors are
based in full or in part on mechanistic disease transmission models and/or are augmented with
alternative data sources (e.g., Google search data). Dante’s superior performance suggests that
these mechanistic components or alternative data sources may be integrated into those models
in a way that is improperly aligned with the truth. For example, DBM includes an SIR model
component via a season-specific “I” term but a given season may have multiple circulating influenza
strains responsible for the “true” flu component in the ILI data, thus rendering the use of a single
“I” term inappropriate.
Revisions made to ILI data after their initial release are referred to as backfill and constitute a meaningful source of uncertainty when prospectively forecasting ILI. In this work, backfill was ignored when forecasting with both Dante and DBM. As a result, the forecasting results for Dante and DBM are directly comparable to each other in this paper but are not directly comparable to previous FluSight challenge results. Backfill was ignored in this paper because Dante uses state-level ILI data directly, and state-level backfill data are only available starting with the 2017/18 season. Though backfill was not addressed in this paper, Dante’s winning 2018/19 FluSight challenge entry did include a backfill model to account for the revision process of real-time ILI data.
Linking understandable processes to observed patterns in the data via models while maintaining
high performance is the next frontier in ILI modeling. To do so will require a fuller consideration of
the “ILI data generating process.” This process non-exhaustively includes a disease transmission
process(es) (e.g., things often modeled with a compartmental model(s)), a healthcare visitation
process (i.e., a set of processes related to who interacts with the healthcare system and when), an
ILINet participation process (i.e., certain providers participate in ILINet while others do not, and
the composition of provider networks varies temporally and spatially), and a reporting process
(e.g., backfill).
Further stratification is a promising direction for incorporating known facets of the ILI data generating process into Dante in a flexible way. For example, if provider-level ILINet data were available, state-level models could be decomposed into models for emergency department (ED) ILI and non-ED ILI. We hypothesize that a systematic difference exists between patients visiting ED and non-ED providers, specifically that the proportion of ED patients with ILI is higher than that of non-ED patients. If so, then provider composition could help explain some state-level variation in ILI magnitude (we expect states with more ED providers to have higher reported ILI). It could also explain part of the holiday-specific spikes in observed ILI (we expect that spikes in ILI activity on holidays are partially due to changes in ILINet's provider composition for those weeks: more clinics are closed, and thus ED providers make a relatively larger contribution).
Modeling across geographic scales in a single, unified model ensures forecasts are simultaneously
coherent, a feature that is not present in many FluSight submissions. Ongoing work by our team
will provide a model-agnostic tool by which users can modify outputs from a non-unified model
so as to attain coherency. Our team is also working to incorporate internet data sources (i.e.,
nowcasting) into future iterations of Dante. When internet data sources were incorporated into DBM, performance improved [14], which makes us hopeful that Dante will also be improved by the thoughtful incorporation of internet data.
Acknowledgements
This work was funded by the Department of Energy at Los Alamos National Laboratory un-
der contract 89233218CNA000001 through the Laboratory-Directed Research and Development
Program, specifically LANL LDRD grant 20190546ECR. The authors thank C.C. Essix for her
encouragement and support of this work, as well as the helpful conversations with the Delphi group
at Carnegie Mellon University. Approved for unlimited release under LA-UR-19-28977.
References
[1] Centers for Disease Control and Prevention. Disease burden of influenza, 2019. Accessed:
06-13-2019.
[2] Matthew Biggerstaff, David Alper, Mark Dredze, Spencer Fox, Isaac Chun-Hai Fung, Kyle S
Hickmann, Bryan Lewis, Roni Rosenfeld, Jeffrey Shaman, Ming-Hsiang Tsou, Paola Velardi,
Alessandro Vespignani, Lyn Finelli, and the Influenza Forecasting Contest Working Group.
Results from the Centers for Disease Control and Prevention’s Predict the 2013–2014 Influenza
Season Challenge. BMC Infectious Diseases, 16(1):357, 2016.
[3] Matthew Biggerstaff, Michael Johansson, David Alper, Logan C Brooks, Prithwish
Chakraborty, David C Farrow, Sangwon Hyun, Sasikiran Kandula, Craig McGowan, Naren
Ramakrishnan, Roni Rosenfeld, Jeffrey Shaman, Rob Tibshirani, Ryan J Tibshirani, Alessan-
dro Vespignani, Wan Yang, Qian Zhang, and Carrie Reed. Results from the second year of a
collaborative effort to forecast influenza seasons in the United States. Epidemics, 24:26–33,
2018.
[4] Craig J McGowan, Matthew Biggerstaff, Michael Johansson, Karyn M Apfeldorf, Michal
Ben-Nun, Logan Brooks, Matteo Convertino, Madhav Erraguntla, David C Farrow, John
Freeze, Saurav Ghosh, Sangwon Hyun, Sasikiran Kandula, Joceline Lega, Yang Liu, Nicholas
Michaud, Haruka Morita, Jarad Niemi, Naren Ramakrishnan, Evan L Ray, Nicholas G Reich,
Pete Riley, Jeffrey Shaman, Ryan Tibshirani, Alessandro Vespignani, Qian Zhang, Carrie
Reed, and The Influenza Forecasting Working Group. Collaborative efforts to forecast seasonal
influenza in the United States, 2015–2016. Scientific Reports, 9(1):683, 2019.
[5] Sen Pei, Sasikiran Kandula, Wan Yang, and Jeffrey Shaman. Forecasting the spatial trans-
mission of influenza in the United States. Proceedings of the National Academy of Sciences,
115(11):2752–2757, 2018.
[6] Sasikiran Kandula, Teresa Yamana, Sen Pei, Wan Yang, Haruka Morita, and Jeffrey Shaman.
Evaluation of mechanistic and statistical methods in forecasting influenza-like illness. Journal
of The Royal Society Interface, 15(144):20180174, 2018.
[7] Sen Pei and Jeffrey Shaman. Counteracting structural errors in ensemble forecast of influenza
outbreaks. Nature Communications, 8(1):925, 2017.
[8] Sasikiran Kandula and Jeffrey Shaman. Reappraising the utility of Google Flu Trends. PLoS
Computational Biology, 15(8):e1007258, 2019.
[9] Logan C Brooks, David C Farrow, Sangwon Hyun, Ryan J Tibshirani, and Roni Rosenfeld.
Flexible modeling of epidemics with an empirical Bayes framework. PLoS Computational
Biology, 11(8):e1004382, 2015.
[10] David C Farrow, Logan C Brooks, Sangwon Hyun, Ryan J Tibshirani, Donald S Burke,
and Roni Rosenfeld. A human judgment approach to epidemiological forecasting. PLoS
Computational Biology, 13(3):e1005248, 2017.
[11] Logan C Brooks, David C Farrow, Sangwon Hyun, Ryan J Tibshirani, and Roni Rosenfeld.
Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions.
PLoS Computational Biology, 14(6):e1006134, 2018.
[12] Dave Osthus, Kyle S Hickmann, Petruţa C Caragea, Dave Higdon, and Sara Y Del Valle.
Forecasting seasonal influenza with a state-space SIR model. The Annals of Applied Statistics,
11(1):202, 2017.
[13] Dave Osthus, James Gattiker, Reid Priedhorsky, and Sara Y Del Valle. Dynamic Bayesian
influenza forecasting in the United States with hierarchical discrepancy (with discussion).
Bayesian Analysis, 14(1):261–312, 2019.
[14] Dave Osthus, Ashlynn R Daughton, and Reid Priedhorsky. Even a good influenza forecast-
ing model can benefit from internet-based nowcasts, but those benefits are limited. PLoS
Computational Biology, 15(2):e1006599, 2019.
[15] Kyle S Hickmann, Geoffrey Fairchild, Reid Priedhorsky, Nicholas Generous, James M Hyman,
Alina Deshpande, and Sara Y Del Valle. Forecasting the 2013–2014 influenza season using
Wikipedia. PLoS Computational Biology, 11(5):e1004239, 2015.
[16] Evan L Ray, Krzysztof Sakrejda, Stephen A Lauer, Michael A Johansson, and Nicholas G
Reich. Infectious disease prediction with kernel conditional density estimation. Statistics in
Medicine, 36(30):4908–4929, 2017.
[17] Evan L Ray and Nicholas G Reich. Prediction of infectious disease epidemics via weighted
density ensembles. PLoS Computational Biology, 14(2):e1005910, 2018.
[18] Nicholas G Reich, Logan C Brooks, Spencer J Fox, Sasikiran Kandula, Craig J McGowan,
Evan Moore, Dave Osthus, Evan L Ray, Abhinav Tushar, Teresa K Yamana, Matthew Bigger-
staff, Michael A Johansson, Roni Rosenfeld, and Jeffrey Shaman. A collaborative multiyear,
multimodel assessment of seasonal influenza forecasting in the United States. Proceedings of
the National Academy of Sciences, 116(8):3146–3154, 2019.
[19] Michal Ben-Nun, Pete Riley, James Turtle, David P Bacon, and Steven Riley. Forecast-
ing national and regional influenza-like illness for the USA. PLoS Computational Biology,
15(5):e1007013, 2019.
[20] Shihao Yang, Mauricio Santillana, and Samuel C Kou. Accurate estimation of influenza
epidemics using Google search data via ARGO. Proceedings of the National Academy of
Sciences, 112(47):14473–14478, 2015.
[21] Fred S Lu, Mohammad W Hattab, Cesar Leonardo Clemente, Matthew Biggerstaff, and
Mauricio Santillana. Improved state-level influenza nowcasting in the United States leveraging
Internet-based data and network approaches. Nature Communications, 10(1):147, 2019.
[22] Michael W Davidson, Dotan A Haim, and Jennifer M Radin. Using networks to combine
“big data” and traditional surveillance to improve influenza predictions. Scientific Reports,
5:8154, 2015.
[23] Leonhard Held, Sebastian Meyer, and Johannes Bracher. Probabilistic forecasting in infec-
tious disease epidemiology: the 13th Armitage lecture. Statistics in Medicine, 36(22):3443–
3460, 2017.
[24] Martyn Plummer. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, volume 124, page 10, Vienna, Austria, 2003.
[25] Martyn Plummer. rjags: Bayesian Graphical Models using MCMC, 2018. R package version
4-8.
[26] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria, 2018.
[27] Nicholas G Reich, Craig J McGowan, Teresa K Yamana, Abhinav Tushar, Evan L Ray, Dave
Osthus, Sasikiran Kandula, Logan C Brooks, Willow Crawford-Crudell, Graham Casey Gib-
son, Evan Moore, Rebecca Silva, Matthew Biggerstaff, Michael A Johansson, Roni Rosenfeld,
and Jeffrey Shaman. A collaborative multi-model ensemble for real-time influenza season
forecasting in the US. bioRxiv, page 566604, 2019.
[28] Johannes Bracher. On the multibin logarithmic score used in the FluSight competitions.
Proceedings of the National Academy of Sciences, 2019.
[29] Nicholas G Reich, Dave Osthus, Evan L Ray, Teresa K Yamana, Matthew Biggerstaff,
Michael A Johansson, Roni Rosenfeld, and Jeffrey Shaman. Reply to Bracher: Scoring prob-
abilistic forecasts to maximize public health interpretability. Proceedings of the National
Academy of Sciences, 2019.
[30] Kelly R Moran, Geoffrey Fairchild, Nicholas Generous, Kyle Hickmann, Dave Osthus, Reid
Priedhorsky, James Hyman, and Sara Y Del Valle. Epidemic forecasting is messier than
weather forecasting: The role of human behavior and internet data streams in epidemic
forecast. The Journal of Infectious Diseases, 214(suppl 4):S404–S408, 2016.
Supplemental Information:
Multiscale Influenza Forecasting
Dave Osthus∗a and Kelly R. Morana,b
a
Los Alamos National Laboratory, Statistical Sciences Group
b
Duke University, Department of Statistical Science
Throughout Dante’s specification, pragmatic prior distributional and hyperparameter choices were
made that we believe are, on balance, reasonable. We feel justified in this position, given Dante’s
winning performance in the CDC’s prospective 2018/19 FluSight challenge. That said, it is pos-
sible further improvements to Dante’s forecasting performance could be achieved through a more
rigorous investigation into choices of priors and hyperparameters. A rigorous investigation could
include performing formal cross-validation over combinations of prior distributions and hyperpa-
rameters.
Let yrst ∈ (0, 1) represent observed influenza-like illness (ILI) as a proportion for state r =
1, 2, . . . , R, for flu season s = 1, 2, . . . , S, and week of season t = 1, 2, . . . , T . Dante models ILI
as
yrst | θrst, λr ∼ Beta(λr θrst, λr (1 − θrst)), (8)

where θrst ∈ (0, 1). That is, observed ILI/100 is modeled with a Beta distribution with mean E(yrst | θrst, λr) = θrst and variance Var(yrst | θrst, λr) = θrst (1 − θrst) / (λr + 1), so larger values of λr correspond to less noisy observations (Figure S.1).
Figure S.1: The underlying mean of Dante’s data model (left) and three realizations from Dante’s
data model under three different values of λr . When λr is large, realizations (black line) closely
match the underlying mean process (grey line). As λr gets smaller, realizations from the data
model deviate more significantly from the underlying mean process.
The state-specific parameter λr is assigned the hierarchical prior

λr | λprec ∼ t[0,∞)(0, λprec, 3),

where t[0,∞) is a half, non-standardized t-distribution with support on the interval [0, ∞). The degrees of freedom parameter is set to 3 to ensure that the mean and variance of the non-standardized t-distribution are defined while still maintaining a heavy-tailed distribution. The non-centrality parameter is set to 0, corresponding to a central, non-standardized t-distribution.
The Gamma(5, 5) prior for λprec, i.e., λprec ∼ Gamma(5, 5), was selected so that the prior expectation for λprec is 1. A non-standardized t-distribution with non-centrality parameter equal to 0 and λprec = 1 corresponds to the Student's t-distribution with only a degrees of freedom parameter. The parameter λr was not modeled explicitly as a function of the number of patient visits because 1) the form of the relationship between λr and patient visit volume was not known a priori and 2) the number of future patient visits is not available for forecasting purposes.
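For intuition, the implied prior on λr can be sampled directly. A sketch using the distributional forms above (direct draws, rather than MCMC):

set.seed(3)
lambda_prec <- rgamma(1e4, shape = 5, rate = 5)          # Gamma(5, 5) prior; mean 1
lambda_r    <- abs(rt(1e4, df = 3)) / sqrt(lambda_prec)  # half non-standardized t(0, lambda_prec, 3)
quantile(lambda_r, c(0.5, 0.9, 0.99))                    # heavy right tail, as intended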
The process model is a model for the quantity θrst, the latent expected ILI proportion for state r in season s for week of season t. Dante models θrst as

logit(θrst) = πrst = µ^all_t + µ^state_rt + µ^season_st + µ^interaction_rst.

That is, Dante models θrst on the logit scale as the sum of four components: µ^all_t, µ^state_rt, µ^season_st, and µ^interaction_rst. The models for those components are subsequently described.
S.1.2.1 Process Model: µ^all_t

The term µ^all_t is common to all states and seasons and is modeled as a random walk:

µ^all_1 | σ^{2,common}_1 ∼ N(0, σ^{2,common}_1), (15)

µ^all_t | µ^all_{t−1}, σ^{2,common} ∼ N(µ^all_{t−1}, σ^{2,common}). (18)
S.1.2.2 Process Model: µ^state_rt

The term µ^state_rt is common to all seasons within a state but distinct across states. The term µ^state_rt is modeled as a hierarchical random walk.

For r = 1, 2, . . . , R, µ^state_{r,1} is modeled as

µ^state_{r,1} | σ^{2,state}_0 ∼ N(0, σ^{2,state}_0). (21)

For r = 1, 2, . . . , R and t = 2, 3, . . . , T,

µ^state_rt | µ^state_{r,t−1}, σ^{2,state}_r ∼ N(µ^state_{r,t−1}, σ^{2,state}_r). (24)

For r = 1, 2, . . . , R, the state-specific innovation variance is assigned the hierarchical prior

σ^{2,state}_r | λ^state ∼ t[0,∞)(0, λ^state, 3),

with

λ^state ∼ Gamma(5, 5).
S.1.2.3 Process Model: µ^season_st

The term µ^season_st is common to all states within a season but distinct across seasons. The term µ^season_st is modeled as a reverse random walk. The reverse random walk requires a prior specification for the final week of the season, T, rather than the first week of the season as is required with a random walk. Terms indexed by season s are prone to large uncertainty intervals when used for forecasting, as no data are available to constrain their trajectories. By modeling µ^season_st as a reverse random walk and placing a prior on time T, we effectively turn our forecasting extrapolation problem into an interpolation problem. Though the flu season is highly variable in the middle of the season, it is quite well-constrained at the end of the season (e.g., the end of May). This makes specifying a prior at the end of the flu season a safer proposition than it may at first appear.

µ^season_sT | σ^{2,season}_T ∼ N(0, σ^{2,season}_T), (27)

and for s = 1, 2, . . . , S and t = 1, 2, . . . , T − 1,

µ^season_st | µ^season_{s,t+1}, σ^{2,season} ∼ N(µ^season_{s,t+1}, σ^{2,season}). (30)

The model for σ^{2,season} is constrained to be less than σ^{2,season}_T, helping keep the trajectories of µ^season_st from wandering wildly.
S.1.2.4 Process Model: µ^interaction_rst

The term µ^interaction_rst is modeled as a reverse random walk with autoregressive shrinkage. For r = 1, 2, . . . , R and s = 1, 2, . . . , S,

µ^interaction_rsT | η^interaction_r, σ^{2,interaction}_rT ∼ N(η^interaction_r, σ^{2,interaction}_rT), (32)

where the variance of 20 was selected as a wide, uninformative prior for σ^{2,interaction}_0. For r = 1, 2, . . . , R, s = 1, 2, . . . , S, and t = 1, 2, . . . , T − 1,

µ^interaction_rst | µ^interaction_rs,t+1, α^interaction_r, σ^{2,interaction}_rt ∼ N(α^interaction_r µ^interaction_rs,t+1, σ^{2,interaction}_rt). (35)
The autoregressive term α^interaction_r ∈ (0, 1) helps regularize µ^interaction_rst towards 0. Because µ^interaction_rst is modeled as a reverse random walk, as we step backward in time from T to T − 1 all the way to the current week t = T − k (k ≥ 0), the shrinking effect on E(µ^interaction_rst | µ^interaction_rs,t+1, α^interaction_r, σ^{2,interaction}_rt) due to α^interaction_r ∈ (0, 1) increases with distance from T (i.e., with increasing k). Specifically, by the law of iterated expectations and noting all expectations are implicitly conditioned on η^interaction_r, α^interaction_r, and σ^{2,interaction}_rt for t = 1, 2, . . . , T, we have the following:

E(µ^interaction_rsT) = η^interaction_r (Equation (32)),

E(µ^interaction_rs,T−1) = E(E(µ^interaction_rs,T−1 | µ^interaction_rsT))
                        = E(α^interaction_r µ^interaction_rsT) (Equation (35))
                        = α^interaction_r E(µ^interaction_rsT)
                        = α^interaction_r η^interaction_r,
...

E(µ^interaction_rs,T−k) = E(E(µ^interaction_rs,T−k | µ^interaction_rs,T−(k+1)))
                        = E(α^interaction_r µ^interaction_rs,T−(k+1))
                        = α^interaction_r E(µ^interaction_rs,T−(k+1))
                        = . . .
                        = (α^interaction_r)^k E(µ^interaction_rsT)
                        = (α^interaction_r)^k η^interaction_r.
We found this regularization helps keep the forecast intervals from getting too large, as µ^interaction_rst is the most challenging term in Dante to learn: only data from state r and season s directly constrain it, and for forecasting purposes we have no or only partial data available for season s. In general, more care needs to be taken to model and constrain components indexed by s, as there will be little data to constrain those terms when Dante is used for forecasting.
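This geometric shrinkage is easy to verify numerically. A small R simulation of the reverse autoregressive recursion with illustrative values (α = 0.8, η = 1, and a constant innovation standard deviation of 0.1, none of which are Dante's fitted values):

set.seed(4)
alpha <- 0.8; eta <- 1; T_weeks <- 35; M <- 1e5
mu <- matrix(NA_real_, M, T_weeks)
mu[, T_weeks] <- rnorm(M, eta, 0.1)                                       # Equation (32)
for (t in (T_weeks - 1):1) mu[, t] <- rnorm(M, alpha * mu[, t + 1], 0.1)  # Equation (35)
round(colMeans(mu)[c(T_weeks - 1, T_weeks - 5)], 3)  # approx alpha^1 * eta = 0.8 and alpha^5 * eta = 0.328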
The hyperprior for α^interaction_r is

α^interaction_r | ν^interaction_a, ν^interaction_b ∼ Beta(ν^interaction_a, ν^interaction_b). (36)

The prior means for ν^interaction_a and ν^interaction_b are both 1. A Beta(1, 1) distribution is a Uniform(0, 1) distribution. Thus, the prior choices for α^interaction_r reflect a lack of prior information about the autoregressive term. Finally,

σ^{2,interaction}_rt | λ^interaction ∼ t[0,∞)(0, λ^interaction, 3). (39)
S.2 MCMC Sampling

Samples from the posterior distributions and posterior predictive distributions of Dante are drawn via Markov chain Monte Carlo (MCMC). We use the software JAGS (Just Another Gibbs Sampler) as called from the rjags package in R. We run three MCMC chains, each for 30,000 iterations, keeping every 10th iteration. We discard the first 1,500 thinned samples as burnin, resulting in 1,500 samples per chain, or a total of M = 4,500 samples per JAGS run.
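In rjags syntax, that setup looks roughly like the sketch below, assuming the model block given later in this section is saved as "dante.jags" and jags_data is an R list holding the inputs described next (both names are ours, for illustration):

library(rjags)
m <- jags.model("dante.jags", data = jags_data, n.chains = 3)
update(m, n.iter = 15000)  # first 1,500 thinned draws (15,000 iterations) discarded as burnin
post <- coda.samples(m,
                     variable.names = c("futurey_state", "futurey_hhs", "futurey_nat"),
                     n.iter = 15000, thin = 10)  # keeps 1,500 draws per chain; M = 4,500 total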
Below is the JAGS code used to fit and sample from Dante. The following quantities are passed
into JAGS:
• NT is a scalar indicating how many weeks are to be modeled in the flu season. It is set equal
to 35.
• NR_state is a scalar for the number of states to be modeled. It is set equal to 54, including
Florida.
• NS is a scalar for the number of flu seasons to be modeled. It is set equal to 8, corresponding
to the 8 seasons inclusively between the 2010/11 and 2017/18 seasons.
• nobs is a scalar denoting how many weeks of the flu season have been observed at the time
of forecasting. It is set to an integer between 5 and 29, inclusively.
• fcstseason is a scalar denoting the index for the flu season being forecasted. It is set to
an integer between 3 and 8, inclusively. Due to missing data, we do not forecast seasons
2010/11 and 2011/12. However, the available data for those two seasons are used for fitting.
• yobs_nat is a vector of length nobs with the national wILI/100 estimates corresponding to
the first nobs weeks of the fcstseason flu season.
• yobs_region is an nobs × 10 matrix, with each row corresponding to a week of the flu
season and each column corresponding to an HHS Region. The yobs region[t,r] entry is
the wILI/100 estimate corresponding to week t of flu season fcstseason for HHS Region r.
• yobs_state is an NR state × NS × nobs dimensional array where the first dimension indexes
states, the second dimension indexes flu season, and the third dimension indexes week of flu
season. The yobs state[r,fcstseason,t] entry is the ILI/100 estimate for state r of flu
season fcstseason of week of flu season t.
• census_weights is an NR state × 11 matrix where each column sums to 1 and each en-
try is greater than or equal to 0. The first 10 columns correspond to the 10 HHS Re-
gions. The 11th column corresponds to the nation. Each row corresponds to a state. The
census weights[r,i] entry is the relative weight state r has in geographic region i, as
determined by 2010 US Census population estimates. If the entry census weights[r,i]
equals 0, that means state r is not in geographic region i.
The JAGS code for Dante, state-level model and aggregation model, is below.
model{
#############################################
## AGGREGATION MODEL
#############################################
## national forecast model
for(t in (nobs+1):NT){
futurey_nat[t] <- t(census_weights[,11]) %*% futurey_state[,t]
}
for(t in 1:nobs){
futurey_nat[t] <- yobs_nat[t]
}
#############################################
## region forecast model
for(r in 1:10){
for(t in (nobs+1):NT){
futurey_hhs[r,t] <- t(census_weights[,r]) %*% futurey_state[,t]
}
}
for(r in 1:10){
for(t in 1:nobs){
futurey_hhs[r,t] <- yobs_region[t,r]
}
}
#############################################
## draw from state-level posterior predictive distribution
for(r in 1:NR_state){
for(t in (nobs+1):NT){
futurey_state[r,t] ~ dbeta(lambda[r] * pi_all[r,fcstseason,t],
lambda[r] * (1 - pi_all[r,fcstseason,t]))
}
}
for(r in 1:NR_state){
for(t in 1:nobs){
futurey_state[r,t] <- yobs_state[r,fcstseason,t]
}
}
##############################################
## DANTE’S DATA MODEL
##############################################
## state data model
for(r in 1:NR_state){
for(s in 1:NS){
for(t in 1:NT){
## data model
y_state[r,s,t] ~ dbeta(lambda[r] * pi_all[r,s,t],
lambda[r] * (1 - pi_all[r,s,t]))
## data mean
pi_all[r,s,t] <- ilogit(mu_all[t] + mu_season[s,NT+1-t] + mu_state[r,t] + mu_interaction[r,s,NT+1-t])
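## note: mu_season and mu_interaction are indexed by NT+1-t, so the walks
## coded forward below enter the mean in reverse time, implementing the
## reverse random walks described in the SI process model sections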
}
}
}
for(r in 1:NR_state){
lambda[r] ~ dt(0, lambda_prec, 3) T(0,)
}
lambda_prec ~ dgamma(5, 5)
###########################################
## DANTE’S PROCESS MODEL
###########################################
## mu_all time series
mu_all[1] ~ dnorm(0, pow(var_all_init, -1))
for(t in 2:NT){
mu_all[t] ~ dnorm(mu_all[t-1], pow(var_all,-1))
}
###########################################
## mu_season time series
for(s in 1:NS){
mu_season[s,1] ~ dnorm(0, pow(var_season_init, -1))
}
for(s in 1:NS){
for(t in 2:NT){
mu_season[s,t] ~ dnorm(mu_season[s,t-1], pow(var_season, -1))
}
}
###########################################
## mu_state time series
for(r in 1:NR_state){
mu_state[r,1] ~ dnorm(0, pow(var_state_init, -1))
}
for(r in 1:NR_state){
for(t in 2:NT){
mu_state[r,t] ~ dnorm(mu_state[r,t-1], pow(var_state[r], -1))
}
}
for(r in 1:NR_state){
var_state[r] ~ dt(0, prec_state, 3) T(0,)
}
prec_state ~ dgamma(5, 5)
###########################################
## mu_interaction time series
## mu_interaction initialization
for(r in 1:NR_state){
for(s in 1:NS){
mu_interaction[r,s,1] ~ dnorm(mu_interaction_mean[r], pow(var_interaction[r,1], -1))
}
mu_interaction_mean[r] ~ dnorm(0,pow(var_interaction_mean,-1))
}
var_interaction_mean ~ dnorm(0,pow(.05,-1)) T(0,)
for(r in 1:NR_state){
for(s in 1:NS){
for(t in 2:NT){
mu_interaction[r,s,t] ~ dnorm(alpha_interaction[r] * mu_interaction[r,s,t-1], pow(var_interaction[r,t], -1))
}
}
}
for(r in 1:NR_state){
alpha_interaction[r] ~ dbeta(alpha_interaction_a, alpha_interaction_b)
}
alpha_interaction_a ~ dgamma(5, 5)
alpha_interaction_b ~ dgamma(5, 5)
for(r in 1:NR_state){
for(t in 1:NT){
## half-t prior for the interaction innovation variances (Equation (39))
var_interaction[r,t] ~ dt(0, prec_interaction, 3) T(0,)
}
}
prec_interaction ~ dgamma(5, 5)
}
S.3 Average Standardized Week-to-Week Volatility

The average standardized week-to-week volatility for geographic unit r is

vr = S^{-1} Σ_{s=1}^{S} vrs, (41)

where

vrs = sqrt( (T − 1)^{-1} Σ_{t=2}^{T} (y*rst − y*rs,t−1)² ). (42)

Furthermore,

y*rst = (yrst − ȳrs) / σrs, (43)

where

ȳrs = T^{-1} Σ_{t=1}^{T} yrst, (44)

σrs = sqrt( (T − 1)^{-1} Σ_{t=1}^{T} (yrst − ȳrs)² ), (45)

S is the number of flu seasons, and T is the number of epidemic weeks. y*rst is the region/season standardized (w)ILI, which has a mean of zero and a standard deviation of one by construction. Without standardizing, regions with higher (w)ILI would appear to be more volatile simply because they have higher levels of (w)ILI. The average standardized week-to-week volatility for state r represents a measure of how volatile the ILI time series is for that state.
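A compact R sketch of this computation for a single location, where y is an S × T matrix of (w)ILI values with one row per season (the helper is our own, mirroring Equations (41) through (45)):

volatility <- function(y) {
  v_rs <- apply(y, 1, function(ys) {
    y_star <- (ys - mean(ys)) / sd(ys)  # Equations (43)-(45): within-season standardization
    sqrt(mean(diff(y_star)^2))          # Equation (42): root mean squared week-to-week change
  })
  mean(v_rs)                            # Equation (41): average over seasons
}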
13
S.4 State ILI
Figure S.2 displays ILI for all states for the 2010 through the 2017 flu seasons. The black line
represents the average national wILI over that same time period for reference. Significant state-to-
state variability exists, with some states having ILI consistently above the national average (e.g.,
Alabama, Oklahoma, Puerto Rico, Texas), and some states consistently below the national average
(e.g., Montana, New Hampshire, Oregon, Rhode Island). Some states are more volatile (e.g., North
Dakota, Puerto Rico) while some are less volatile (e.g., Virginia, New York City).
For concreteness, we show that wILI on EW49 of 2017 for HHS Region 9, reported as 2.596, can
be computed as the weighted combination of state ILI. HHS Region 9 has four states: Arizona,
California, Hawaii, and Nevada. The 2010 US Census population estimates for the four HHS
Region 9 states are 6,407,774 (AZ), 37,320,903 (CA), 1,363,963 (HI), and 2,702,464 (NV), for a
total of 47,795,104 people in HHS Region 9. The state weights are then 0.134 (AZ), 0.781 (CA),
0.029 (HI), and 0.057 (NV). CDC reported ILI for EW49 of 2017 for those states are 3.284 (AZ),
2.498 (CA), 4.341 (HI), and 1.434 (NV).
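Carrying the computation through (a quick R check of the arithmetic, using the populations and ILI values above):

pop <- c(AZ = 6407774, CA = 37320903, HI = 1363963, NV = 2702464)
ili <- c(AZ = 3.284, CA = 2.498, HI = 4.341, NV = 1.434)
w   <- pop / sum(pop)  # 0.134 (AZ), 0.781 (CA), 0.029 (HI), 0.057 (NV)
sum(w * ili)           # 2.5958, which rounds to the reported HHS Region 9 wILI of 2.596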
S.6 Forecasting targets
The forecasting targets as selected by the CDC for the FluSight challenge are defined as fol-
lows:
• Short-term forecasts:
– The (w)ILI percentage for each target week n weeks ahead of the most recently released (w)ILI data, where n = 1, . . . , 4.
• Peak intensity:
– The highest value of (w)ILI reached in a given season.
• Peak timing:
– The week(s) in which the highest value of (w)ILI in a given season is reached.
• Onset:
– The first week in which (w)ILI reaches or exceeds the given regional/national baseline
value for three consecutive weeks in a season. Note that ‘baseline,’ a threshold used
to define epidemic onset, is calculated by adding two standard deviations to the mean
observed ILI levels in non-flu weeks during the previous three influenza seasons; see
https://www.cdc.gov/flu/weekly/overview.htm.
Due to backfill, the CDC sets a validation date at which to record all of the above ‘validation’
values for each forecast challenge. For this paper, data for all seasons were collected from the
CDC’s website on Friday, October 12th of 2018 and validation %ILI values are considered those
present in the collected data set.
The ILINet national and regional baseline values are available at www.cdc.gov/flu/weekly/overview.htm.
Note that state data do not have defined historic baseline values.
A visual example of Dante’s national-level predictions for each of the forecasting targets may be
found in Figure S.5.
S.7 Scoring
Scoring protocols used in this paper correspond to the stated scoring procedure used by the CDC’s
FluSight challenge for the 2018/19 season. Complete national and regional FluSight challenge de-
scription and scoring guidelines may be found by navigating to the ‘Guidance Documents’ page
from predict.cdc.gov/post/5ba1504e5619f003acb7e18f. State FluSight guidelines are available
by navigating to the ‘Guidance Documents’ page from https://predict.cdc.gov/post/5ba5389fa983f303b832726b.
All forecasts are evaluated using the validation %ILI data; by the validation date, the larger
effects of backfill have subsided and the ILINet observations used are stable from week to week.
This lack of backfill leads to higher performance for both Dante and DBM than they would achieve
in a real-time FluSight challenge.
Note that for the purposes of finding the peak timing, onset, and scoring, all (w)ILI values are
rounded to the nearest tenth. For example, a reported (w)ILI of 5.387% would become 5.4%. The
scoring bin width is one tenth of a percentage for short-term targets and peak intensity, and one
week for peak timing and season onset.
Forecast skill for each target is computed as follows:
• Short-term (n-week ahead) forecasts:
– The probability assigned to the correct bin plus the probabilities assigned to the five
preceding and proceeding bins are summed.
– For example, if (w)ILI at week 49 is 2.5%, the probabilities assigned to all bins ranging
from 2.0% to 3.0% inclusively are summed in the 1-week-ahead forecast from week 48
to get that week’s 1-week-ahead forecast skill.
• Peak intensity:
– The probability assigned to the correct bin plus the probabilities assigned to the five
preceding and proceeding bins are summed.
– For example, if (w)ILI peaks at 5.4%, the probabilities assigned to all bins ranging from
4.9% to 5.9% inclusively are summed to get the skill for peak intensity.
• Peak timing:
– The probability assigned to the correct week(s) plus the probabilities assigned to the
immediately preceding and proceeding weeks are summed.
– In the case of multiple peaks, the probability assigned to each correct week plus the
probability assigned to the non-overlapping preceding and proceeding weeks for each
peak is summed; each week’s probability is only counted once.
– For example, if (w)ILI peaks at 5.4% on weeks 3 and 5, then the probabilities assigned
to weeks 2, 3, 4, 5, and 6 are summed to get the skill for peak timing.
• Onset:
– The probability assigned to the correct week(s) plus the probability assigned to the
immediately preceding and proceeding weeks is summed.
– For example, if the onset is week 48, then the probabilities assigned to weeks 47, 48,
and 49 are summed to get the skill for onset.
For all targets, if the correct week/bin is near the first or last possible bin then the total number
of weeks/bins summed for scoring is reduced accordingly. For example, if the validation (w)ILI
value is 0.3%, then only bins ranging from 0% to 0.8% will be counted for scoring n−week ahead
predictions (there won’t be extra bins counted above 0.8% to ‘make up for’ the decrease in the
number of lower bins).
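The multibin rule above can be sketched in a few lines of Python. The names here (probs, a vector of bin probabilities; truth_bin, the index of the bin containing the validation value) are hypothetical; window would be 5 bins for (w)ILI targets and 1 week for the week-based seasonal targets:

import math

def multibin_log_score(probs, truth_bin, window=5):
    """Sum the probability in the correct bin and in `window` bins on each
    side, truncating at the ends of the bin range, then take the log."""
    lo = max(0, truth_bin - window)
    hi = min(len(probs) - 1, truth_bin + window)
    skill = sum(probs[lo:hi + 1])
    return math.log(skill) if skill > 0 else -10.0  # CDC convention for skill 0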
In this paper, forecasts are made beginning at epi-time 5 (i.e., epi-week 45, after four weeks of
data from a given season are available) and the last forecast is made at epi-time 29 (i.e., after the
influenza season has long been over).
When calculating an overall score for a given location, season, and/or model, we take the geometric
mean of a subset of the scores for that location/season/model. That is, relevant log skills across
the targets of interest are averaged and the result is exponentiated. However, the weeks at which
a forecast is made and the target chosen impact whether or not a given week’s forecast skill
is included. The scoring inclusion window used in this paper is the intersection of this paper’s
forecast window (epi-time 5 through 29) and the CDC definition for weeks to score (discussed
below). Note that when skill is 0 (i.e., when the probability assigned to the scorable bin(s) by the
model is 0), the log score is set to -10 for that target.
For state forecasts, all weeks are included in the average score calculation because there are
no state-level baseline values. For regional and national forecasts, the CDC chooses evaluation
periods based on utility of said forecasts. For all seasonal targets (i.e. onset, peak timing, and peak
intensity), the CDC evaluation period begins with the first forecast submission. For onset, the
CDC evaluation period ends six weeks after the validation onset. For peak timing and intensity,
the evaluation period ends after (w)ILI is observed to go below baseline for the final time during
an influenza season (we interpret this to mean the scored weeks are those up to and including the
first week below baseline after which all weeks are below baseline). For short-term forecasts (i.e.,
n−week ahead predictions) the CDC evaluation period begins four weeks prior to the observed
onset week and ends three weeks after (w)ILI is observed to go below baseline for the final time
during an influenza season. We assume all windows are inclusive (i.e., three weeks after implies
that the third week out is counted). The idea is that forecasts within these windows are those
that would be most practically useful.
As an example, suppose the goal is to generate an overall score for Dante’s onset forecasts for
HHS6 in 2014. The historical baseline for HHS6 in 2014 was 3.2, and the observed onset week was
week 47 (epi-time 8). The CDC evaluation period would include forecasts made during epi-time 1
through 14, inclusive. We produce model forecasts during epi-time 5 through 29. The overall score
would be the geometric mean (i.e., the exponentiated log average score) for HHS6 onset forecasts
made on epi-time 5 through 14, inclusive, in 2014. For the purposes of illustration, we round the
following values. Specifically, the skill at epi-time 5 (week 44) is 0.27, meaning that in week 44
the sum of the probability assigned to weeks 46, 47, and 48 (i.e., the validation onset week and its
preceding and proceeding week) was 0.27. At epi-time 6, the skill is 0.22. At epi-times 7 through
14 the skills are 0.10, 0.68, 0.99, 0.99, 0.99, 0.99, 0.99, and 0.99. Thus the summary score for onset
of the 2014/2015 season of flu in HHS6 is:
$$\exp\left(\frac{\ln(0.27) + \ln(0.22) + \ln(0.10) + \ln(0.68) + 6\ln(0.99)}{10}\right) \approx 0.57$$
In order to get an overall score for Dante’s forecasts for HHS6 across all targets and seasons, the
geometric mean would be taken of all of the relevant scores from each season-target combina-
tion.
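The arithmetic of this worked example can be sketched as follows (a check, not part of the original analysis):

import math

# Skills at epi-times 5 through 14 from the HHS6 onset example above
skills = [0.27, 0.22, 0.10, 0.68] + [0.99] * 6

log_skills = [math.log(s) if s > 0 else -10.0 for s in skills]  # -10 floor for skill 0
overall = math.exp(sum(log_skills) / len(log_skills))           # geometric mean
print(round(overall, 2))                                        # 0.57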
S.8 State data cleaning
The state data were pulled from the CDC’s website on Friday, October 12th of 2018 before being
cleaned. In the data, the ili variable is the percent of patients having ILI symptoms out of the
total number of patients seen. For the majority of observations, the ili variable is equivalent to 100
times the ilitotal variable divided by the total patients variable. When total patients = 0,
ili is set to NA because the true proportion of ILI among the population is unknown. However,
there are cases in which ili = NA even though total patients > 0 and ilitotal = 0. We change
the value of ili in this scenario (i.e., when total patients > 0 and ilitotal = 0) to be 0 rather
than NA, as there is information about the ILI estimate. It just happens to be an estimate of 0.
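A minimal pandas sketch of this rule follows; the snake_case column names ili, ilitotal, and total_patients are an assumption for illustration (the CDC files use similar but not necessarily identical names):

import numpy as np
import pandas as pd

def clean_ili(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the 0-vs-NA logic to the ili column."""
    df = df.copy()
    # No patients seen: the true ILI proportion is unknown, so force NA.
    df.loc[df["total_patients"] == 0, "ili"] = np.nan
    # Patients seen but none with ILI: the estimate is an observed 0, not missing.
    df.loc[(df["total_patients"] > 0) & (df["ilitotal"] == 0), "ili"] = 0.0
    return df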
Table S.1: (Left) The relationship between ili, ilitotal, and total patients pre-cleaning.
Note that in the third row ili is NA but total patients > 0. (Right) The relationship between
ili, ilitotal, and total patients post-cleaning. In the third row the NA has been replaced
with a 0. The NA in the fourth row persists because for this observation total patients is 0.
Note that in recent releases of (w)ILI data, the presence/absence of NA coding may differ from
that used in older data releases. Should this model be used for new data, we recommend cleaning
those data with the same 0-vs-NA logic described above.
After using the above data cleaning techniques, 3 missing state-season-week observations persisted
(see Figure S.6), excluding completely missing state-seasons (e.g., Florida, for which no data were
available for any state-season-week, and Puerto Rico in 2012). For a given state-season, if any
weeks are missing data it is theoretically possible for that week to be the validation peak week.
Thus, the seasonal targets of peak timing and peak intensity have NA for a ground truth in instances
in which that state-season has any missing data and the forecast skill for those seasonal targets
must also be NA.
Practically speaking, however, not every missing week need preclude scoring seasonal targets for
a whole state-season. For example, it is incredibly unlikely for influenza to peak in mid-May.
On the other hand, a missing week in mid-December right near the highest observed ILI value
has a much higher chance of being the validation peak. For determining which weeks should be
considered ‘irrelevant’ missing weeks, we calculate the historic minimum and maximum observed
peak timing from the 2010/2011 season to the 2017/2018 season for each state. We then utilize
this state-specific min/max historic peak timing minus/plus a three-week ‘buffer’ as the (inclusive)
window; an NA within this window leads that state-season’s seasonal forecasts to be deemed
un-scorable.
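A small sketch of this rule, with hypothetical inputs (epi-time indices of the missing weeks and the state's historic min/max peak timing):

def seasonal_targets_scorable(missing_weeks, min_peak, max_peak, buffer=3):
    """Seasonal targets are un-scorable if any missing week falls inside the
    buffered (inclusive) historic peak-timing window for the state."""
    lo, hi = min_peak - buffer, max_peak + buffer
    return not any(lo <= w <= hi for w in missing_weeks)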
There are no NAs present in the region-level data. When a state-season-week is missing for a given
region, that state is omitted from the weighted average calculation for that region’s wILI and the
weights are adjusted accordingly.
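A sketch of this re-weighting, using hypothetical dictionaries of state ILI values (None for missing) and state populations:

def regional_wili(ili_by_state, pop_by_state):
    """Population-weighted regional wILI; states with missing ILI are
    dropped and the weights renormalized over the reporting states."""
    reported = {s: v for s, v in ili_by_state.items() if v is not None}
    total = sum(pop_by_state[s] for s in reported)
    return sum(pop_by_state[s] / total * v for s, v in reported.items())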
As mentioned previously, a skill of 0 means that the log score is set to -10 for that target per
CDC guidelines. In practice, we place a small probability in every bin to eliminate the possibility
of highly damaging -10s. For peak week and onset, we choose 0.00018 as the smallest probability
in every bin, so the worst possible multibin log score is ln(3 × 0.00018) ≈ −7.5; for peak
percentage and short-term forecasts, we choose 0.00005, so the worst possible log score is about
ln(11 × 0.00005) ≈ −7.5. These particular padding values were chosen as a simple risk-mitigation
step. Further study could determine an “optimal” pad for scoring, but because the main goal of
this paper is comparability of Dante and DBM, the same numeric pad was used for both models.
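A quick check of the worst-case scores implied by these floors:

import math

floor_week = 0.00018  # peak timing and onset: 3 scorable week bins
floor_pct = 0.00005   # peak intensity and short-term: 11 scorable 0.1% bins

print(math.log(3 * floor_week))   # about -7.5
print(math.log(11 * floor_pct))   # about -7.5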
S.9 Detailed forecast skill results
We consider detailed results for Dante’s forecast skill at each prediction time, broken down by
target-season, at the national level and at the state level for two example states. Discussion is
based on Figures S.7–S.10. Note that in Figures S.7–S.9 the x-axis shows the time at which the
forecast was made, while the overlaid (w)ILI data have been shifted to the left by n weeks for each
n-week ahead skill plot. For example, in the 1-week-ahead forecast skill plots, the skill plotted at
epi time 5 is the skill in forecasting (w)ILI at epi time 6, and the scaled inverse (w)ILI plotted
at epi time 5 is the corresponding (w)ILI value from epi time 6.
Figure S.7 shows Dante’s forecast skill for each prediction time broken down by target-season.
Early in the season, the seasonal targets each tend to have low scores; by the end of the season
(after the seasonal metric has been observed) the scores increase to near 1. The short-term
forecasts are much more variable, but broadly skill is inversely related to wILI.
Consider the first row of Figure S.7. Within at most three weeks of an onset being observed (the
defining feature of onset being wILI consistently above baseline for three weeks), the onset skill
reaches near 1 because, once onset is observed, the model uses the observed wILI data as its
prediction and there is no backfill to contend with, unlike in real-time current-season forecasting.
Of note is that for
2015, which was an unusually late-peaking season, the onset skill is near 0 early in the season
because at this point the model is basing its predictions on previous seasons’ trajectories, meaning
that a low-flu late-peaking season has a low probability of occurring before observing much data
from the current season.
The final four rows of Figure S.7 illustrate the inverse relationship between skill and wILI. This
general phenomenon is likely partly due to the nature of the scoring for short-term targets. Recall
that the probabilities assigned to the bins from -0.5 to 0.5 away from the validation wILI level are
summed. If the validation wILI level is very high, the model has more room to be ‘wrong’ even
if it is generally right that levels are high. For example, if the validation wILI one week ahead
is 7 and the model puts most of its weight around a wILI of 6, the model will score very poorly
even though the concept of high wILI is correct. On the other hand, if the validation wILI is low
and the model predicts that wILI is low, more probability will tend to be near the truth simply
because there are fewer ‘low’ wILI values. The higher skill during the early and late season is
also likely due in part to wILI behaving much more consistently at the beginning and end of
seasons, when flu is not circulating as heavily: week-to-week changes tend to be smaller and more
gradual, so the trajectory learned from previous seasons and other states is more useful to a new
state-season combination at the tails of the season.
Figures S.8 and S.9 show target-season skill plots for two example states (Alabama and Montana,
respectively). The same general trends with seasonal target skill for national data are found
in the state-level seasonal target skills with some notable differences. For many target-season
combinations, the rise of state-level peak incidence/timing (pi/pt) skill to near 1 occurs later after
the peak has occurred than for the national pi/pt skill. For example consider the pi/pt skill for
Montana in the 2016 season (the first and second rows in the second from the rightmost column
in Figure S.9). After the peak has occurred, it is nearly the end of the season before pi/pt skill
rises to near 1. Looking at the bottom row in the second from the rightmost column in Figure
S.10, which shows the validation ILI for Montana in 2016, it is evident that this was a particularly
early/weak-peaking and low-magnitude season for Montana. The model had trouble telling that a
peak had in fact occurred, so some probability remained allocated to later peak timing and higher
peak incidence. In Montana, pi/pt skill rises almost immediately as the peak occurs in 2015, likely
because this is a fairly prominent (by Montana standards) peak occurring late in the season, so
the model very readily places high probability on it being the peak after its occurrence.
The skill for short-term targets is in general much higher for Montana than for Alabama, par-
ticularly for seasons in which ILI levels in Montana lack much of a distinct peak (i.e., 2016 and
The inverse relationship between ILI and short-term forecast skill is less consistent
at the state level than it was at the national level. In Alabama (the bottom four rows of Figure
S.8), the short-term forecasts in 2015 and 2016 track extremely well with inverse ILI; in particular,
note the spiked pattern in both skill and inverse ILI from epidemic time (ET) 11 to 15 for 4-week
ahead forecasts in 2016 (shifted right by one with each decrease in week, so 12 to 16 for 3-week
ahead, etc.). On the other hand, the 1-, 2-, and 4-week ahead forecasts in 2014 corresponding to
the peak week (made at ET 12, 11, and 9, respectively) have high skill and correspond to high ILI.
When looking at the corresponding subplot in Figure S.10, we see that the slope to the peak is
relatively constant save for the 3rd week before the peak for which the 3-week ahead forecast was
poor, suggesting that the high performance may be due to the consistency of the observed path
at the time of forecast with the future path. Another interesting feature of forecasts in Alabama
is that short-term skill tends not to begin as high as that of national forecasts. Of note, consider
the near-0 starting skill at ET 5 in 2012 for 3- and 4-week forecasts. Looking at the middle row of
the leftmost column of Figure S.10, we see that in 2012 ILI in Alabama experienced a dip at ET
4, followed by a small rise in ET 5 and increasingly steeper rises over the subsequent 4 weeks. The
3/4-week forecast from ET 5 is unable to anticipate this rise, hence the extremely poor skill.
S.10 Accuracy results
Accuracy is measured by the mean squared error (MSE) of model point predictions. In the main
paper, we showed that Dante was both more skillful (i.e., higher forecast skill scores) and more
confident (i.e., narrower 90% HPD interval widths) than DBM. We also showed that Dante
outperformed DBM in terms of accuracy across all geographic scales. However, we did not show
the relative performance of Dante and DBM broken down by region, season, or target. Figures
S.11 and S.12
provide this additional information. Note that these plots are analogous to the skill plots in the
main paper, but compare MSE rather than skill. Averages were calculated using the same set of
weeks as defined by the CDC guidelines for scoring forecast skill.
Specifically, Figure S.11 shows the difference in accuracy between Dante and DBM for each state,
region, and nationally. Dante outperformed DBM for the majority of geographic regions, but its
relative performance is less consistently better as measured by accuracy than it is by skill.
Figure S.12 shows accuracy broken down by targets (left) and flu seasons (right) for each geographic
scale. Dante outperformed DBM for all scales and targets, except for onset nationally and for
peak intensity (PI) regionally and by state. Relative magnitude of differences for peak timing
(PT) appear higher because the MSE are in terms of the target (i.e., MSE for PT is in terms of
squared weeks). Dante also outperformed DBM for the majority of scales and flu seasons, except
for 2013 nationally and for 2014 and 2016 by state.
S.11 Distributions
S.11.1 Normal(µ, σ²)

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (46)$$

where x ∈ (−∞, ∞), µ ∈ (−∞, ∞), and σ² > 0. The mean and variance for a Normal distribution
are E(X) = µ and Var(X) = σ².
S.11.2 t(µ, λ, k)

$$f(x \mid \mu, \lambda, k) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)} \left(\frac{\lambda}{k\pi}\right)^{1/2} \left\{1 + \frac{\lambda(x-\mu)^2}{k}\right\}^{-\frac{k+1}{2}} \qquad (47)$$

where x ∈ (−∞, ∞), the non-centrality parameter µ ∈ (−∞, ∞), the precision λ > 0, and the
degrees of freedom k > 0. For k > 1, E(X) = µ. For k > 2, Var(X) = k/(λ(k − 2)).
S.11.3 Gamma(α, β)

$$f(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} \exp(-\beta x) \qquad (48)$$

where x ∈ (0, ∞), α > 0 is the shape parameter, and β > 0 is the rate parameter. The mean and
variance for the Gamma distribution are E(X) = α/β and Var(X) = α/β².
S.11.4 Beta(α, β)

$$f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1} \qquad (49)$$

where x ∈ [0, 1], α > 0, and β > 0. The mean and variance for the Beta distribution are
E(X) = α/(α + β) and Var(X) = αβ/((α + β)²(α + β + 1)).
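These densities follow the BUGS/JAGS parameterizations (rate for the Gamma, precision for the t) rather than, e.g., SciPy's defaults. A small sketch of the correspondence for the Normal, Gamma, and Beta (the numeric values are arbitrary):

import numpy as np
from scipy import stats

mu, sigma2 = 0.0, 2.0
alpha, beta = 5.0, 5.0  # shape and *rate*

normal = stats.norm(loc=mu, scale=np.sqrt(sigma2))  # Normal(mu, sigma^2), Eq. (46)
gamma = stats.gamma(a=alpha, scale=1.0 / beta)      # Gamma(shape, rate), Eq. (48)
beta_rv = stats.beta(a=alpha, b=beta)               # Beta(alpha, beta), Eq. (49)

print(gamma.mean(), alpha / beta)    # both 1.0: E(X) = alpha / beta
print(gamma.var(), alpha / beta**2)  # both 0.2: Var(X) = alpha / beta^2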
Figure S.2: ILI for the 2010 through 2017 flu seasons (grey lines) for all states. Black line is the average wILI nationally for the same time period.
The weeks of the flu season (x-axis) run from roughly the first week of October (x=1) through the end of May (x=35).
Figure S.3: State-detrended ILI time series, i.e., ILI for a given season minus the state’s average
ILI across all seasons. Pink/dark green lines correspond to seasons where ILI for that season was
above/below its state-specific average, respectively. The black line for each season is the average
state-detrended ILI trajectory, averaged over all states within a season. 2015 was a mild season
for the majority of states, while 2017 was an intense season for the majority of states.
Figure S.4: Health and Human Services (HHS) Region weighted influenza-like illness (wILI) issued by the Centers for Disease Control and Prevention
(CDC) on the x-axis against HHS Region wILI computed as the weighted combination of state-level ILI issued by the CDC on the y-axis. The
weights are proportional to the 2010 US Census population estimates. Computed and issued HHS Regional wILI are nearly identical. Each panel
corresponds to an HHS Region (column) and a flu season (row).
Figure S.5: Top: national wILI for the 2012 through 2017 flu seasons, i.e., those used in the
leave-one-season-out training and testing of DBM and Dante. The 2017 flu season, the hold-out
season considered in the lower plot, is emphasized in black. Bottom: Dante’s national forecasts
for the CDC targets in 2017, made with (w)ILI data through epidemic week 46 as the last available
data. The posterior mode and 95% HPD interval for each target are shown as an open point and
a line, respectively. For these forecasts, the model uses all data from the 2012 through 2016
flu seasons and the first 7 weeks of data from the 2017 flu season (indicated by the solid black
points). The remainder of the 2017 flu season (shown in the bottom figure as a grey line) has yet
to be observed.
Figure S.6: The three missing observations among state-season combinations which are not com-
pletely unobserved are denoted by the red vertical lines in each subplot. The inner long dashed
blue lines show the minimum and maximum historic peak weeks from the 2010/2011 to 2017/2018
seasons, while the outer short dashed blue lines show the three week buffer about this historic
window. The validation ILI values are shown in black. For Puerto Rico in 2013 and the Virgin
Islands in 2014, the NA occurs within the buffered historic window and leads the seasonal targets
in those state-seasons to be un-scorable. In the District of Columbia in 2015, on the other hand,
the NA occurs outside the buffered historic window and so seasonal targets for that state-season
are scorable.
Figure S.7: Dante’s forecast skill by prediction time for each target at the national level (black
line). The validation onset (peak timing) for each season is shown in the first (second/third) row
as a vertical blue line. Shifted national wILI data for each season, multiplied by −1, are shown as
the thin blue line overlaid on the short-term target subplots (scaled so as to have the same range
as skill in each subplot). For n-week ahead forecasts, −wILI is shifted back n weeks such that
−wILI for the week being predicted is overlaid on the week in which the prediction was made. It
is evident that short-term target skill is inversely related to wILI.
Figure S.8: Dante’s forecast skill by prediction time for each target in Alabama (black line). The
validation onset (peak timing) for each season is shown in the first (second/third) row as a vertical
blue line. Shifted ILI data for each season, multiplied by −1, are shown as the thin blue line
overlaid on the short-term target subplots (scaled so as to have the same range as skill in each
subplot). For n-week ahead forecasts, −ILI is shifted back n weeks such that −ILI for the week
being predicted is overlaid on the week in which the prediction was made. Short-term target skill
tends to decrease as ILI increases.
Figure S.9: Dante’s forecast skill by prediction time for each target in Montana (black line). The
validation onset (peak timing) for each season is shown in the first (second/third) row as a vertical
blue line or lines. Shifted ILI data for each season, multiplied by −1, are shown as the thin blue
line overlaid on the short-term target subplots (scaled so as to have the same range as skill in
each subplot). For n-week ahead forecasts, −ILI is shifted back n weeks such that −ILI for the
week being predicted is overlaid on the week in which the prediction was made.
Figure S.10: National wILI data and state ILI data for the example skill plots discussed. This
figure is intended as a reference for the magnitude of the (w)ILI data, because the −(w)ILI data
in the skill plots have been scaled to lie in the same range as skill. Note that Alabama has
exceptionally high ILI relative to the national average, while Montana has particularly low ILI
relative to the national average. The vertical dashed blue line marks the peak week(s) for each
region-season combination.
Figure S.11: Difference in mean squared error (MSE) between Dante and DBM, for all states,
regions, and nationally. Dante had lower MSE for the majority of geographic regions.
Figure S.12: (left) Average mean squared error (MSE) by scales and targets. PI and PT stand for
peak intensity and peak timing, respectively. Dante outperformed DBM for all scales and targets,
except for onset nationally and for peak intensity (PI) regionally and by state. (right) Average
forecast skill by scales and flu seasons. Dante outperformed DBM for the majority of scales and
flu seasons, except for 2013 nationally and for 2014 and 2016 by state.