Worked Example¶
This page assumes that you have already installed the
epifx
package, and shows how to generate forecasts using an SEIR model and
the epifx-forecast command.
Note
You can replace the SEIR model with any other model, as long as it provides the necessary methods.
Note
You can download the completed files at the bottom of this page:
This page also assumes that you have a time series that characterises disease activity. For example:
year date cases
2017 2017-01-08 3
2017 2017-01-15 4
2017 2017-01-22 2
...
Major decisions¶
- Choose an infection model; here we will use the
SEIR model provided by
epifx
. - Choose an appropriate observation model; here we will
use the
PopnCounts
observation model, which relates disease incidence to count data where the denominator is assumed or known to be the population size. - Define the observation model parameters; in this case, they are:
- bg_obs – The background signal in the data.
- pr_obs – The probability of observing an infected individual.
- disp – The dispersion parameter.
- bg_var – The variance of the background signal.
- Define the simulation parameters, such as the population size and the number of particles.
Note
All of these settings must be defined in epifxlocns.py
(located in the
working directory) in order to use the epifx
commands.
Defining the forecast settings¶
Each combination of infection model, observation model, and parameters, is
referred to as a “location” and is identified by a unique name.
The epifxlocns.py
file will define all of these settings, for every
location that you define.
Here is a minimal example that defines no locations:
"""Local settings for epidemic forecasting."""
def local_settings(locn_id=None):
"""Return the forecast settings for a particular location."""
# This dictionary will contain all of the forecast settings.
locations = {}
# Ensure each location includes its own unique ID.
for locn in locations:
locations[locn]['id'] = locn
if locn_id is None:
# Return a list of forecast locations.
return(locations.keys())
elif locn_id in locations:
# Return the settings for this location.
return(locations[locn_id])
else:
raise ValueError("Invalid location '{}'".format(locn_id))
The rest of this page will demonstrate how to expand on the above template in order to run epidemic forecasts against a single data file.
Creating a new location¶
This is as simple as adding a new key to the locations
dictionary, which
defines the unique name for this location:
locations = {
'some-city': {}
}
Note
This unique name should only contain alpha-numeric characters and hyphens. It should not contain any spaces.
As shown above, the settings for a forecast location are collected into a dictionary. In the example above, this dictionary is empty. We will now demonstrate how to define all of the necessary forecast settings for this location within this dictionary.
Basic settings¶
Each location should have a name — an arbitrary string that may contain spaces and will be used as a title for forecast outputs.
Other common settings (such as the output directory) and fixed model parameters (such as the population size) may also be defined here. The next section will demonstrate how these values will be added to the simulation parameters.
locations = {
'some-city': {
'name': 'Some City',
'popn': 1000000,
'out_dir': '.',
'tmp_dir': './tmp',
'get_params': get_locn_params,
}
}
Note
Simulation parameters are not provided directly.
Instead, a function that returns the appropriate parameters is stored under
the 'get_params'
key.
See the following section for an example of such a function.
Simulation parameters¶
Simulation parameters are defined by a separate function, which should also be
contained in the file epifxlocns.py
.
This function takes one argument: the settings dictionary for the forecast
location.
import epifx
import pypfilt
import errno
import os.path
def get_locn_params(locn_settings):
"""Return the parameters dictionary for a forecast location."""
model = epifx.SEIR()
time = pypfilt.Datetime()
popn_size = locn_settings['popn']
# The number of particles.
px_count = 2000
# The seed for the pseudo-random number generator.
prng_seed = 3001
params = epifx.default_params(px_count, model, time, popn_size, prng_seed)
# Customise the default parameters as desired.
params['resample']['threshold'] = 0.25
params['resample']['regularisation'] = True
params['epifx']['stoch'] = False
# Set the output and temporary directories.
out_dir = locn_settings['out_dir']
tmp_dir = locn_settings['tmp_dir']
for reqd_dir in [out_dir, tmp_dir]:
if not os.path.isdir(reqd_dir):
# Create the directory (and missing parents) with mode -rwxr-x---.
try:
os.makedirs(reqd_dir, mode=0o750)
except OSError as e:
# Potential race condition with multiple script instances.
if e.errno != errno.EEXIST:
print("Warning: could not create {}".format(reqd_dir))
print(e)
params['out_dir'] = out_dir
params['tmp_dir'] = tmp_dir
return params
Note
In this example, the population size is determined by inspecting the
settings dictionary for the location: locn_settings['popn']
.
Note that this was defined in the previous section (“Basic settings”).
The same is also true for the output and temporary directories.
Adjusting model priors¶
You probably also want to modify the default model priors.
def get_locn_params(locn_settings):
"""Return the parameters dictionary for a forecast location."""
# Other lines omitted ...
# Restrict R0 to values between 1.3 and 1.5.
params['param_min'][4] = 1.3
params['param_max'][4] = 1.5
# Keep eta fixed at 1 (i.e., enforce homogeneous mixing).
params['param_min'][7] = 1.0
params['param_max'][7] = 1.0
# Introduce the first infection after 12-16 weeks.
params['param_min'][9] = 12 * 7
params['param_max'][9] = 16 * 7
# Update the model priors.
params['prior'] = params['model'].priors(params)
# Other lines omitted ...
return params
Observation model¶
The observation model is also stored in the location settings:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'obs_model': epifx.obs.PopnCounts('Weekly Cases', 7),
}
}
Observations are read from an external file using the observation model’s
from_file()
method.
The file name, and additional arguments for the from_file()
method, are
defined as follows:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'obs_file': 'case-counts.ssv',
'from_file_args': {
'date_col': 'date',
'value_col': 'cases',
},
}
}
Observation model parameters for the forecasts are defined in a dictionary
stored under the 'forecast'
key.
Each parameter is identified by name, and values can take the form of (a) a
single numeric value; (b) a list of numeric values; or (c) a dictionary that
maps years to a single numerical value or to a list of numeric values.
All of these forms are shown in the example below:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'forecast': {
'bg_obs': {2016: [6, 8], 2017: 8},
'bg_var': 6,
'pr_obs': [0.01, 0.02],
'disp': 100,
},
}
}
Finally, the format strings that are used to construct each output file name
must be defined.
This has two parts: the 'om_name'
dictionary defines a string to use for
each observation parameter name, and the 'om_format'
dictionary
defines the numeric format for each observation parameter value:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'om_format': {
'bg_obs': '02.0f',
'bg_var': '02.0f',
'pr_obs': '0.2f',
'disp': '03.0f',
},
'om_name': {
'bg_obs': 'bg',
'bg_var': 'bgvar',
'pr_obs': 'pr',
'disp': 'disp',
},
}
}
Note
Given the above definitions, a forecast for all data up to 2017-01-08 and
using an observation probability of 0.02
would produce the following
output file:
some-city-2017-01-08-bg-08-bgvar-06-disp-100-pr-0.02.hdf5
Simulation period¶
The default simulation period for forecasts is March to October
(inclusive).
These defaults can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
'start': get_start_date,
'until': get_until_date,
},
}
}
Note that these dates are defined by separate functions, which should also be
contained in the file epifxlocns.py
.
These functions takes one argument: the forecasting year.
import datetime
def get_start_date(year):
"""Begin forecasts on the first of January."""
return datetime.datetime(year, 1, 1)
def get_until_date(year):
"""End forecasts on the last day of December."""
return datetime.datetime(year, 12, 31)
Note
These functions must return datetime.datetime
instances (i.e.,
not datetime.date
instances) if you are using the
pypfilt.Datetime
time scale.
Forecasting dates¶
By default, the epifx-forecast
command will generate a forecast for the
most recent Sunday (we have typically used weeks ending on Sunday as our
observation interval).
This can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
# Other lines omitted ...
'live_fs_dates': get_live_fs_dates,
},
}
}
Note that the forecasting date(s) are defined by a separate function, which
should also be contained in the file epifxlocns.py
.
This function takes three arguments:
- The forecasting year;
- The list of all available observations; and
- The date from which to begin forecasting (may be
None
, meaning that only the most recent forecasting date should be returned).
def get_live_fs_dates(year, all_obs, fs_from):
"""Generate a forecast at the time of each observation."""
fs_dates = sorted(o['date'] for o in all_obs)
if fs_from is None:
# Return only the most recent forecasting date.
# Important: *must* be a list.
return [fs_dates[-1]]
else:
return [d for d in fs_dates if d >= fs_from]
Note
This function must return a list of datetime.datetime
instances
(i.e., not datetime.date
instances) if you are using the
pypfilt.Datetime
time scale.
Summary tables¶
You may wish to override the collection of summary tables that are generated
for each forecast.
This can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
# Other lines omitted ...
'make_summary': make_summary,
},
}
}
Note that this is defined by a separate function, which should also be
contained in the file epifxlocns.py
.
This function must accept the same arguments as epifx.summary.make()
.
import epifx.summary
import pypfilt.summary
def make_summary(params, all_obs, **kwargs):
"""Return a summary object for the forecast simulations."""
meta_builder = epifx.summary.Metadata()
meta = meta_builder.build(params)
# Create a summary object with no tables, can now choose what tables
# to add. Alternatively, can call epifx.summary.make().
summary = pypfilt.summary.HDF5(params, all_obs, meta=meta, **kwargs)
peaks = epifx.summary.PeakMonitor()
# The summary tables to add.
tables = [
pypfilt.summary.ModelCIs(),
epifx.summary.ObsLikelihood(),
# The following tables are needed for the interactive forecast plots.
epifx.summary.PredictiveCIs(peaks),
epifx.summary.PeakForecastCIs(peaks),
]
# Also add a summary table for each set of observations.
# These tables are needed for the interactive forecast plots.
for obs_unit in params['obs']:
tables.append(epifx.summary.Obs(obs_unit))
summary.add_tables(*tables)
return summary
Forecast plots¶
Both the epifx-forecast
and epifx-summary
commands will produce JSON
files by default (in addition to the standard HDF5 output files).
The results stored in these JSON files can then be plotted and viewed
interactively in a web browser.
Here we define:
- The directory that will contain the JSON forecast outputs (
'json_dir'
); - The y-axis label (
'obs_axis_lbl'
); - The decimal precision of the y-axis ticks (
'obs_axis_prec'
); - The tool-tip label for individual observations (
'obs_datum_lbl'
); and - The tool-tip decimal precision (
'obs_datum_prec'
).
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'json_dir': './www/data',
'obs_axis_lbl': 'Weekly Cases',
'obs_axis_prec': 0,
'obs_datum_lbl': 'cases/week',
'obs_datum_prec': 0,
}
}
Running the forecasts¶
First, ensure that the epifxlocns.py
is correctly detected and the
forecast location is identified by the epifx-locn
command:
$ epifx-locn
some-city
$
Then use the epifx-forecast
command to generate forecasts for all
combinations of the observation model parameter values:
$ epifx-forecast --year 2017 --from 2017-03-10 some-city
INFO:epifx.cmd.forecast:Forecasting from 2017-03-12 00:00:00 ...
Visualising the forecasts¶
You can use the epifx-template
command to generate the necessary files to
view the interactive forecast plots in your browser.
$ epifx-template
You will also need to download the d3
JavaScript library.
$ wget https://github.com/d3/d3/releases/download/v3.5.17/d3.zip
$ unzip d3.zip
$ mv d3.min.js www/
Finally, edit www/index.html
and add each of the forecast JSON files
to the drop-down list, replacing the template values.
<div class="params">
<form id="data-file-form">
<label for="data_file">Forecast:</label>
<select id="data_file">
<option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.01.json" selected>pr_obs = 0.01</option>
<option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.02.json">pr_obs = 0.02</option>
</select>
</form>
</div>
Then open www/index.html
in your browser.
Retrospective forecasts¶
The epifx-scan command is similar to epifx-forecast
, except
that it generates forecasts from retrospective data.
It is intended to explore broad ranges of observation model parameter values,
so that appropriate values for these parameters can be identified.
Note
The same data file (stored under the 'obs_file'
key) will be used for
live forecasts (epifx-forecast
) and for retrospective forecasts
(epifx-scan
).
Observation model parameters¶
Observation model parameters for the retrospective forecasts are defined in a
dictionary stored under the 'scan'
key.
Each parameter is identified by name, and values can take the form of (a) a
single numeric value; (b) a list of numeric values; or (c) a dictionary that
maps years to a single numerical value or to a list of numeric values.
The list of years for which retrospective forecasts should be generated is
stored under the 'scan_years'
key.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'scan': {
'bg_obs': [6, 8],
'bg_var': 6,
'pr_obs': [0.005, 0.010, 0.015, 0.020, 0.025],
'disp': [10, 100, 1000],
},
'scan_years': [2015, 2016],
}
}
Retrospective forecasting dates¶
By default, the epifx-scan
command will generate forecasts from mid-March
until the observed epidemic peak in each year.
This can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
# Other lines omitted ...
'scan_fs_dates': get_scan_fs_dates,
},
}
}
Note that the retrospective forecasting date(s) are defined by a separate
function, which should also be contained in the file epifxlocns.py
.
This function takes only two arguments (i.e., not the same as the
'live_fs_dates'
function):
- The forecasting year; and
- The list of all available observations.
def get_scan_fs_dates(year, all_obs):
"""Generate a forecast at the time of each observation."""
return sorted(o['date'] for o in all_obs)
Note
This function must return a list of datetime.datetime
instances
(i.e., not datetime.date
instances) if you are using the
pypfilt.Datetime
time scale.
Running the retrospective forecasts¶
Use the epifx-scan
command to generate retrospective forecasts for all
combinations of the observation model parameter values:
$ epifx-scan some-city
Download the complete files¶
epifxlocns.py
case-counts.ssv
www/index.html
- www/d3.js version 3.5.17.
Note
Run epifx-template
to generate www/plot.js
and
www/style.css
, as no modifications to these files are required.