Worked Example

This page assumes that you have already installed the epifx package, and shows how to generate forecasts using an SEIR model and the epifx-forecast command.

Note

You can replace the SEIR model with any other model, as long as it provides the necessary methods.

Note

You can download the completed files at the bottom of this page:

This page also assumes that you have a time series that characterises disease activity. For example:

year date       cases
2017 2017-01-08 3
2017 2017-01-15 4
2017 2017-01-22 2
...

Major decisions

  1. Choose an infection model; here we will use the SEIR model provided by epifx.
  2. Choose an appropriate observation model; here we will use the PopnCounts observation model, which relates disease incidence to count data where the denominator is assumed or known to be the population size.
  3. Define the observation model parameters; in this case, they are:
    • bg_obs – The background signal in the data.
    • pr_obs – The probability of observing an infected individual.
    • disp – The dispersion parameter.
    • bg_var – The variance of the background signal.
  4. Define the simulation parameters, such as the population size and the number of particles.

Note

All of these settings must be defined in epifxlocns.py (located in the working directory) in order to use the epifx commands.

Defining the forecast settings

Each combination of infection model, observation model, and parameters, is referred to as a “location” and is identified by a unique name. The epifxlocns.py file will define all of these settings, for every location that you define.

Here is a minimal example that defines no locations:

"""Local settings for epidemic forecasting."""

def local_settings(locn_id=None):
    """Return the forecast settings for a particular location."""
    # This dictionary will contain all of the forecast settings.
    locations = {}

    # Ensure each location includes its own unique ID.
    for locn in locations:
        locations[locn]['id'] = locn

    if locn_id is None:
        # Return a list of forecast locations.
        return(locations.keys())
    elif locn_id in locations:
        # Return the settings for this location.
        return(locations[locn_id])
    else:
        raise ValueError("Invalid location '{}'".format(locn_id))

The rest of this page will demonstrate how to expand on the above template in order to run epidemic forecasts against a single data file.

Creating a new location

This is as simple as adding a new key to the locations dictionary, which defines the unique name for this location:

locations = {
    'some-city': {}
}

Note

This unique name should only contain alpha-numeric characters and hyphens. It should not contain any spaces.

As shown above, the settings for a forecast location are collected into a dictionary. In the example above, this dictionary is empty. We will now demonstrate how to define all of the necessary forecast settings for this location within this dictionary.

Basic settings

Each location should have a name — an arbitrary string that may contain spaces and will be used as a title for forecast outputs.

Other common settings (such as the output directory) and fixed model parameters (such as the population size) may also be defined here. The next section will demonstrate how these values will be added to the simulation parameters.

locations = {
    'some-city': {
        'name': 'Some City',
        'popn': 1000000,
        'out_dir': '.',
        'tmp_dir': './tmp',
        'get_params': get_locn_params,
    }
}

Note

Simulation parameters are not provided directly. Instead, a function that returns the appropriate parameters is stored under the 'get_params' key.

See the following section for an example of such a function.

Simulation parameters

Simulation parameters are defined by a separate function, which should also be contained in the file epifxlocns.py. This function takes one argument: the settings dictionary for the forecast location.

import epifx
import pypfilt

import errno
import os.path

def get_locn_params(locn_settings):
    """Return the parameters dictionary for a forecast location."""
    model = epifx.SEIR()
    time = pypfilt.Datetime()
    popn_size = locn_settings['popn']
    # The number of particles.
    px_count = 2000
    # The seed for the pseudo-random number generator.
    prng_seed = 3001

    params = epifx.default_params(px_count, model, time, popn_size, prng_seed)

    # Customise the default parameters as desired.
    params['resample']['threshold'] = 0.25
    params['resample']['regularisation'] = True
    params['epifx']['stoch'] = False

    # Set the output and temporary directories.
    out_dir = locn_settings['out_dir']
    tmp_dir = locn_settings['tmp_dir']
    for reqd_dir in [out_dir, tmp_dir]:
        if not os.path.isdir(reqd_dir):
            # Create the directory (and missing parents) with mode -rwxr-x---.
            try:
                os.makedirs(reqd_dir, mode=0o750)
            except OSError as e:
                # Potential race condition with multiple script instances.
                if e.errno != errno.EEXIST:
                    print("Warning: could not create {}".format(reqd_dir))
                    print(e)

    params['out_dir'] = out_dir
    params['tmp_dir'] = tmp_dir

    return params

Note

In this example, the population size is determined by inspecting the settings dictionary for the location: locn_settings['popn']. Note that this was defined in the previous section (“Basic settings”). The same is also true for the output and temporary directories.

Adjusting model priors

You probably also want to modify the default model priors.

def get_locn_params(locn_settings):
    """Return the parameters dictionary for a forecast location."""

    # Other lines omitted ...

    # Restrict R0 to values between 1.3 and 1.5.
    params['param_min'][4] = 1.3
    params['param_max'][4] = 1.5
    # Keep eta fixed at 1 (i.e., enforce homogeneous mixing).
    params['param_min'][7] = 1.0
    params['param_max'][7] = 1.0
    # Introduce the first infection after 12-16 weeks.
    params['param_min'][9] = 12 * 7
    params['param_max'][9] = 16 * 7
    # Update the model priors.
    params['prior'] = params['model'].priors(params)

    # Other lines omitted ...

    return params

Observation model

The observation model is also stored in the location settings:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'obs_model': epifx.obs.PopnCounts('Weekly Cases', 7),
    }
}

Observations are read from an external file using the observation model’s from_file() method. The file name, and additional arguments for the from_file() method, are defined as follows:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'obs_file': 'case-counts.ssv',
        'from_file_args': {
            'date_col': 'date',
            'value_col': 'cases',
        },
    }
}

Observation model parameters for the forecasts are defined in a dictionary stored under the 'forecast' key. Each parameter is identified by name, and values can take the form of (a) a single numeric value; (b) a list of numeric values; or (c) a dictionary that maps years to a single numerical value or to a list of numeric values.

All of these forms are shown in the example below:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'forecast': {
            'bg_obs': {2016: [6, 8], 2017: 8},
            'bg_var': 6,
            'pr_obs': [0.01, 0.02],
            'disp': 100,
        },
    }
}

Finally, the format strings that are used to construct each output file name must be defined. This has two parts: the 'om_name' dictionary defines a string to use for each observation parameter name, and the 'om_format' dictionary defines the numeric format for each observation parameter value:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'om_format': {
            'bg_obs': '02.0f',
            'bg_var': '02.0f',
            'pr_obs': '0.2f',
            'disp': '03.0f',
        },
        'om_name': {
            'bg_obs': 'bg',
            'bg_var': 'bgvar',
            'pr_obs': 'pr',
            'disp': 'disp',
        },
    }
}

Note

Given the above definitions, a forecast for all data up to 2017-01-08 and using an observation probability of 0.02 would produce the following output file:

some-city-2017-01-08-bg-08-bgvar-06-disp-100-pr-0.02.hdf5

Simulation period

The default simulation period for forecasts is March to October (inclusive). These defaults can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            'start': get_start_date,
            'until': get_until_date,
        },
    }
}

Note that these dates are defined by separate functions, which should also be contained in the file epifxlocns.py. These functions takes one argument: the forecasting year.

import datetime

def get_start_date(year):
    """Begin forecasts on the first of January."""
    return datetime.datetime(year, 1, 1)

def get_until_date(year):
    """End forecasts on the last day of December."""
    return datetime.datetime(year, 12, 31)

Note

These functions must return datetime.datetime instances (i.e., not datetime.date instances) if you are using the pypfilt.Datetime time scale.

Forecasting dates

By default, the epifx-forecast command will generate a forecast for the most recent Sunday (we have typically used weeks ending on Sunday as our observation interval). This can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            # Other lines omitted ...
            'live_fs_dates': get_live_fs_dates,
        },
    }
}

Note that the forecasting date(s) are defined by a separate function, which should also be contained in the file epifxlocns.py. This function takes three arguments:

  • The forecasting year;
  • The list of all available observations; and
  • The date from which to begin forecasting (may be None, meaning that only the most recent forecasting date should be returned).
def get_live_fs_dates(year, all_obs, fs_from):
    """Generate a forecast at the time of each observation."""
    fs_dates = sorted(o['date'] for o in all_obs)
    if fs_from is None:
        # Return only the most recent forecasting date.
        # Important: *must* be a list.
        return [fs_dates[-1]]
    else:
        return [d for d in fs_dates if d >= fs_from]

Note

This function must return a list of datetime.datetime instances (i.e., not datetime.date instances) if you are using the pypfilt.Datetime time scale.

Summary tables

You may wish to override the collection of summary tables that are generated for each forecast. This can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            # Other lines omitted ...
            'make_summary': make_summary,
        },
    }
}

Note that this is defined by a separate function, which should also be contained in the file epifxlocns.py. This function must accept the same arguments as epifx.summary.make().

import epifx.summary
import pypfilt.summary

def make_summary(params, all_obs, **kwargs):
    """Return a summary object for the forecast simulations."""
    meta_builder = epifx.summary.Metadata()
    meta = meta_builder.build(params)

    # Create a summary object with no tables, can now choose what tables
    # to add. Alternatively, can call epifx.summary.make().
    summary = pypfilt.summary.HDF5(params, all_obs, meta=meta, **kwargs)

    peaks = epifx.summary.PeakMonitor()
    # The summary tables to add.
    tables = [
        pypfilt.summary.ModelCIs(),
        epifx.summary.ObsLikelihood(),
        # The following tables are needed for the interactive forecast plots.
        epifx.summary.PredictiveCIs(peaks),
        epifx.summary.PeakForecastCIs(peaks),
    ]
    # Also add a summary table for each set of observations.
    # These tables are needed for the interactive forecast plots.
    for obs_unit in params['obs']:
        tables.append(epifx.summary.Obs(obs_unit))

    summary.add_tables(*tables)

    return summary

Forecast plots

Both the epifx-forecast and epifx-summary commands will produce JSON files by default (in addition to the standard HDF5 output files). The results stored in these JSON files can then be plotted and viewed interactively in a web browser.

Here we define:

  • The directory that will contain the JSON forecast outputs ('json_dir');
  • The y-axis label ('obs_axis_lbl');
  • The decimal precision of the y-axis ticks ('obs_axis_prec');
  • The tool-tip label for individual observations ('obs_datum_lbl'); and
  • The tool-tip decimal precision ('obs_datum_prec').
locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'json_dir': './www/data',
        'obs_axis_lbl': 'Weekly Cases',
        'obs_axis_prec': 0,
        'obs_datum_lbl': 'cases/week',
        'obs_datum_prec': 0,
    }
}

Running the forecasts

First, ensure that the epifxlocns.py is correctly detected and the forecast location is identified by the epifx-locn command:

$ epifx-locn
some-city
$

Then use the epifx-forecast command to generate forecasts for all combinations of the observation model parameter values:

$ epifx-forecast --year 2017 --from 2017-03-10 some-city
INFO:epifx.cmd.forecast:Forecasting from 2017-03-12 00:00:00 ...

Visualising the forecasts

You can use the epifx-template command to generate the necessary files to view the interactive forecast plots in your browser.

$ epifx-template

You will also need to download the d3 JavaScript library.

$ wget https://github.com/d3/d3/releases/download/v3.5.17/d3.zip
$ unzip d3.zip
$ mv d3.min.js www/

Finally, edit www/index.html and add each of the forecast JSON files to the drop-down list, replacing the template values.

<div class="params">
  <form id="data-file-form">
    <label for="data_file">Forecast:</label>
    <select id="data_file">
      <option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.01.json" selected>pr_obs = 0.01</option>
      <option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.02.json">pr_obs = 0.02</option>
    </select>
  </form>
</div>

Then open www/index.html in your browser.

Retrospective forecasts

The epifx-scan command is similar to epifx-forecast, except that it generates forecasts from retrospective data. It is intended to explore broad ranges of observation model parameter values, so that appropriate values for these parameters can be identified.

Note

The same data file (stored under the 'obs_file' key) will be used for live forecasts (epifx-forecast) and for retrospective forecasts (epifx-scan).

Observation model parameters

Observation model parameters for the retrospective forecasts are defined in a dictionary stored under the 'scan' key. Each parameter is identified by name, and values can take the form of (a) a single numeric value; (b) a list of numeric values; or (c) a dictionary that maps years to a single numerical value or to a list of numeric values.

The list of years for which retrospective forecasts should be generated is stored under the 'scan_years' key.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'scan': {
            'bg_obs': [6, 8],
            'bg_var': 6,
            'pr_obs': [0.005, 0.010, 0.015, 0.020, 0.025],
            'disp': [10, 100, 1000],
        },
        'scan_years': [2015, 2016],
    }
}

Retrospective forecasting dates

By default, the epifx-scan command will generate forecasts from mid-March until the observed epidemic peak in each year. This can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            # Other lines omitted ...
            'scan_fs_dates': get_scan_fs_dates,
        },
    }
}

Note that the retrospective forecasting date(s) are defined by a separate function, which should also be contained in the file epifxlocns.py. This function takes only two arguments (i.e., not the same as the 'live_fs_dates' function):

  • The forecasting year; and
  • The list of all available observations.
def get_scan_fs_dates(year, all_obs):
    """Generate a forecast at the time of each observation."""
    return sorted(o['date'] for o in all_obs)

Note

This function must return a list of datetime.datetime instances (i.e., not datetime.date instances) if you are using the pypfilt.Datetime time scale.

Running the retrospective forecasts

Use the epifx-scan command to generate retrospective forecasts for all combinations of the observation model parameter values:

$ epifx-scan some-city

Download the complete files

Note

Run epifx-template to generate www/plot.js and www/style.css, as no modifications to these files are required.