Epidemic forecasting with mechanistic infection models

Welcome to the epifx documentation. This package uses a bootstrap particle filter to generate influenza epidemic forecasts from mechanistic infection models.

Installation

Ensure that pypfilt is already installed, then clone the epifx repository and install it in a virtual environment (here, called venv-pypfilt):

# Activate the virtual environment.
source venv-pypfilt/bin/activate
# Clone the epifx repository.
git clone https://bitbucket.org/robmoss/epidemic-forecasting-for-python.git
cd epidemic-forecasting-for-python
# Install epifx in the virtual environment.
python setup.py install

If you are not using a virtual environment, and you don’t have permission to install epifx system-wide, you can install the package locally:

# Clone the epifx repository.
git clone https://bitbucket.org/robmoss/epidemic-forecasting-for-python.git
cd epidemic-forecasting-for-python
# Install epifx in the user's "site-packages" directory.
python setup.py install --user

User Documentation

Getting Started

This page assumes that you have already installed the epifx package, and shows how to generate forecasts using an SEIR model.

Simulation parameters

Default values for the particle filter parameters are provided:

epifx.default_params(p_exp, px_scale, model=epifx.SEIR)

The default simulation parameters.

Parameters:
  • p_exp – The daily probability of seeding an exposure event in a naive population.
  • px_scale – The ratio of particles to seeded exposures.
  • model – The infection model.

Note that epifx uses an independent pseudo-random number generator (PRNG), stored in params['epifx']['rnd'], for sampling from model priors and for adding stochastic noise to model parameters and flows. This is distinct from the pypfilt PRNG instance (params['rnd']). The epifx PRNG can be initialised with a custom seed:

epifx.init_prng(params, seed)

Initialise the epifx PRNG instance (params['epifx']['rnd']).

Parameters:
  • params – The simulation parameters.
  • seed – The PRNG seed; see the numpy.random.RandomState documentation for details.

The behaviour of epifx 0.1.x (one common PRNG instance) can be recovered by setting the following parameter:

params['epifx']['independent_prng'] = False

Temporal forcing

The force of infection can be subject to temporal forcing (e.g., by climatic factors such as absolute humidity, or social factors such as media coverage) as mediated by the model parameter \(\sigma\) (see Infection models). This requires the simulation parameters to include a function that maps datetime.datetime instances to scalar values:

epifx.daily_forcing(filename, date_fmt=u'%Y-%m-%d')

Return a temporal forcing look-up function, which should be stored in params['epifx']['forcing'] in order to enable temporal forcing.

Parameters:
  • filename – A file that contains two columns separated by whitespace, the first column being the date and the second being the force of the temporal modulation. Note that the first line of this file is assumed to contain column headings and will be ignored.
  • date_fmt – The format in which dates are stored.
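
As a rough illustration of the file format and look-up behaviour described above, the sketch below builds a comparable look-up function using only the standard library. It is not the epifx implementation: the name make_forcing_lookup, the choice to index by calendar day, and the sample values are all assumptions made for this example.

```python
import datetime
import io


def make_forcing_lookup(lines, date_fmt='%Y-%m-%d'):
    """Build a date -> forcing value look-up from two-column text lines.

    Hypothetical helper; it only mimics the file format described above.
    """
    table = {}
    rows = iter(lines)
    next(rows, None)  # The first line holds column headings; skip it.
    for line in rows:
        fields = line.split()
        if len(fields) < 2:
            continue  # Ignore blank or incomplete lines.
        day = datetime.datetime.strptime(fields[0], date_fmt).date()
        table[day] = float(fields[1])

    def forcing(when):
        # Accept datetime.datetime instances and look up by calendar day.
        return table[when.date()]

    return forcing


# Made-up sample data standing in for a forcing file.
sample = io.StringIO("date value\n2015-05-01 0.25\n2015-05-02 0.5\n")
forcing = make_forcing_lookup(sample)
print(forcing(datetime.datetime(2015, 5, 2, 12, 0)))
```

With epifx itself, the function returned by epifx.daily_forcing() would be stored in params['epifx']['forcing'], as described above.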

Infection models

class epifx.SEIR

An SEIR compartment model for a single circulating influenza strain, under the assumption that recovered individuals are completely protected against reinfection.

\[\begin{split}\frac{dS}{dt} &= - \alpha \cdot S^\eta I \\[0.5em] \frac{dE}{dt} &= \alpha \cdot S^\eta I - \beta \cdot E \\[0.5em] \frac{dI}{dt} &= \beta \cdot E - \gamma I \\[0.5em] \frac{dR}{dt} &= \gamma I\end{split}\]
Parameter     Meaning
\(\alpha\)    Force of infection
\(\beta\)     Inverse of the incubation period (day\(^{-1}\))
\(\gamma\)    Inverse of the infectious period (day\(^{-1}\))
\(\eta\)      Inhomogeneous social mixing
\(\sigma\)    Temporal forcing coefficient

The force of infection can be subject to temporal forcing \(F(t)\), as mediated by \(\sigma\):

\[\alpha'(t) = \alpha \cdot (1 + \sigma \cdot F(t))\]

Note that this requires the forcing function \(F(t)\) to be defined in the simulation parameters (see the Temporal Forcing documentation).
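
The equations above can be sketched in a few lines of Python. This is a minimal illustration, assuming the state is expressed as population fractions and using a simple forward-Euler step; the function names seir_rhs and euler_step and the parameter values are hypothetical, and this is not how epifx.SEIR is implemented.

```python
def seir_rhs(state, alpha, beta, gamma, eta=1.0, sigma=0.0, forcing=0.0):
    """Return (dS/dt, dE/dt, dI/dt, dR/dt) for the SEIR equations above."""
    s, e, i, r = state
    # Temporal forcing: alpha'(t) = alpha * (1 + sigma * F(t)).
    alpha_t = alpha * (1.0 + sigma * forcing)
    infection = alpha_t * s ** eta * i
    return (-infection,
            infection - beta * e,
            beta * e - gamma * i,
            gamma * i)


def euler_step(state, dt, **rates):
    """Advance the state by one forward-Euler step of dt days."""
    derivs = seir_rhs(state, **rates)
    return tuple(x + dt * dx for x, dx in zip(state, derivs))


# Illustrative values only: 1% initially infectious, 10 days in 0.1-day steps.
state = (0.99, 0.0, 0.01, 0.0)
for _ in range(100):
    state = euler_step(state, dt=0.1, alpha=0.5, beta=0.5, gamma=0.25)
print(state)
```

Because the four derivatives sum to zero, the total population fraction is conserved at each step, which is a convenient sanity check for any implementation of these equations.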

static priors(params)

Return a dictionary of model parameter priors.

Parameters:
  • params – Simulation parameters.

Observation models

Observation models comprise a log-likelihood function (log_llhd) that relates observations to particles, and an expectation function (expect) that relates particles to observations:

def log_llhd(params, op, obs, pr_indiv, curr, hist):
    """Calculate the log-likelihood of obtaining one or more observations
    from each particle.

    :param params: The simulation parameters.
    :param op: The observation model parameters.
    :param obs: The list of observations for the current time-step.
    :param pr_indiv: A 2 x N matrix (for N particles) of the probability
        of an individual being uninfected and infected, respectively.
    :param curr: The particle state vectors.
    :param hist: The particle state histories, indexed by observation period.
    """
    return ...

def expect(params, op, period, pr_inf, prev, curr):
    """Determine the expected value for a given infection probability.

    :param params: The simulation parameters.
    :param op: The observation model parameters.
    :param period: The duration of the observation period (in days).
    :param pr_inf: The probability of an individual becoming infected
        during the observation period.
    :param prev: The state vectors at the start of the observation period.
    :param curr: The state vectors at the end of the observation period.
    """
    return ...

Observation models are identified by unit, and must be stored in the parameters dictionary as shown below:

params = epifx.default_params(...)
obs_unit = 'Some identifier'
params['obs'][obs_unit] = {
    'log_llhd_fn': log_llhd,
    'expect_fn': expect,
    # Add observation model parameters here.
}

The epifx.obs module provides generic observation models for both count data and fractional data, as well as functions for reading observations from disk.

Forecast summaries

epifx.summary.make(params, all_obs, extra_tbls=None, pkgs=None, **kwargs)

A convenience function that collects all of the summary statistics defined in the pypfilt.summary and epifx.summary modules.

Parameters:
  • params – The simulation parameters.
  • all_obs – A list of all observations.
  • extra_tbls – A list of extra summary statistic tables to include.
  • pkgs – A dictionary of python modules whose versions should be recorded in the simulation metadata. By default, all of the modules recorded by pypfilt.summary.metadata are included, as is the epifx package itself.
  • **kwargs – Extra arguments to pass to pypfilt.summary.HDF5.
class epifx.summary.PrOutbreak(name='pr_epi')

Record the daily outbreak probability, defined as the sum of the weights of all particles in which an outbreak has been seeded.

Parameters:
  • name – the name of the table in the output file.
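
The definition above amounts to a weighted sum over particles. The following sketch shows the arithmetic on made-up values; the function name outbreak_probability and the Boolean seeded flags are assumptions for this example and do not reflect how epifx stores particle states.

```python
def outbreak_probability(weights, seeded):
    """Sum the weights of the particles in which an outbreak has been seeded."""
    return sum(w for w, has_outbreak in zip(weights, seeded) if has_outbreak)


# Four particles with normalised weights; two have a seeded outbreak.
weights = [0.125, 0.25, 0.375, 0.25]
seeded = [True, False, True, False]
pr_epi = outbreak_probability(weights, seeded)
print(pr_epi)
```
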
class epifx.summary.PeakMonitor

Record epidemic peak forecasts, for use with other statistics.

peak_size = None

A dictionary that maps observation systems to the size of each particle’s peak with respect to that system: peak_size[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_date = None

A dictionary that maps observation systems to the date of each particle’s peak with respect to that system: peak_date[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_time = None

A dictionary that maps observation systems to the time of each particle’s peak with respect to that system, measured in (fractional) days from the start of the forecasting period: peak_time[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_weight = None

A dictionary that maps observation systems to the weight of each particle at the time that its peak occurs: peak_weight[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

expected_obs = None

The expected observation for each particle for the duration of the current simulation window.

Note that this is only valid for tables to inspect in each call to add_rows(), and not in a call to finished().

days_to(date)

Convert a date to the (fractional) number of days from the start of the forecasting period.
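
As a small sketch of this conversion, assuming the start of the forecasting period is available as a datetime.datetime instance (the stand-alone function days_from_start and its explicit start argument are hypothetical; the real method takes only the date):

```python
import datetime


def days_from_start(date, start):
    """Return the (fractional) number of days from start to date."""
    return (date - start).total_seconds() / 86400.0


start = datetime.datetime(2015, 5, 1)
noon_next_day = datetime.datetime(2015, 5, 2, 12, 0)
print(days_from_start(noon_next_day, start))
```
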

class epifx.summary.PeakForecastEnsembles(peak_monitor, name='peak_ensemble')

Record the weighted ensemble of peak size and time predictions for each forecasting simulation.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • name – the name of the table in the output file.
class epifx.summary.PeakForecastCIs(peak_monitor, probs=None, name='peak_cints')

Record fixed-probability central credible intervals for the peak size and time predictions.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 90, 95, 99, 100]).
  • name – the name of the table in the output file.
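
To illustrate what a central credible interval over a weighted particle ensemble means, the sketch below computes intervals from weighted quantiles. It uses one common convention (the smallest value whose cumulative weight reaches the target probability), which is an assumption and may differ from the calculation used by epifx; the function names and sample values are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def central_interval(values, weights, prob):
    """Return (lower, upper) for the central interval covering prob percent."""
    tail = (1.0 - prob / 100.0) / 2.0
    return (weighted_quantile(values, weights, tail),
            weighted_quantile(values, weights, 1.0 - tail))


# Made-up peak sizes and particle weights.
peak_sizes = [10, 20, 30, 40, 50]
weights = [0.125, 0.25, 0.25, 0.25, 0.125]
for prob in (0, 50, 100):
    print(prob, central_interval(peak_sizes, weights, prob))
```

Under this convention the 0% interval collapses to the weighted median and the 100% interval spans the full range of the ensemble.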
class epifx.summary.PeakSizeAccuracy(peak_monitor, name='peak_size_acc', toln=None)

Record the accuracy of the peak size predictions against multiple accuracy tolerances.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • name – the name of the table in the output file.
  • toln – The accuracy thresholds for peak size predictions, expressed as percentages of the true size. The default is np.array([10, 20, 25, 33]).
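
One plausible reading of these tolerances is sketched below: for each threshold, sum the weights of the particles whose predicted peak size lies within that percentage of the true size. This interpretation, the function name size_accuracy, and the sample values are assumptions made for illustration; consult the epifx source for the exact definition.

```python
def size_accuracy(peak_sizes, weights, true_size, tolerances=(10, 20, 25, 33)):
    """Map each tolerance (percent of true_size) to the total weight of
    particles whose predicted peak size falls within that tolerance."""
    result = {}
    for toln in tolerances:
        limit = true_size * toln / 100.0
        result[toln] = sum(w for size, w in zip(peak_sizes, weights)
                           if abs(size - true_size) <= limit)
    return result


# Four equally weighted particles and a true peak size of 100.
accuracy = size_accuracy([90, 105, 130, 200], [0.25, 0.25, 0.25, 0.25], 100)
print(accuracy)
```
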
class epifx.summary.PeakTimeAccuracy(peak_monitor, name='peak_time_acc', toln=None)

Record the accuracy of the peak time predictions against multiple accuracy tolerances.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • name – the name of the table in the output file.
  • toln – The accuracy thresholds for peak time predictions, expressed as numbers of days. The default is np.array([7, 10, 14]).
class epifx.summary.ExpectedObs(peak_monitor, probs=None, name='forecasts')

Record fixed-probability central credible intervals for the expected observations.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 90, 95, 99, 100]).
  • name – the name of the table in the output file.

Generating forecasts

import datetime
import epifx
import pypfilt

# Simulation parameters
p_exp = 1.0 / 36
px_scale = 100
params = epifx.default_params(p_exp, px_scale)
epifx.init_prng(params, seed=42)

# Simulate from the 1st of May to the 31st of October, 2015.
start = datetime.datetime(2015, 5, 1)
until = start + datetime.timedelta(days=183)

# Load the observations and define the observation model parameters.
obs_list = ...
params['obs'][obs_unit] = {...}

# Generate weekly forecasts for the first 9 weeks (and the start date).
fs_dates = [start + datetime.timedelta(days=week * 7)
            for week in range(10)]

# Summary statistics and output file.
summary = epifx.summary.make(params, obs_list)
out = "forecasts.hdf5"

# Generate the forecasts and save them to disk.
pypfilt.forecast(params, start, until, [obs_list], fs_dates, summary, out)

Comparing forecast outputs

Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.

epifx.cmp.files(path1, path2, verbose=True, examine=None)

Compare two HDF5 files for identical simulation outputs.

Parameters:
  • path1 – The filename of the first HDF5 file.
  • path2 – The filename of the second HDF5 file.
  • verbose – Whether to report successful matches.
  • examine – The data groups to examine for equality; the default is to examine the simulation outputs ('/data') and ignore the associated metadata ('/meta').
Returns:

True if the files contain identical simulation outputs, otherwise False.

This functionality is also provided as a command-line script:

cmp-output --help
cmp-output file1.hdf5 file2.hdf5

Note that simulation outputs have been observed to differ depending on the installed versions of the NumPy and SciPy packages, due to changes in the random number generator.

Observation Models

A generic base class for negative binomial observation models is provided, which assumes that the model is parameterised by a background signal (\(bg_\mathrm{obs}\)), an observation probability (\(p_\mathrm{obs}\)), and a dispersion parameter (\(k\)).

class epifx.obs.NB(obs_unit, obs_period)

The base class for generic observation models that assume the relationship between observed numerical quantities \(y_t\) and disease incidence in particle \(x_t\) follows a negative binomial distribution with mean \(\mathbb{E}[x]\) and dispersion parameter \(k\).

\[\mathcal{L}(y_t \mid x_t) \sim NB(\mathbb{E}[x], k)\]
Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
define_params(params, bg_obs, pr_obs, disp)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).
expect(params, op, period, pr_inf, prev, curr)

Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:

\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • period – The duration of the observation period (in days).
  • pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
  • prev – The state vectors at the start of the observation period.
  • curr – The state vectors at the end of the observation period.
log_llhd(params, op, obs, pr_indiv, curr, hist)

Calculate the log-likelihood of obtaining an observation from each particle.

Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • obs – The observation that was made.
  • pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
  • curr – The particle state vectors.
  • hist – The particle state histories, indexed by observation period.
Raises:

NotImplementedError – Derived classes must implement this method.
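
To make the relationship between the expected value and the negative binomial likelihood concrete, the sketch below evaluates both with the standard library. It assumes the mean/dispersion parameterisation in which the variance is \(\mu + \mu^2 / k\); the function names, this parameterisation, and the numerical values are assumptions for illustration and are not taken from the epifx source.

```python
import math


def expected_obs(bg_obs, pr_inf, pr_obs, pop_size):
    """E[x] = bg_obs + p_inf * p_obs * N, as in the expression above."""
    return bg_obs + pr_inf * pr_obs * pop_size


def nb_log_pmf(y, mean, disp):
    """Log-probability of count y under a negative binomial distribution
    with the given mean and dispersion k (variance = mean + mean**2 / k)."""
    log_coeff = (math.lgamma(y + disp) - math.lgamma(disp)
                 - math.lgamma(y + 1))
    return (log_coeff
            + disp * math.log(disp / (disp + mean))
            + y * math.log(mean / (disp + mean)))


# Illustrative values: a population of one million and a weekly count of 1000.
mu = expected_obs(bg_obs=10.0, pr_inf=0.002, pr_obs=0.5, pop_size=1000000)
log_l = nb_log_pmf(1000, mu, disp=100.0)
print(mu, log_l)
```

A particle filter would evaluate such a log-likelihood once per particle, since each particle yields its own infection probability and therefore its own expected value.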

Two sub-classes are provided, which cater for count data and fractional data, respectively.

Count data

class epifx.obs.NBCounts(obs_unit, obs_period)

Generic negative binomial observation model for relating disease incidence to count data.

Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
from_file(filename, year=None, date_col='to', value_col='count')

Load count data from a space-delimited text file with column headers defined in the first line.

Parameters:
  • filename – The file to read.
  • year – Only return observations for a specific year. The default behaviour is to return all recorded observations.
  • date_col – The name of the observation date column.
  • value_col – The name of the observation value column.
Returns:

A list of observations, ordered as per the original file.

log_llhd(params, op, obs, pr_indiv, curr, hist)

Calculate the log-likelihood of obtaining an observation from each particle.

Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • obs – The observation that was made.
  • pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
  • curr – The particle state vectors.
  • hist – The particle state histories, indexed by observation period.
Raises:

NotImplementedError – Derived classes must implement this method.

define_params(params, bg_obs, pr_obs, disp)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).
expect(params, op, period, pr_inf, prev, curr)

Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:

\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • period – The duration of the observation period (in days).
  • pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
  • prev – The state vectors at the start of the observation period.
  • curr – The state vectors at the end of the observation period.

Fractional data

class epifx.obs.NBFractions(obs_unit, obs_period, percentages=False)

Generic negative binomial observation model for relating disease incidence to fractional and percentage data, where the denominator is known and reported.

Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
  • percentages – Indicates whether the data are fractional values (False, default) or percentages (True).
from_file(filename, year=None, date_col='to', value_col='value', denom_col='denominator')

Load fractional data from a space-delimited text file with column headers defined in the first line.

Parameters:
  • filename – The file to read.
  • year – Only return observations for a specific year. The default behaviour is to return all recorded observations.
  • date_col – The name of the observation date column.
  • value_col – The name of the observation value column.
  • denom_col – The name of the observation denominator column.
Returns:

A list of observations, ordered as per the original file.

expect(params, op, period, pr_inf, prev, curr)

Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:

\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • period – The duration of the observation period (in days).
  • pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
  • prev – The state vectors at the start of the observation period.
  • curr – The state vectors at the end of the observation period.
log_llhd(params, op, obs, pr_indiv, curr, hist)

Calculate the log-likelihood of obtaining an observation from each particle.

Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • obs – The observation that was made.
  • pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
  • curr – The particle state vectors.
  • hist – The particle state histories, indexed by observation period.
Raises:

NotImplementedError – Derived classes must implement this method.

define_params(params, bg_obs, pr_obs, disp)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).

Reading observations from disk

The from_file() methods provided by NBCounts and NBFractions both use read_table() to read specific columns from data files. This function is intended to be a flexible wrapper around numpy.loadtxt, and should be sufficient for reading almost any whitespace-delimited data file with named columns.

epifx.obs.read_table(filename, columns, date_fmt=None)

Load data from a space-delimited text file with column headers defined in the first line.

Parameters:
  • filename – The file to read.
  • columns – The columns to read, represented as a list of (name, type) tuples; type should be a built-in NumPy scalar type, datetime.date, or datetime.datetime. For the two date types, string values will be automatically converted to datetime.datetime objects by datetime_conv().
  • date_fmt – A format string for parsing date columns; see datetime_conv() for details and the default format string.
Returns:

A NumPy structured array.

Example:

The code below demonstrates how to read observations from a file that includes columns for the year, the observation date, the observed value, and free-text metadata (up to 20 characters in length).

import datetime
import numpy as np
import epifx.obs
columns = [('year', np.int32), ('date', datetime.datetime),
           ('count', np.int32), ('metadata', 'S20')]
df = epifx.obs.read_table('my-data-file.ssv', columns,
                          date_fmt='%Y-%m-%d')

epifx.obs.datetime_conv(text, fmt='%Y-%m-%d')

Convert date strings to datetime.datetime instances. This is a convenience function intended for use with, e.g., numpy.loadtxt.

Parameters:
  • text – A string containing a date or date-time value.
  • fmt – A format string that defines the textual representation. See the Python strptime documentation for details.
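
The conversion described above can be approximated with datetime.datetime.strptime, as in the sketch below. The helper parse_date is hypothetical and is not the epifx implementation; decoding bytes input is an assumption included because some NumPy versions pass bytes to converter functions.

```python
import datetime


def parse_date(text, fmt='%Y-%m-%d'):
    """Convert a date string (or ASCII bytes) to a datetime.datetime."""
    if isinstance(text, bytes):
        text = text.decode('ascii')
    return datetime.datetime.strptime(text, fmt)


print(parse_date('2015-05-01'))
print(parse_date(b'2015-10-31 06:30', fmt='%Y-%m-%d %H:%M'))
```
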