Getting Started¶

This page assumes that you have already installed the epifx package, and shows how to generate forecasts using an SEIR model.

Simulation parameters¶

Default values for the particle filter parameters are provided:

epifx.default_params(p_exp, px_scale, model=epifx.SEIR)¶

The default simulation parameters.

Parameters:	p_exp – The daily probability of seeding an exposure event in a naive population. px_scale – The ratio of particles to seeded exposures. model – The infection model.

Note that epifx uses an independent pseudo-random number generator (PRNG) for sampling from model priors and adding stochastic noise to model parameters and flows (params['epifx']['rnd']) as distinct from the pypfilt PRNG instance (params['rnd']). This PNRG can be initialised with a custom seed:

epifx.init_prng(params, seed)¶

Initialise the epifx PRNG instance (params['epifx']['rnd']).

Parameters:	params – The simulation parameters. seed – The PRNG seed; see the `numpy.random.RandomState` documentation for details.

The behaviour of epifx 0.1.x (one common PRNG instance) can be recovered by setting the following parameter:

params['epifx']['independent_prng'] = False

Temporal forcing¶

The force of infection can be subject to temporal forcing (e.g., by climatic factors such as absolute humidity, or social factors such as media coverage) as mediated by the model parameter \(\sigma\) (see Infection models). This requires the simulation parameters to include a function that maps datetime.datetime instances to scalar values:

epifx.daily_forcing(filename, date_fmt=u'%Y-%m-%d')¶

Return a temporal forcing look-up function, which should be stored in params['epifx']['forcing'] in order to enable temporal forcing.

Parameters:	filename – A file that contains two columns separated by whitespace, the column first being the date and the second being the force of the temporal modulation. Note that the first line of this file is assumed to contain column headings and will be ignored. date_fmt – The format in which dates are stored.

Infection models¶

class epifx.SEIR¶

An SEIR compartment model for a single circulating influenza strain, under the assumption that recovered individuals are completely protected against reinfection.

\[\begin{split}\frac{dS}{dt} &= - \alpha \cdot S^\eta I \\[0.5em] \frac{dE}{dt} &= \alpha \cdot S^\eta I - \beta \cdot E \\[0.5em] \frac{dI}{dt} &= \beta \cdot E - \gamma I \\[0.5em] \frac{dR}{dt} &= \gamma I\end{split}\]

Parameter	Meaning
\(\alpha\)	Force of infection
\(\beta\)	Incubation period (day ^-1)
\(\gamma\)	Infectious period (day ^-1)
\(\eta\)	Inhomogeneous social mixing
\(\sigma\)	Temporal forcing coefficient

The force of infection can be subject to temporal forcing \(F(t)\), as mediated by \(\sigma\):

\[\alpha'(t) = \alpha \cdot (1 + \sigma \cdot F(t))\]

Note that this requires the forcing function \(F(t)\) to be defined in the simulation parameters (see the Temporal Forcing documentation).

static priors(params)¶

Return a dictionary of model parameter priors.

Parameters:	params – Simulation parameters.

Observation models¶

Observation models comprise a log-likelihood function (log_llhd) that relates observations to particles, and an expectation function (expect) that relates particles to observations:

def log_llhd(params, op, obs, pr_indiv, curr, hist):
    """Calculate the log-likelihood of obtaining one or more observations
    from each particle.

    :param params: The simulation parameters.
    :param op: The observation model parameters.
    :param obs: The list of observations for the current time-step.
    :param pr_indiv: A 2 x N matrix (for N particles) of the probability
        of an individual being uninfected and infected, respectively.
    :param curr: The particle state vectors.
    :param hist: The particle state histories, indexed by observation period.
    """
    return ...

def expect(params, op, period, pr_inf, prev, curr):
    """Determine the expected value for a given infection probability.

    :param params: The observation model parameters.
    :param op: The observation model parameters.
    :param period: The duration of the observation period (in days).
    :param pr_inf: The probability of an individual becoming infected
        during the observation period.
    :param prev: The state vectors at the start of the observation period.
    :param curr: The state vectors at the end of the observation period.
    """
    return ...

Observation models are identified by unit, and must be stored in the parameters dictionary as shown below:

params = epifx.default_params(...)
obs_unit = 'Some identifier'
params['obs'][obs_unit] = {
    'log_llhd_fn': log_llhd,
    'expect_fn': expect,
    # Add observation model parameters here.
}

The epifx.obs module provides generic observation models for both count data and fractional data, as well as functions for reading observations from disk.

Forecast summaries¶

epifx.summary.make(params, all_obs, extra_tbls=None, pkgs=None, **kwargs)¶

A convenience function that collects all of the summary statistics defined in the pypfilt.summary and epifx.summary modules.

Parameters:

params – The simulation parameters.
all_obs – A list of all observations.
extra_tbls – A list of extra summary statistic tables to include.
pkgs – A dictionary of python modules whose versions should be recorded in the simulation metadata. By default, all of the modules recorded by pypfilt.summary.metadata are included, as is the epifx package itself.
**kwargs – Extra arguments to pass to pypfilt.summary.HDF5.

class epifx.summary.PrOutbreak(name='pr_epi')¶

Record the daily outbreak probability, defined as the sum of the weights of all particles in which an outbreak has been seeded.

Parameters:	name – the name of the table in the output file.

class epifx.summary.PeakMonitor¶

Record epidemic peak forecasts, for use with other statistics.

peak_size = None¶

A dictionary that maps observation systems to the size of each particle’s peak with respect to that system: peak_size[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_date = None¶

A dictionary that maps observation systems to the date of each particle’s peak with respect to that system: peak_date[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_time = None¶

A dictionary that maps observation systems to the time of each particle’s peak with respect to that system, measured in (fractional) days from the start of the forecasting period: peak_time[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_weight = None¶

A dictionary that maps observation systems to the weight of each particle at the time that its peak occurs: peak_weight[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

expected_obs = None¶

The expected observation for each particle for the duration of the current simulation window.

Note that this is only valid for tables to inspect in each call to add_rows(), and not in a call to finished().

days_to(date)¶: Convert a date to the (fractional) number of days from the start of the forecasting period.

class epifx.summary.PeakForecastEnsembles(peak_monitor, name='peak_ensemble')¶

Record the weighted ensemble of peak size and time predictions for each forecasting simulation.

Parameters:	peak_monitor – an instance of `PeakMonitor`. name – the name of the table in the output file.

class epifx.summary.PeakForecastCIs(peak_monitor, probs=None, name='peak_cints')¶

Record fixed-probability central credible intervals for the peak size and time predictions.

Parameters:	peak_monitor – an instance of `PeakMonitor`. probs – an array of probabilities that define the size of each central credible interval. The default value is `numpy.uint8([0, 50, 90, 95, 99, 100])`. name – the name of the table in the output file.

class epifx.summary.PeakSizeAccuracy(peak_monitor, name='peak_size_acc', toln=None)¶

Record the accuracy of the peak size predictions against multiple accuracy tolerances.

Parameters:	peak_monitor – an instance of `PeakMonitor`. name – the name of the table in the output file. toln – The accuracy thresholds for peak size predictions, expressed as percentages of the true size. The default is `np.array([10, 20, 25, 33])`.

class epifx.summary.PeakTimeAccuracy(peak_monitor, name='peak_time_acc', toln=None)¶

Record the accuracy of the peak time predictions against multiple accuracy tolerances.

Parameters:	peak_monitor – an instance of `PeakMonitor`. name – the name of the table in the output file. toln – The accuracy thresholds for peak time predictions, expressed as numbers of days. The default is `np.array([7, 10, 14])`.

class epifx.summary.ExpectedObs(peak_monitor, probs=None, name='forecasts')¶

Record fixed-probability central credible intervals for the expected observations.

Parameters:	peak_monitor – an instance of `PeakMonitor`. probs – an array of probabilities that define the size of each central credible interval. The default value is `numpy.uint8([0, 50, 90, 95, 99, 100])`. name – the name of the table in the output file.

Generating forecasts¶

import datetime
import epifx

# Simulation parameters
p_exp = 1.0 / 36
px_scale = 100
params = epifx.default_params(p_exp, px_scale)
epifx.init_prng(params, seed=42)

# Simulate from the 1st of May to the 31st of October, 2015.
start = datetime.datetime(2015, 5, 1)
until = start + datetime.timedelta(days=183)

# Load the observations and define the observation model parameters.
obs_list = ...
params['obs'][obs_unit] = {...}

# Generate weekly forecasts for the first 9 weeks (and the start date).
fs_dates = [start + datetime.timedelta(days=week * 7)
            for week in range(10)]

# Summary statistics and output file.
summary = epifx.summary.make(params, obs_list))
out = "forecasts.hdf5"

# Generate the forecasts and save them to disk.
pypfilt.forecast(params, start, until, [obs_list], fs_dates, summary, out)

Comparing forecast outputs¶

Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.

epifx.cmp.files(path1, path2, verbose=True, examine=None)¶

Compare two HDF5 files for identical simulation outputs.

Parameters:	path1 – The filename of the first HDF5 file. path2 – The filename of the second HDF5 file. verbose – Whether to report successful matches. examine – The data groups to examine for equality; the default is to examine the simulation outputs (`'/data'`) and ignore the associated metadata (`'/meta'`).
Returns:	`True` if the files contain identical simulation outputs, otherwise `False`.

This functionality is also provided as a command-line script:

cmp-output --help
cmp-output file1.hdf5 file2.hdf5

Note that simulation outputs have been observed to differ depending on the installed versions of the NumPy and SciPy packages, due to changes in the random number generator.