Getting Started¶

This page assumes that you have already installed the epifx package, and shows how to generate forecasts using an SEIR model.

Simulation parameters¶

Default values for the particle filter parameters are provided:

epifx.default_params(px_count, model, time, popn_size, prng_seed=None)¶

The default simulation parameters.

Parameters:	px_count – The number of particles. model – The infection model. time – The simulation time scale. popn_size – The population size. prng_seed – The seed for both the `epifx` and `pypfilt` pseudo-random number generators.

Temporal forcing¶

The force of infection can be subject to temporal forcing (e.g., by climatic factors such as absolute humidity, or social factors such as media coverage) as mediated by the model parameter \(\sigma\) (see Infection models). This requires the simulation parameters to include a function that maps datetime.datetime instances to scalar values:

epifx.daily_forcing(filename, date_fmt=u'%Y-%m-%d')¶

Return a temporal forcing look-up function, which should be stored in params['epifx']['forcing'] in order to enable temporal forcing.

Parameters:	filename – A file that contains two columns separated by whitespace, the column first being the date and the second being the force of the temporal modulation. Note that the first line of this file is assumed to contain column headings and will be ignored. date_fmt – The format in which dates are stored.

Infection models¶

class epifx.SEIR¶

An SEIR compartment model for a single circulating influenza strain, under the assumption that recovered individuals are completely protected against reinfection.

\[\begin{split}\frac{dS}{dt} &= - \beta S^\eta I \\[0.5em] \frac{dE}{dt} &= \beta S^\eta I - \sigma E \\[0.5em] \frac{dI}{dt} &= \sigma E - \gamma I \\[0.5em] \frac{dR}{dt} &= \gamma I \\[0.5em] \beta &= R_0 \cdot \gamma\end{split}\]

Parameter	Meaning
\(R_0\)	Basic reproduction number
\(\sigma\)	Inverse of the incubation period (day ^-1)
\(\gamma\)	Inverse of the infectious period (day ^-1)
\(\eta\)	Inhomogeneous social mixing coefficient
\(\alpha\)	Temporal forcing coefficient

The force of infection can be subject to temporal forcing \(F(t)\), as mediated by \(\alpha\):

\[\beta(t) = \beta \cdot \left[1 + \alpha \cdot F(t)\right]\]

Note that this requires the forcing function \(F(t)\) to be defined in the simulation parameters (see the Temporal Forcing documentation).

Initialise the model instance.

init(params, vec)¶

Initialise a state vector.

Parameters:	params – Simulation parameters. vec – An uninitialised state vector of correct dimensions (see `state_size()`).

priors(params)¶

Return a dictionary of model parameter priors.

Parameters:	params – Simulation parameters.

set_params(vec)¶: Read the model parameters from the provided matrix.

load_params(sample_file)¶: Load the model parameters from a file.

save_params(vec, sample_file)¶: Save the model parameters to a file.

update(params, step_date, dt, is_fs, prev, curr)¶

Perform a single time-step.

Parameters:	params – Simulation parameters. step_date – The date and time of the current time-step. dt – The time-step size (days). is_fs – Indicates whether this is a forecasting simulation. prev – The state before the time-step. curr – The state after the time-step (destructively updated).

Arbitrary model priors¶

In addition to sampling each model parameter independently, the epifx.select module provides support for sampling particles according to arbitrary target distributions, using an accept-reject sampler.

epifx.select.select(params, start, end, proposal, target, seed, info=False)¶

Select particles according to a target distribution.

Parameters:

params – The simulation parameters (note: the parameter dictionary will be emptied once the particles have been selected).
start – The start of the simulation period.
end – The end of the simulation period.
proposal – The proposal distribution.
target – The target distribution.
seed – The PRNG seed used for sampling and accepting particles.
info – Whether to return additional information about the particles.

Returns:

If info is False, returns the initial state vector for each accepted particle. If info is True, returns a tuple that contains the initial state vectors, a boolean array that indicates which of the proposed particles were accepted, and the summary tables for all proposed particles.

# Save the accepted particles to disk.
vec = epifx.select.select(params, start, end, proposal, target, seed)

# Load the accepted particles for use in a simulation.
model = epifx.SEIR()
model.set_params(vec)
params = epifx.default_params(px_count, model, time, popn_size)

Any proposal distribution can be used with this sampler, including the default model prior:

class epifx.select.Proposal¶

The base class for proposal particle distributions.

sample(params, hist, prng)¶

Draw particle samples from the proposal distribution.

Parameters:	params – The simulation parameters. hist – The particle history matrix into which the samples should be written. prng – The PRNG instance to use for any random sampling.

class epifx.select.DefaultProposal¶: A proposal distribution that independently samples each parameter from the prior distributions provided in the simulation parameters.

Any target distribution for which a probability density can be defined can be used with this sampler:

class epifx.select.Target¶

The base class for target particle distributions.

prepare_summary(summary)¶

Add the necessary tables to a summary object so that required summary statistics are recorded.

Parameters:	summary – The summary object to which the required tables should be added.

logpdf(output)¶

Return the log of the target probability density for each particle.

Parameters:	output – The state object returned by `pypfilt.run`; summary tables are located at `output['summary'][table_name]`.

Two target distributions are provided by this module.

The TargetAny distribution accepts all particles with equal likelihood, for the case where the proposal distribution is identical to the desired target distribution:

class epifx.select.TargetAny¶: A distribution that accepts all proposals with equal likelihood.

The TargetPeakMVN distribution is a multivariate normal distribution for the peak timing and size, as defined by previously-observed peaks:

class epifx.select.TargetPeakMVN(peak_sizes, peak_times)¶

A multivariate normal distribution for the peak timing and size.

Parameters:	peak_sizes – An array of previously-observed peak sizes. peak_time – An array of previously-observed peak times.

Observation models¶

The epifx.obs module provides generic observation models for count data with known or unknown denominators, as well as functions for reading observations from disk and a base class for custom observation models.

Each observation model must have a unique unit, and is used to calculate likelihoods for all observations that share this same unit.

import epifx.obs

# Create the simulation parameters.
params = ...
# Create an observation model for weekly data (with a period of 7 days),
# that pertains to all observations whose unit is "obs_unit".
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)

Forecast summaries¶

epifx.summary.make(params, all_obs, default=True, extra_tbls=None, pkgs=None, **kwargs)¶

A convenience function that collects all of the summary statistics defined in the pypfilt.summary and epifx.summary modules.

Parameters:

params – The simulation parameters.
all_obs – A list of all observations.
default – Whether to add all of the tables defined in the pypfilt.summary and epifx.summary modules.
extra_tbls – A list of extra summary statistic tables to include.
pkgs – A dictionary of python modules whose versions should be recorded in the simulation metadata. By default, all of the modules recorded by pypfilt.summary.Metadata are included, as is the epifx package itself.
**kwargs – Extra arguments to pass to pypfilt.summary.HDF5.

For example:

from epifx.summary import make
params = ...
all_obs = ...
stats = make(params, all_obs, first_day=True, only_fs=True)

class epifx.summary.Metadata¶

Document the simulation parameters and system environment for a set of simulations. This extends the default pypfilt black-list to ignore parameters specific to epifx.

build(params, pkgs=None)¶

Construct a metadata dictionary that documents the simulation parameters and system environment.

Parameters:	params – The simulation parameters. pkgs – A dictionary that maps package names to modules that define appropriate `__version__` attributes, used to record the versions of additional relevant packages.

class epifx.summary.PrOutbreak(name=u'pr_epi')¶

Record the daily outbreak probability, defined as the sum of the weights of all particles in which an outbreak has been seeded.

Parameters:	name – the name of the table in the output file.

class epifx.summary.PeakMonitor¶

Record epidemic peak forecasts, for use with other statistics.

peak_size = None¶

A dictionary that maps observation systems to the size of each particle’s peak with respect to that system: peak_size[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_date = None¶

A dictionary that maps observation systems to the date of each particle’s peak with respect to that system: peak_date[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_time = None¶

A dictionary that maps observation systems to the time of each particle’s peak with respect to that system, measured in (fractional) days from the start of the forecasting period: peak_time[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_weight = None¶

A dictionary that maps observation systems to the weight of each particle at the time that its peak occurs: peak_weight[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

expected_obs = None¶

The expected observation for each particle for the duration of the current simulation window.

Note that this is only valid for tables to inspect in each call to add_rows(), and not in a call to finished().

days_to(date)¶: Convert a date to the (fractional) number of days from the start of the forecasting period.

class epifx.summary.PeakForecastEnsembles(peak_monitor, name=u'peak_ensemble', fs_only=True)¶

Record the weighted ensemble of peak size and time predictions for each forecasting simulation.

Parameters:	peak_monitor – an instance of `PeakMonitor`. name – the name of the table in the output file.

class epifx.summary.PeakForecastCIs(peak_monitor, probs=None, name=u'peak_cints')¶

Record fixed-probability central credible intervals for the peak size and time predictions.

Parameters:	peak_monitor – an instance of `PeakMonitor`. probs – an array of probabilities that define the size of each central credible interval. The default value is `numpy.uint8([0, 50, 90, 95, 99, 100])`. name – the name of the table in the output file.

class epifx.summary.PeakSizeAccuracy(peak_monitor, name=u'peak_size_acc', toln=None)¶

Record the accuracy of the peak size predictions against multiple accuracy tolerances.

Parameters:	peak_monitor – an instance of `PeakMonitor`. name – the name of the table in the output file. toln – The accuracy thresholds for peak size predictions, expressed as percentages of the true size. The default is `np.array([10, 20, 25, 33])`.

class epifx.summary.PeakTimeAccuracy(peak_monitor, name=u'peak_time_acc', toln=None)¶

Record the accuracy of the peak time predictions against multiple accuracy tolerances.

Parameters:	peak_monitor – an instance of `PeakMonitor`. name – the name of the table in the output file. toln – The accuracy thresholds for peak time predictions, expressed as numbers of days. The default is `np.array([7, 10, 14])`.

class epifx.summary.ExpectedObs(peak_monitor, probs=None, name=u'expected_obs')¶

Record fixed-probability central credible intervals for the expected observations.

Parameters:	peak_monitor – an instance of `PeakMonitor`. probs – an array of probabilities that define the size of each central credible interval. The default value is `numpy.uint8([0, 50, 90, 95, 99, 100])`. name – the name of the table in the output file.

class epifx.summary.ObsLikelihood(name=u'obs_llhd', extra_obs=None)¶

Record the likelihood of each observation according to each particle. Note that this table registers its record_obs_llhd method in the parameter dictionary so that it can obtain the observation likelihoods.

Parameters:	name – the name of the table in the output file. extra_obs – Observations that will not be used in the filtering process, but whose likelihoods are of interest.

class epifx.summary.ThresholdMonitor(threshold)¶

Record when expected observations exceed a specific threshold.

Parameters:	threshold – The threshold observation value.

exceed_date = None¶

A dictionary that maps observation systems to the date when each particle exceeded the specific threshold: exceed_date[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

exceed_weight = None¶

A dictionary that maps observation systems to the final weight of each particle: exceed_weight.

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

exceed_mask = None¶

A dictionary that maps observation systems to Boolean arrays that indicate which particles have exceeded the threshold: exceed_mask[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

class epifx.summary.ExceedThreshold(thresh_monitor, name=u'exceed_threshold', fs_only=True)¶

Record when expected observations exceed a specific threshold.

Parameters:	thresh_monitor – an instance of `ThresholdMonitor`. name – the name of the table in the output file.

define_bins(params, start, end, bin_width)¶

Divide the time scale into a finite number of bins.

This table will record the (weighted) proportion of particles that first exceeded the threshold in each of these bins. Note that the bins must be defined before this table can be used.

Parameters:	params – The simulation parameters. start – The time that marks the start of the first bin. end – The time that marks the end of the last bin. bin_width – The scalar bin width.

Generating forecasts¶

import datetime
import epifx
import epifx.obs

# Simulation parameters
num_px = 3600
model = epifx.SEIR()
time = pypfilt.Datetime()
popn = 4000000
prng_seed = 42
params = epifx.default_params(num_px, model, time, popn_size, prng_seed)

# Simulate from the 1st of May to the 31st of October, 2015.
start = datetime.datetime(2015, 5, 1)
until = start + datetime.timedelta(days=183)

# Load the relevant observations.
obs_list = ...
# Create an observation model for weekly data (with a period of 7 days).
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)

# Generate weekly forecasts for the first 9 weeks (and the start date).
fs_dates = [start + datetime.timedelta(days=week * 7)
            for week in range(10)]

# Summary statistics and output file.
summary = epifx.summary.make(params, obs_list))
out = "forecasts.hdf5"

# Generate the forecasts and save them to disk.
pypfilt.forecast(params, start, until, [obs_list], fs_dates, summary, out)

Comparing forecast outputs¶

Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.

epifx.cmd.cmp.files(path1, path2, verbose=True, examine=None)¶

Compare two HDF5 files for identical simulation outputs.

Parameters:	path1 – The filename of the first HDF5 file. path2 – The filename of the second HDF5 file. verbose – Whether to report successful matches. examine – The data groups to examine for equality; the default is to examine the simulation outputs (`'/data'`) and ignore the associated metadata (`'/meta'`).
Returns:	`True` if the files contain identical simulation outputs, otherwise `False`.

This functionality is also provided as a command-line script.