Getting Started

This page assumes that you have already installed the epifx package, and shows how to generate forecasts using an SEIR model.

Simulation parameters

Default values for the particle filter parameters are provided:

epifx.default_params(px_count, model, time, popn_size, prng_seed=None)

The default simulation parameters.

Parameters:
  • px_count – The number of particles.
  • model – The infection model.
  • time – The simulation time scale.
  • popn_size – The population size.
  • prng_seed – The seed for both the epifx and pypfilt pseudo-random number generators.

Temporal forcing

The force of infection can be subject to temporal forcing (e.g., by climatic factors such as absolute humidity, or social factors such as media coverage) as mediated by the model parameter \(\sigma\) (see Infection models). This requires the simulation parameters to include a function that maps datetime.datetime instances to scalar values:

epifx.daily_forcing(filename, date_fmt='%Y-%m-%d')

Return a temporal forcing look-up function, which should be stored in params['epifx']['forcing'] in order to enable temporal forcing.

Parameters:
  • filename – A file that contains two columns separated by whitespace, the first column being the date and the second being the force of the temporal modulation. Note that the first line of this file is assumed to contain column headings and will be ignored.
  • date_fmt – The format in which dates are stored.
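
To make the expected file layout concrete, the following is a minimal sketch of a look-up function of this kind, written with only the standard library. It is not the epifx implementation: the name load_forcing_table, the example file contents, and the behaviour for dates that are missing from the file (raising KeyError) are assumptions made for illustration.

```python
import datetime
import os
import tempfile


def load_forcing_table(filename, date_fmt='%Y-%m-%d'):
    """Return a function that maps datetime.datetime instances to scalars.

    Hypothetical helper that mirrors the file layout described above: two
    whitespace-separated columns (date, forcing value) below a header line.
    """
    table = {}
    with open(filename) as f:
        next(f, None)  # Skip the column headings on the first line.
        for line in f:
            fields = line.split()
            if len(fields) != 2:
                continue  # Ignore blank or malformed lines.
            day = datetime.datetime.strptime(fields[0], date_fmt).date()
            table[day] = float(fields[1])

    def forcing(when):
        # Daily resolution: discard the time of day before the look-up.
        return table[when.date()]

    return forcing


# Write a small example file (made-up values) and query it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'forcing.ssv')
    with open(path, 'w') as f:
        f.write('date value\n2015-05-01 0.25\n2015-05-02 -0.5\n')
    lookup = load_forcing_table(path)

value = lookup(datetime.datetime(2015, 5, 2, 12, 0))
```

A function with this shape (datetime.datetime in, scalar out) is what the text above describes storing in params['epifx']['forcing'].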

Infection models

epifx.SEIR

Arbitrary model priors

In addition to sampling each model parameter independently, the epifx.select module provides support for sampling particles according to arbitrary target distributions, using an accept-reject sampler.

epifx.select.select(params, start, end, proposal, target, seed, info=False)

Select particles according to a target distribution.

Parameters:
  • params – The simulation parameters (note: the parameter dictionary will be emptied once the particles have been selected).
  • start – The start of the simulation period.
  • end – The end of the simulation period.
  • proposal – The proposal distribution.
  • target – The target distribution.
  • seed – The PRNG seed used for sampling and accepting particles.
  • info – Whether to return additional information about the particles.
Returns:

If info is False, returns the initial state vector for each accepted particle. If info is True, returns a tuple that contains the initial state vectors, a boolean array that indicates which of the proposed particles were accepted, and the summary tables for all proposed particles.

# Select particles according to the target distribution.
vec = epifx.select.select(params, start, end, proposal, target, seed)

# Use the accepted particles to initialise the model for a simulation.
# Note that select() empties params, so the parameters are created anew.
model = epifx.SEIR()
model.set_params(vec)
params = epifx.default_params(px_count, model, time, popn_size)
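
For readers unfamiliar with accept-reject sampling, the general technique can be sketched in a few lines of standard-library Python. This is a generic illustration and not the epifx implementation: the function accept_reject, the uniform proposal, and the target density 6x(1 - x) are all assumptions chosen to keep the example small.

```python
import math
import random


def accept_reject(log_target, log_proposal, draw, log_bound, count, seed):
    """Return `count` samples from the target using accept-reject sampling.

    A proposal x is accepted with probability
    target(x) / (exp(log_bound) * proposal(x)), which requires
    target(x) <= exp(log_bound) * proposal(x) for every x.
    """
    prng = random.Random(seed)
    accepted = []
    while len(accepted) < count:
        x = draw(prng)
        log_ratio = log_target(x) - log_proposal(x) - log_bound
        # Compare in log space; 1 - random() lies in (0, 1], so log is safe.
        if math.log(1.0 - prng.random()) < log_ratio:
            accepted.append(x)
    return accepted


# Target: density 6 x (1 - x) on (0, 1); proposal: Uniform(0, 1).
def log_target(x):
    return math.log(6.0 * x * (1.0 - x)) if 0.0 < x < 1.0 else -math.inf


samples = accept_reject(
    log_target=log_target,
    log_proposal=lambda x: 0.0,
    draw=lambda prng: prng.random(),
    log_bound=math.log(1.5),  # The target density peaks at 1.5.
    count=2000,
    seed=42,
)
mean = sum(samples) / len(samples)
```

Fixing the seed makes the set of accepted samples reproducible, which is the same role that the seed argument plays in the signature above.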

Any proposal distribution can be used with this sampler, including the default model prior:

class epifx.select.Proposal

The base class for proposal particle distributions.

sample(params, hist, prng)

Draw particle samples from the proposal distribution.

Parameters:
  • params – The simulation parameters.
  • hist – The particle history matrix into which the samples should be written.
  • prng – The PRNG instance to use for any random sampling.

class epifx.select.DefaultProposal

A proposal distribution that independently samples each parameter from the prior distributions provided in the simulation parameters.

sample(params, hist, prng)

Draw particle samples from the proposal distribution.

Parameters:
  • params – The simulation parameters.
  • hist – The particle history matrix into which the samples should be written.
  • prng – The PRNG instance to use for any random sampling.

Any target distribution for which a probability density can be defined can be used with this sampler:

class epifx.select.Target

The base class for target particle distributions.

prepare_summary(summary)

Add the necessary tables to a summary object so that required summary statistics are recorded.

Parameters:
  • summary – The summary object to which the required tables should be added.

logpdf(output)

Return the log of the target probability density for each particle.

Parameters:
  • output – The state object returned by pypfilt.run; summary tables are located at output['summary'][table_name].

Two target distributions are provided by this module.

The TargetAny distribution accepts all particles with equal likelihood, for the case where the proposal distribution is identical to the desired target distribution:

class epifx.select.TargetAny

A distribution that accepts all proposals with equal likelihood.

prepare_summary(summary)

Add the necessary tables to a summary object so that required summary statistics are recorded.

Parameters:
  • summary – The summary object to which the required tables should be added.

logpdf(params, output)

Return the log of the target probability density for each particle.

Parameters:
  • output – The state object returned by pypfilt.run; summary tables are located at output['summary'][table_name].

The TargetPeakMVN distribution is a multivariate normal distribution for the peak timing and size, as defined by previously-observed peaks:

class epifx.select.TargetPeakMVN(peak_sizes, peak_times)

A multivariate normal distribution for the peak timing and size.

Parameters:
  • peak_sizes – An array of previously-observed peak sizes.
  • peak_times – An array of previously-observed peak times.

prepare_summary(summary)

Add the necessary tables to a summary object so that required summary statistics are recorded.

Parameters:
  • summary – The summary object to which the required tables should be added.

logpdf(params, output)

Return the log of the target probability density for each particle.

Parameters:
  • output – The state object returned by pypfilt.run; summary tables are located at output['summary'][table_name].
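
As an illustration of what a multivariate normal target for peak size and timing involves, the sketch below fits a bivariate normal to paired peak sizes and times and evaluates its log density. It uses only the standard library and is not the TargetPeakMVN implementation: the function names, the use of the unbiased sample covariance, and the example peak values are all assumptions made for illustration.

```python
import math


def fit_bivariate_normal(xs, ys):
    """Return the sample means and covariance terms of paired values."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def bivariate_normal_logpdf(x, y, mean, cov):
    """Return the log density of a bivariate normal at (x, y)."""
    mx, my = mean
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x - mx, y - my
    # Quadratic form (dx, dy) Sigma^{-1} (dx, dy)^T for a 2x2 covariance.
    quad = (syy * dx ** 2 - 2.0 * sxy * dx * dy + sxx * dy ** 2) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)


# Made-up peak sizes (cases per week) and peak times (day of the year).
peak_sizes = [400.0, 520.0, 610.0, 450.0, 700.0]
peak_times = [110.0, 125.0, 118.0, 130.0, 122.0]
mean, cov = fit_bivariate_normal(peak_sizes, peak_times)

# A peak near the historical average scores higher than an extreme one.
typical = bivariate_normal_logpdf(540.0, 120.0, mean, cov)
extreme = bivariate_normal_logpdf(1500.0, 60.0, mean, cov)
```

A particle whose simulated peak resembles the previously-observed peaks receives a higher log density, and is therefore more likely to be accepted by an accept-reject sampler.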

Observation models

The epifx.obs module provides generic observation models for count data with known or unknown denominators, as well as functions for reading observations from disk and a base class for custom observation models.

Each observation model must have a unique unit, and is used to calculate likelihoods for all observations that share this same unit.

import epifx.obs

# Create the simulation parameters.
params = ...
# Create an observation model for weekly data (with a period of 7 days),
# that pertains to all observations whose unit is "obs_unit".
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)
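
The parameter names in this example (bg_obs, pr_obs, disp) suggest a background observation rate, an observation probability, and a dispersion parameter. The sketch below shows one common way to compute a likelihood for over-dispersed count data: a negative binomial distribution parameterised by its mean and a dispersion parameter. Both the choice of distribution and the formula for the expected count are assumptions made for illustration; they are not taken from the epifx source, and new_infections is a made-up value.

```python
import math


def neg_binom_logpmf(count, mean, disp):
    """Return the log-probability of `count` under a negative binomial
    distribution with variance mean + mean**2 / disp."""
    return (math.lgamma(count + disp) - math.lgamma(disp)
            - math.lgamma(count + 1)
            + disp * math.log(disp / (disp + mean))
            + count * math.log(mean / (disp + mean)))


# Hypothetical expected count: background cases plus the observed
# fraction of the new infections during one observation period.
bg_obs = 300
pr_obs = 0.01
disp = 100
new_infections = 20000  # Made-up value for illustration.
expected = bg_obs + pr_obs * new_infections

# Observations close to the expected count are more likely.
near = neg_binom_logpmf(510, expected, disp)
far = neg_binom_logpmf(900, expected, disp)

# The probabilities over all plausible counts should sum to one.
total = sum(math.exp(neg_binom_logpmf(k, expected, disp))
            for k in range(3000))
```

Under this parameterisation, larger values of the dispersion parameter bring the distribution closer to a Poisson distribution with the same mean.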

Forecast summaries

epifx.summary.make(params, all_obs, default=True, extra_tbls=None, pkgs=None, **kwargs)

A convenience function that collects all of the summary statistics defined in the pypfilt.summary and epifx.summary modules.

Parameters:
  • params – The simulation parameters.
  • all_obs – A list of all observations.
  • default – Whether to add all of the tables defined in the pypfilt.summary and epifx.summary modules.
  • extra_tbls – A list of extra summary statistic tables to include.
  • pkgs – A dictionary of Python modules whose versions should be recorded in the simulation metadata. By default, all of the modules recorded by pypfilt.summary.Metadata are included, as is the epifx package itself.
  • **kwargs – Extra arguments to pass to pypfilt.summary.HDF5.

For example:

from epifx.summary import make
params = ...
all_obs = ...
stats = make(params, all_obs, first_day=True, only_fs=True)

epifx.summary.Metadata
epifx.summary.PrOutbreak
epifx.summary.PeakMonitor
epifx.summary.PeakForecastEnsembles
epifx.summary.PeakForecastCIs
epifx.summary.PeakSizeAccuracy
epifx.summary.PeakTimeAccuracy
epifx.summary.ExpectedObs
epifx.summary.ObsLikelihood
epifx.summary.ThresholdMonitor
epifx.summary.ExceedThreshold

Generating forecasts

import datetime
import pypfilt
import epifx
import epifx.obs
import epifx.summary

# Simulation parameters
num_px = 3600
model = epifx.SEIR()
time = pypfilt.Datetime()
popn_size = 4000000
prng_seed = 42
params = epifx.default_params(num_px, model, time, popn_size, prng_seed)

# Simulate from the 1st of May to the 31st of October, 2015.
start = datetime.datetime(2015, 5, 1)
until = start + datetime.timedelta(days=183)

# Load the relevant observations.
obs_list = ...
# Create an observation model for weekly data (with a period of 7 days).
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)

# Generate weekly forecasts for the first 9 weeks (and the start date).
fs_dates = [start + datetime.timedelta(days=week * 7)
            for week in range(10)]

# Summary statistics and output file.
summary = epifx.summary.make(params, obs_list)
out = "forecasts.hdf5"

# Generate the forecasts and save them to disk.
pypfilt.forecast(params, start, until, [obs_list], fs_dates, summary, out)
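
The date arithmetic in the example above can be checked in isolation, using only the standard library. This snippet repeats the start, until, and fs_dates definitions and prints the resulting dates; it does not call epifx or pypfilt.

```python
import datetime

start = datetime.datetime(2015, 5, 1)
until = start + datetime.timedelta(days=183)
fs_dates = [start + datetime.timedelta(days=week * 7)
            for week in range(10)]

print(until.date())         # 2015-10-31
print(len(fs_dates))        # 10
print(fs_dates[-1].date())  # 2015-07-03
```

The ten forecasting dates comprise the start date followed by the nine subsequent weekly dates, the last of which is the 3rd of July.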

Comparing forecast outputs

Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.

epifx.cmd.cmp.files(path1, path2, verbose=True, examine=None)

Compare two HDF5 files for identical simulation outputs.

Parameters:
  • path1 – The filename of the first HDF5 file.
  • path2 – The filename of the second HDF5 file.
  • verbose – Whether to report successful matches.
  • examine – The data groups to examine for equality; the default is to examine the simulation outputs ('/data') and ignore the associated metadata ('/meta').
Returns:

True if the files contain identical simulation outputs, otherwise False.

This functionality is also provided as a command-line script.