Epidemic forecasting in Python

Welcome to the epifx documentation. This package uses a bootstrap particle filter to generate influenza epidemic forecasts from mechanistic infection models.

Installation

You can install epifx using pip:

# Install pypfilt (with plotting support) and epifx.
pip install pypfilt[plot] epifx

See the pypfilt documentation for further details.

Getting Started

This page assumes that you have already installed the epifx package, and shows how to generate forecasts using an SEIR model.

Simulation parameters

Default values for the particle filter parameters are provided:

epifx.default_params(px_count, model, time, popn_size, prng_seed=None)

The default simulation parameters.

Parameters:
  • px_count – The number of particles.
  • model – The infection model.
  • time – The simulation time scale.
  • popn_size – The population size.
  • prng_seed – The seed for both the epifx and pypfilt pseudo-random number generators.

Temporal forcing

The force of infection can be subject to temporal forcing (e.g., by climatic factors such as absolute humidity, or social factors such as media coverage) as mediated by the model parameter \(\sigma\) (see Infection models). This requires the simulation parameters to include a function that maps datetime.datetime instances to scalar values:

epifx.daily_forcing(filename, date_fmt='%Y-%m-%d')

Return a temporal forcing look-up function, which should be stored in params['epifx']['forcing'] in order to enable temporal forcing.

Parameters:
  • filename – A file that contains two columns separated by whitespace, the column first being the date and the second being the force of the temporal modulation. Note that the first line of this file is assumed to contain column headings and will be ignored.
  • date_fmt – The format in which dates are stored.

Infection models

class epifx.SEIR

An SEIR compartment model for a single circulating influenza strain, under the assumption that recovered individuals are completely protected against reinfection.

\[\begin{split}\frac{dS}{dt} &= - \beta S^\eta I \\[0.5em] \frac{dE}{dt} &= \beta S^\eta I - \sigma E \\[0.5em] \frac{dI}{dt} &= \sigma E - \gamma I \\[0.5em] \frac{dR}{dt} &= \gamma I \\[0.5em] \beta &= R_0 \cdot \gamma\end{split}\]
Parameter Meaning
\(R_0\) Basic reproduction number
\(\sigma\) Inverse of the incubation period (day -1)
\(\gamma\) Inverse of the infectious period (day -1)
\(\eta\) Inhomogeneous social mixing coefficient
\(\alpha\) Temporal forcing coefficient

The force of infection can be subject to temporal forcing \(F(t)\), as mediated by \(\alpha\):

\[\beta(t) = \beta \cdot \left[1 + \alpha \cdot F(t)\right]\]

Note that this requires the forcing function \(F(t)\) to be defined in the simulation parameters (see the Temporal Forcing documentation).

Initialise the model instance.

init(params, vec)

Initialise a state vector.

Parameters:
  • params – Simulation parameters.
  • vec – An uninitialised state vector of correct dimensions (see state_size()).
priors(params)

Return a dictionary of model parameter priors.

Parameters:params – Simulation parameters.
set_params(vec)

Read the model parameters from the provided matrix.

load_params(sample_file)

Load the model parameters from a file.

save_params(vec, sample_file)

Save the model parameters to a file.

update(params, step_date, dt, is_fs, prev, curr)

Perform a single time-step.

Parameters:
  • params – Simulation parameters.
  • step_date – The date and time of the current time-step.
  • dt – The time-step size (days).
  • is_fs – Indicates whether this is a forecasting simulation.
  • prev – The state before the time-step.
  • curr – The state after the time-step (destructively updated).
class epifx.SEEIIR(extra_params=None)

An SEEIIR compartment model for a single circulating influenza strain, under the assumption that recovered individuals are completely protected against reinfection.

\[\begin{split}\frac{dS}{dt} &= - \beta S^\eta (I_1 + I_2) \\[0.5em] \frac{dE_1}{dt} &= \beta S^\eta (I_1 + I_2) - 2 \sigma E_1 \\[0.5em] \frac{dE_2}{dt} &= 2 \sigma E_1 - 2 \sigma E_2 \\[0.5em] \frac{dI_1}{dt} &= 2 \sigma E_2 - 2 \gamma I_1 \\[0.5em] \frac{dI_2}{dt} &= 2 \gamma I_1 - 2 \gamma I_2 \\[0.5em] \frac{dR}{dt} &= 2 \gamma I_2 \\[0.5em] \beta &= R_0 \cdot \gamma\end{split}\]
Parameter Meaning
\(R_0\) Basic reproduction number
\(\sigma\) Inverse of the incubation period (day -1)
\(\gamma\) Inverse of the infectious period (day -1)
\(\eta\) Inhomogeneous social mixing coefficient
\(\alpha\) Temporal forcing coefficient

The force of infection can be subject to temporal forcing \(F(t)\), as mediated by \(\alpha\):

\[\beta(t) = \beta \cdot \left[1 + \alpha \cdot F(t)\right]\]

Note that this requires the forcing function \(F(t)\) to be defined in the simulation parameters (see the Temporal Forcing documentation).

init(params, vec)

Initialise a state vector.

Parameters:
  • params – Simulation parameters.
  • vec – An uninitialised state vector of correct dimensions (see state_size()).
priors(params)

Return a dictionary of model parameter priors.

Parameters:params – Simulation parameters.
set_params(vec)

Read the model parameters from the provided matrix.

load_params(sample_file)

Load the model parameters from a file.

save_params(vec, sample_file)

Save the model parameters to a file.

update(params, step_date, dt, is_fs, prev, curr)

Perform a single time-step.

Parameters:
  • params – Simulation parameters.
  • step_date – The date and time of the current time-step.
  • dt – The time-step size (days).
  • is_fs – Indicates whether this is a forecasting simulation.
  • prev – The state before the time-step.
  • curr – The state after the time-step (destructively updated).

Arbitrary model priors

In addition to sampling each model parameter independently, the epifx.select module provides support for sampling particles according to arbitrary target distributions, using an accept-reject sampler.

epifx.select.select(params, start, end, proposal, target, seed, info=False)

Select particles according to a target distribution.

Parameters:
  • params – The simulation parameters (note: the parameter dictionary will be emptied once the particles have been selected).
  • start – The start of the simulation period.
  • end – The end of the simulation period.
  • proposal – The proposal distribution.
  • target – The target distribution.
  • seed – The PRNG seed used for sampling and accepting particles.
  • info – Whether to return additional information about the particles.
Returns:

If info is False, returns the initial state vector for each accepted particle. If info is True, returns a tuple that contains the initial state vectors, a boolean array that indicates which of the proposed particles were accepted, and the summary tables for all proposed particles.

# Save the accepted particles to disk.
vec = epifx.select.select(params, start, end, proposal, target, seed)

# Load the accepted particles for use in a simulation.
model = epifx.SEIR()
model.set_params(vec)
params = epifx.default_params(px_count, model, time, popn_size)

Any proposal distribution can be used with this sampler, including the default model prior:

class epifx.select.Proposal

The base class for proposal particle distributions.

sample(params, hist, prng)

Draw particle samples from the proposal distribution.

Parameters:
  • params – The simulation parameters.
  • hist – The particle history matrix into which the samples should be written.
  • prng – The PRNG instance to use for any random sampling.
class epifx.select.DefaultProposal

A proposal distribution that independently samples each parameter from the prior distributions provided in the simulation parameters.

Any target distribution for which a probability density can be defined can be used with this sampler:

class epifx.select.Target

The base class for target particle distributions.

prepare_summary(summary)

Add the necessary tables to a summary object so that required summary statistics are recorded.

Parameters:summary – The summary object to which the required tables should be added.
logpdf(output)

Return the log of the target probability density for each particle.

Parameters:output – The state object returned by pypfilt.run; summary tables are located at output['summary'][table_name].

Two target distributions are provided by this module.

The TargetAny distribution accepts all particles with equal likelihood, for the case where the proposal distribution is identical to the desired target distribution:

class epifx.select.TargetAny

A distribution that accepts all proposals with equal likelihood.

The TargetPeakMVN distribution is a multivariate normal distribution for the peak timing and size, as defined by previously-observed peaks:

class epifx.select.TargetPeakMVN(peak_sizes, peak_times)

A multivariate normal distribution for the peak timing and size.

Parameters:
  • peak_sizes – An array of previously-observed peak sizes.
  • peak_time – An array of previously-observed peak times.

Observation models

The epifx.obs module provides generic observation models for count data with known or unknown denominators, as well as functions for reading observations from disk and a base class for custom observation models.

Each observation model must have a unique unit, and is used to calculate likelihoods for all observations that share this same unit.

import epifx.obs

# Create the simulation parameters.
params = ...
# Create an observation model for weekly data (with a period of 7 days),
# that pertains to all observations whose unit is "obs_unit".
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)

Forecast summaries

epifx.summary.make(params, all_obs, default=True, extra_tbls=None, pkgs=None, **kwargs)

A convenience function that collects all of the summary statistics defined in the pypfilt.summary and epifx.summary modules.

Parameters:
  • params – The simulation parameters.
  • all_obs – A list of all observations.
  • default – Whether to add all of the tables defined in the pypfilt.summary and epifx.summary modules.
  • extra_tbls – A list of extra summary statistic tables to include.
  • pkgs – A dictionary of python modules whose versions should be recorded in the simulation metadata. By default, all of the modules recorded by pypfilt.summary.Metadata are included, as is the epifx package itself.
  • **kwargs – Extra arguments to pass to pypfilt.summary.HDF5.

For example:

from epifx.summary import make
params = ...
all_obs = ...
stats = make(params, all_obs, first_day=True, only_fs=True)
class epifx.summary.Metadata

Document the simulation parameters and system environment for a set of simulations. This extends the default pypfilt black-list to ignore parameters specific to epifx.

build(params, pkgs=None)

Construct a metadata dictionary that documents the simulation parameters and system environment.

Parameters:
  • params – The simulation parameters.
  • pkgs – A dictionary that maps package names to modules that define appropriate __version__ attributes, used to record the versions of additional relevant packages.
class epifx.summary.PrOutbreak(name='pr_epi')

Record the daily outbreak probability, defined as the sum of the weights of all particles in which an outbreak has been seeded.

Parameters:name – the name of the table in the output file.
class epifx.summary.PeakMonitor

Record epidemic peak forecasts, for use with other statistics.

peak_size = None

A dictionary that maps observation systems to the size of each particle’s peak with respect to that system: peak_size[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_date = None

A dictionary that maps observation systems to the date of each particle’s peak with respect to that system: peak_date[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_time = None

A dictionary that maps observation systems to the time of each particle’s peak with respect to that system, measured in (fractional) days from the start of the forecasting period: peak_time[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

peak_weight = None

A dictionary that maps observation systems to the weight of each particle at the time that its peak occurs: peak_weight[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

expected_obs = None

The expected observation for each particle for the duration of the current simulation window.

Note that this is only valid for tables to inspect in each call to add_rows(), and not in a call to finished().

days_to(date)

Convert a date to the (fractional) number of days from the start of the forecasting period.

class epifx.summary.PeakForecastEnsembles(peak_monitor, name='peak_ensemble', fs_only=True)

Record the weighted ensemble of peak size and time predictions for each forecasting simulation.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • name – the name of the table in the output file.
class epifx.summary.PeakForecastCIs(peak_monitor, probs=None, name='peak_cints')

Record fixed-probability central credible intervals for the peak size and time predictions.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 90, 95, 99, 100]).
  • name – the name of the table in the output file.
class epifx.summary.PeakSizeAccuracy(peak_monitor, name='peak_size_acc', toln=None)

Record the accuracy of the peak size predictions against multiple accuracy tolerances.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • name – the name of the table in the output file.
  • toln – The accuracy thresholds for peak size predictions, expressed as percentages of the true size. The default is np.array([10, 20, 25, 33]).
class epifx.summary.PeakTimeAccuracy(peak_monitor, name='peak_time_acc', toln=None)

Record the accuracy of the peak time predictions against multiple accuracy tolerances.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • name – the name of the table in the output file.
  • toln – The accuracy thresholds for peak time predictions, expressed as numbers of days. The default is np.array([7, 10, 14]).
class epifx.summary.ExpectedObs(peak_monitor, probs=None, name='expected_obs')

Record fixed-probability central credible intervals for the expected observations.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 90, 95, 99, 100]).
  • name – the name of the table in the output file.
class epifx.summary.PredictiveCIs(peak_monitor, probs=None, name='forecasts')

Record fixed-probability central credible intervals for the observations.

Parameters:
  • peak_monitor – an instance of PeakMonitor.
  • probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 95]).
  • name – the name of the table in the output file.
class epifx.summary.ObsLikelihood(name='obs_llhd', extra_obs=None)

Record the likelihood of each observation according to each particle. Note that this table registers its record_obs_llhd method in the parameter dictionary so that it can obtain the observation likelihoods.

Parameters:
  • name – the name of the table in the output file.
  • extra_obs – Observations that will not be used in the filtering process, but whose likelihoods are of interest.
class epifx.summary.ThresholdMonitor(threshold)

Record when expected observations exceed a specific threshold.

Parameters:threshold – The threshold observation value.
exceed_date = None

A dictionary that maps observation systems to the date when each particle exceeded the specific threshold: exceed_date[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

exceed_weight = None

A dictionary that maps observation systems to the final weight of each particle: exceed_weight.

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

exceed_mask = None

A dictionary that maps observation systems to Boolean arrays that indicate which particles have exceeded the threshold: exceed_mask[(unit, period)].

Note that this is only valid for tables to inspect in the finished() method, and not in the add_rows() method.

class epifx.summary.ExceedThreshold(thresh_monitor, name='exceed_threshold', fs_only=True)

Record when expected observations exceed a specific threshold.

Parameters:
  • thresh_monitor – an instance of ThresholdMonitor.
  • name – the name of the table in the output file.
define_bins(params, start, end, bin_width)

Divide the time scale into a finite number of bins.

This table will record the (weighted) proportion of particles that first exceeded the threshold in each of these bins. Note that the bins must be defined before this table can be used.

Parameters:
  • params – The simulation parameters.
  • start – The time that marks the start of the first bin.
  • end – The time that marks the end of the last bin.
  • bin_width – The scalar bin width.

Generating forecasts

import datetime
import epifx
import epifx.obs

# Simulation parameters
num_px = 3600
model = epifx.SEIR()
time = pypfilt.Datetime()
popn = 4000000
prng_seed = 42
params = epifx.default_params(num_px, model, time, popn_size, prng_seed)

# Simulate from the 1st of May to the 31st of October, 2015.
start = datetime.datetime(2015, 5, 1)
until = start + datetime.timedelta(days=183)

# Load the relevant observations.
obs_list = ...
# Create an observation model for weekly data (with a period of 7 days).
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)

# Generate weekly forecasts for the first 9 weeks (and the start date).
fs_dates = [start + datetime.timedelta(days=week * 7)
            for week in range(10)]

# Summary statistics and output file.
summary = epifx.summary.make(params, obs_list))
out = "forecasts.hdf5"

# Generate the forecasts and save them to disk.
pypfilt.forecast(params, start, until, [obs_list], fs_dates, summary, out)

Comparing forecast outputs

Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.

epifx.cmd.cmp.files(path1, path2, verbose=True, examine=None)

Compare two HDF5 files for identical simulation outputs.

Parameters:
  • path1 – The filename of the first HDF5 file.
  • path2 – The filename of the second HDF5 file.
  • verbose – Whether to report successful matches.
  • examine – The data groups to examine for equality; the default is to examine the simulation outputs ('/data') and ignore the associated metadata ('/meta').
Returns:

True if the files contain identical simulation outputs, otherwise False.

This functionality is also provided as a command-line script.

Observation Models

Two observation models are provided, both of which related disease incidence in the simulation model to count data, but differ in how they treat the denominator.

Observations from the entire population

The PopnCounts class provides a generic observation model for relating disease incidence to count data where the denominator is assumed to be the population size \(N\) (i.e., the denominator is assumed constant and is either not known or is known to be the population size \(N\)).

This observation model assumes that the relationship between observed case counts \(y_t\) and disease incidence in particle \(x_t\) follow a negative binomial distribution with mean \(\mathbb{E}[y_t]\) and dispersion parameter \(k\).

\[ \begin{align}\begin{aligned}\mathcal{L}(y_t \mid x_t) &\sim NB(\mathbb{E}[y_t], k)\\\mathbb{E}[y_t] &= (1 - p_\mathrm{inf}) \cdot bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\\\operatorname{Var}[y_t] &= \mathbb{E}[y_t] + \frac{\left(\mathbb{E}[y_t]\right)^2}{k} \ge bg_\mathrm{var}\end{aligned}\end{align} \]

The observation model parameters comprise:

  • The background observation rate \(bg_\mathrm{obs}\);
  • The variance in the background signal \(bg_\mathrm{var}\) (the minimum variance);
  • The probability of observing an infected individual \(p_\mathrm{obs}\); and
  • The dispersion parameter \(k\), which controls the relationship between the mean \((\mathbb{E}[y_t])\) and the variance; as \(k \to \infty\) the distribution approaches the Poisson, as \(k \to 0\) the distribution becomes more and more over-dispersed with respect to the Poisson.
Probability mass functions for the PopnCounts observation model.

Probability mass functions for the PopnCounts observation model with different values of the dispersion parameter \(k\), for the expected value \(\mathbb{E}[y_t] = 200\) (vertical dashed line).

Observations from population samples

The SampleCounts class provides a generic observation model for relating disease incidence to count data where the denominator is reported and may vary (e.g., weekly counts of all patients and the number that presented with influenza-like illness), and where the background signal is not a fixed value but rather a fixed proportion.

For this observation model, each observed value \((y_t)\) is the fraction of all patients \((N_t)\) that were identified as cases \((c_t)\). This observation model assumes that the relationship between the observed fraction \(y_t\), the denominator \(N_t\), and disease incidence in particle \(x_t\) follows a Beta-binomial distribution with probability of success \(\mathbb{E}[y_t]\) and dispersion parameter \(k\).

\[ \begin{align}\begin{aligned}y_t &= \frac{c_t}{N_t}\\\mathcal{L}(y_t \mid x_t) &= \mathcal{L}(c_t \mid N_t, x_t)\\\mathbb{E}[y_t] &= bg_\mathrm{obs} + p_\mathrm{inf} \cdot \kappa_\mathrm{obs}\\\operatorname{Var}[y_t] &= \frac{p \cdot (1 - p)}{N_t} \cdot \left[ 1 + \frac{N_t - 1}{k + 1} \right] \ge bg_\mathrm{var}\\\mathcal{L}(c_t \mid N_t, x_t) &\sim \operatorname{BetaBin}(N_t, \mathbb{E}[y_t], k)\\\mathbb{E}[c_t] &= N_t \cdot \mathbb{E}[y_t]\\\operatorname{Var}[c_t] &= N_t \cdot p \cdot (1 - p) \cdot \left[ 1 + \frac{N_t - 1}{k + 1} \right]\end{aligned}\end{align} \]

The shape parameters \(\alpha\) and \(\beta\) satisfy:

\[ \begin{align}\begin{aligned}p &= \mathbb{E}[y_t] = \frac{\alpha}{\alpha + \beta}\\k &= \alpha + \beta\\\implies \alpha &= p \cdot k\\\implies \beta &= (1 - p) \cdot k\end{aligned}\end{align} \]

For details, see “Estimation of parameters in the beta binomial model”, Tripathi et al., J Ann Inst Statist Math 46(2): 317–331, 1994 (DOI: 10.1007/BF01720588).

The observation model parameters comprise:

  • The background case fraction \(bg_\mathrm{obs}\);
  • The variance in the background case fraction \(bg_\mathrm{var}\) (the minimum variance);
  • The slope \(\kappa_\mathrm{obs}\) of the relationship between disease incidence and the proportion of cases; and
  • The dispersion parameter \(k\), which controls the relationship between the mean \((\mathbb{E}[y_t])\) and the variance; as \(k \to \infty\) the distribution approaches the binomial, as \(k \to 0\) the distribution becomes more and more over-dispersed with respect to the binomial.
Probability density functions for the SampleCounts observation model.

Probability mass functions for the SampleCounts observation model with different values of the dispersion parameter \(k\), for the expected value \(\mathbb{E}[c_t] = 200\) (vertical dashed line) where \(\mathbb{E}[y_t] = 0.02\) and \(N_t = 10,\!000\).

The PopnCounts class

class epifx.obs.PopnCounts(obs_unit, obs_period, pr_obs_ix=None)

Generic observation model for relating disease incidence to count data where the denominator is assumed or known to be the population size.

Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
  • pr_obs_ix – The index of the model parameter that defines the observation probability (\(p_\mathrm{obs}\)). By default, the value in the parameters dictionary is used.
expect(params, op, time, period, pr_inf, prev, curr)

Calculate the expected observation value \(\mathbb{E}[y_t]\) for every particle \(x_t\).

log_llhd(params, op, time, obs, pr_indiv, curr, hist)

Calculate the log-likelihood \(\mathcal{l}(y_t \mid x_t)\) for the observation \(y_t\) (obs) and every particle \(x_t\).

If it is known (or suspected) that the observed value will increase in the future — when obs['incomplete'] == True — then the log-likehood \(\mathcal{l}(y > y_t \mid x_t)\) is calculated instead (i.e., the log of the survival function).

If an upper bound to this increase is also known (or estimated) — when obs['upper_bound'] is defined — then the log-likelihood \(\mathcal{l}(y_u \ge y > y_t \mid x_t)\) is calculated instead.

The upper bound can also be treated as a point estimate by setting params['epifx']['upper_bound_as_obs'] = True — then the log-likelihood \(\mathcal{l}(y_u \mid x_t)\) is calculated.

llhd_in(op, time, mu, wt, y0, y1)

Return the probability mass in \([y_0, y1)\).

Parameters:
  • op – The observation model parameters dictionary.
  • time – The current simulation time, \(t\).
  • mu – The expected case fraction for each particle.
  • wt – The weight associated with each particle.
  • y0 – The (inclusive) minimum fraction of cases (\(y_0\)).
  • y1 – The (exclusive) maximum fraction of cases (\(y_0\)).
quantiles(op, time, mu, wt, probs)

Return the observations \(y_i\) that satisfy:

\[y_i = \inf\left\{ y \in \mathbb{N} : p_i \le \sum_i w_i \cdot \mathcal{L}(y_t \le y \mid x_t^i)\right\}\]
Parameters:
  • op – The observation model parameters dictionary.
  • time – The current simulation time, \(t\).
  • mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
  • wt – The weight associated with each particle, \(w_i\).
  • probs – The probabilities \(p_i\), which must be sorted in ascending order.
define_params(params, bg_obs, pr_obs, disp, bg_var=None)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).
  • bg_var – The variance of the background signal (\(bg_\mathrm{var}\)).
from_file(filename, year=None, date_col='to', value_col='count', ub_col=None)

Load count data from a space-delimited text file with column headers defined in the first line.

Parameters:
  • filename – The file to read.
  • year – Only returns observations for a specific year. The default behaviour is to return all recorded observations.
  • date_col – The name of the observation date column.
  • value_col – The name of the observation value column.
  • ub_col – The name of the estimated upper-bound column, optional.
Returns:

A list of observations, ordered as per the original file.

The SampleCounts class

class epifx.obs.SampleCounts(obs_unit, obs_period, denom, k_obs_ix=None)

Generic observation model for relating disease incidence to count data where the denominator is reported.

Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
  • denom – The denominator to use when calculating likelihoods and quantiles in the absence of an actual observation.
  • k_obs_ix – The index of the model parameter that defines the disease-related increase in observation rate (\(\kappa_\mathrm{obs}\)). By default, the value in the parameters dictionary is used.
expect(params, op, time, period, pr_inf, prev, curr)

Calculate the expected observation value \(\mathbb{E}[y_t]\) for every particle \(x_t\).

log_llhd(params, op, time, obs, pr_indiv, curr, hist)

Calculate the log-likelihood \(\mathcal{l}(y_t \mid x_t)\) for the observation \(y_t\) (obs) and every particle \(x_t\).

If it is known (or suspected) that the observed value will increase in the future — when obs['incomplete'] == True — then the log-likehood \(\mathcal{l}(y > y_t \mid x_t)\) is calculated instead (i.e., the log of the survival function).

If an upper bound to this increase is also known (or estimated) — when obs['upper_bound'] is defined — then the log-likelihood \(\mathcal{l}(y_u \ge y > y_t \mid x_t)\) is calculated instead.

The upper bound can also be treated as a point estimate by setting params['epifx']['upper_bound_as_obs'] = True — then the log-likelihood \(\mathcal{l}(y_u \mid x_t)\) is calculated.

llhd_in(op, time, mu, wt, y0, y1)

Return the probability mass over the interval \([y_0, y1)\).

Parameters:
  • op – The observation model parameters dictionary.
  • time – The current simulation time, \(t\).
  • mu – The expected fraction of all patients that are cases.
  • wt – The weight associated with each value of mu.
  • y0 – The (inclusive) minimum fraction of cases (\(y_0\)).
  • y1 – The (exclusive) maximum fraction of cases (\(y_0\)).
quantiles(op, time, mu, wt, probs)

Return the observations \(y_i\) that satisfy:

\[y_i = \inf\left\{ y \in \mathbb{N} : p_i \le \sum_i w_i \cdot \mathcal{L}(y_t \le y \mid x_t^i)\right\}\]
Parameters:
  • op – The observation model parameters dictionary.
  • time – The current simulation time, \(t\).
  • mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
  • wt – The weight associated with each particle, \(w_i\).
  • probs – The probabilities \(p_i\), which must be sorted in ascending order.
define_params(params, bg_obs, k_obs, disp, bg_var=None)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • k_obs – The increase in observation rate due to infected individuals (\(\kappa_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).
  • bg_var – The variance of the background signal (\(bg_\mathrm{var}\)).
from_file(filename, year=None, date_col='to', value_col='cases', denom_col='patients')

Load count data from a space-delimited text file with column headers defined in the first line.

Note that returned observation values represent the fraction of patients that were counted as cases, not the absolute number of cases. The number of cases and the number of patients are recorded under the 'numerator' and 'denominator' keys, respectively.

Parameters:
  • filename – The file to read.
  • year – Only returns observations for a specific year. The default behaviour is to return all recorded observations.
  • date_col – The name of the observation date column.
  • value_col – The name of the observation value column (reported as absolute values, not fractions).
  • denom_col – The name of the observation denominator column.
Returns:

A list of observations, ordered as per the original file.

Raises:

ValueError – If a denominator or value is negative, or if the value exceeds the denominator.

Custom observation models

All observation models should derive from the following base class:

class epifx.Obs

The base class of observation models, which defines the minimal set of methods that are required.

expect(params, op, time, period, pr_inf, prev, curr)

Return the expected observation value \(\mathbb{E}[y_t]\) for every particle \(x_t\), at one or more times \(t\).

Parameters:
  • params – The observation model parameters.
  • op – The observation model parameters dictionary.
  • time – The simulation time(s), \(t\).
  • period – The duration of the observation period (in days).
  • pr_inf – The probability of an individual becoming infected during the observation period(s), \(p_\mathrm{inf}\).
  • prev – The state vectors at the start of the observation period(s), \(x_t\).
  • curr – The state vectors at the end of the observation period(s).
log_llhd(params, op, time, obs, pr_indiv, curr, hist)

Return the log-likelihood \(\mathcal{l}(y_t \mid x_t)\) for the observation \(y_t\) and every particle \(x_t\).

Parameters:
  • params – The observation model parameters.
  • op – The observation model parameters dictionary.
  • time – The current simulation time, \(t\).
  • obs – An observation for the current time-step, \(y_t\).
  • pr_indiv – The probabilities of an individual not being infected and being infected during the observation period, \([1 - p_\mathrm{inf}, p_\mathrm{inf}]\).
  • curr – The particle state vectors, \(x_t\).
  • hist – The particle state histories, indexed by observation period.
llhd_in(op, time, mu, wt, y0, y1)

Return the weighted likelihood that \(y_t \in [y_0, y_1)\):

\[\sum_i w_i \cdot \mathcal{L}(y_0 \le y_t < y_1 \mid x_t^i)\]
Parameters:
  • op – The observation model parameters dictionary.
  • time – The current simulation time, \(t\).
  • mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
  • wt – The weight associated with each particle, \(w_i\).
  • y0 – The (inclusive) minimum fraction of cases, \(y_0\).
  • y1 – The (exclusive) maximum fraction of cases, \(y_1\).
quantiles(op, time, mu, wt, probs)

Return the observations \(y_i\) that satisfy:

\[y_i = \inf\left\{ y \in \mathbb{N} : p_i \le \sum_i w_i \cdot \mathcal{L}(y_t \le y \mid x_t^i)\right\}\]
Parameters:
  • op – The observation model parameters dictionary.
  • time – The current simulation time, \(t\).
  • mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
  • wt – The weight associated with each particle, \(w_i\).
  • probs – The probabilities \(p_i\), which must be sorted in ascending order.

Reading observations from disk

The from_file() methods provided by the observation models listed above use read_table() to read specific columns from data files. This function is intended to be a flexible wrapper around numpy.loadtxt, and should be sufficient for reading almost any whitespace-delimited data file with named columns.

epifx.obs.read_table(filename, columns, date_fmt=None, comment='#')

Load data from a space-delimited ASCII text file with column headers defined in the first non-comment line.

Parameters:
  • filename – The file to read.
  • columns – The columns to read, represented as a list of (name, type) tuples; type should either be a built-in NumPy scalar type, or datetime.date or datetime.datetime (in which case string values will be automatically converted to datetime.datetime objects by datetime_conv()).
  • date_fmt – A format string for parsing date columns; see datetime_conv() for details and the default format string.
  • comment – Prefix string(s) that identify comment lines; either a single Unicode string or a tuple of Unicode strings.
Returns:

A NumPy structured array.

Raises:

ValueError – If the data file contains non-ASCII text (i.e., bytes greater than 127), because numpy.loadtxt cannot process non-ASCII data (e.g., see NumPy issues #3184, #4543, #4600, #4939).

Example:

The code below demonstrates how to read observations from a file that includes columns for the year, the observation date, the observed value, and free-text metadata (up to 20 characters in length).

import datetime
import numpy as np
import epifx.obs
columns = [('year', np.int32), ('date', datetime.datetime),
           ('count', np.int32), ('metadata', 'S20')]
df = epifx.obs.read_table('my-data-file.ssv', columns,
                          date_fmt='%Y-%m-%d')
epifx.obs.datetime_conv(text, fmt='%Y-%m-%d', encoding='utf-8')

Convert date strings to datetime.datetime instances. This is a convenience function intended for use with, e.g., numpy.loadtxt.

Parameters:
  • text – A string (bytes or Unicode) containing a date or date-time.
  • fmt – A format string that defines the textual representation. See the Python strptime documentation for details.
  • encoding – The byte string encoding (default is UTF-8).

Commands

The following commands are part of epifx:

  • epifx-template creates an example configuration, which serves as a starting point for defining data sources, observation models, and simulation settings.
  • epifx-locns lists the locations that are defined in the local configuration.
  • epifx-scan generates forecasts from retrospective data, over a range of parameter values for the observation model(s), so that optimal observation model settings can be identified.
  • epifx-summary summarises forecasting performance for a completed observation model scan.
  • epifx-forecast generates forecasts from live data (i.e., where the future observations are unknown).
  • epifx-replay will replay the observations from an existing set of forecasts, to produce a new set of forecasts (e.g., accounting for incomplete observations).
  • epifx-json converts forecasts into JSON files that can be plotted and viewed interactively in a web browser.
  • epifx-cmp compares simulation outputs between files, which is useful for ensuring that different systems produce identical results.

Local configuration

All of the epifx commands rely on the necessary simulation settings being defined in a user-defined module, epifxlocns. Run the follow command to generate a basic template for this module in the current directory:

epifx-template

The epifx-locns command lists the locations that are defined in this module:

epifx-locns

This module must define the following function:

local_settings(locn_id=None)

Return the simulation settings for the user-defined location uniquely identified by locn_id or, if locn_id is None, return a list of the user-defined locations.

The settings for each location should be returned as a dictionary that contains the following keys:

  • 'id': the unique identifier for this location (should not contain spaces or numbers).
  • 'name': a pretty-printed version of the location identifier (may contain spaces and numbers).
  • 'popn': the population size.
  • 'obs_model': an observation model instance.
  • 'obs_file': the name of the observations file.
  • 'obs_filter': a function use to remove outliers from the observations, if required (otherwise, this key is not required).
  • 'from_file_args': a dictionary of additional keyword arguments to pass to the from_file method of the observation model (e.g., 'value_col' for epifx.obs.PopnCounts.from_file()).
  • 'scan_years': a list of the years for which observation model scans should be performed.
  • 'scan': a dictionary that maps observation model parameter names to one or more values for that parameter, to be covered by observation model scans. Values may be defined as scalar values, lists, or dictionaries that map years to either scalar values or lists.
  • 'forecast': a dictionary that maps observation model parameter names to one or more values for that parameter, to be used when forecasting. Values may be defined as scalar values, lists, or dictionaries that map years to either scalar values or lists.
  • 'om_format': a dictionary that maps observation model parameter names to format specifiers, which are used to include parameter values in output file names.
  • 'om_name': a dictionary that maps observation model parameter names to strings, which are used to identify these parameters in output file names.
  • 'out_dir': the directory in which simulation outputs will be written.
  • 'json_dir': the directory in which JSON forecast files will be written.
  • 'tmp_dir': the directory in which temporary files will be stored when performing simulations.
  • 'get_params': a function that accepts this dictionary as an argument and returns the simulation parameters dictionary.
  • JSON plot settings:
    • 'obs_axis_lbl': the axis label for the observations.
    • 'obs_axis_prec': the decimal precision of axis ticks for the observations.
    • 'obs_datum_lbl': the label for individual observations.
    • 'obs_datum_prec': the decimal precision of individual observations.
  • 'extra_args': used to apply custom settings arguments for observation model scans and for forecasts:
    • 'start': a function that takes one argument, the season, and returns the start of the simulation period.
    • 'until': a function that takes one argument, the season, and returns the end of the simulation period.
    • 'live_fs_dates': a function that takes three arguments: the season, a list of all observations, and an (optional) initial forecasting date, and returns a list of dates for which live forecasts should be generated.
    • 'scan_fs_dates': a function that takes two arguments, the season and the list of observations for that season, and returns the dates for which retrospective forecasts should be generated.
    • 'make_summary': a function that takes the same arguments as epifx.summary.make() and returns a summary object.
epifx-template

Observation model scans

To run simulations for every combination of observation model parameters (as defined in the local configuration) use epifx-scan. In the example below, the simulations will be run in 16 parallel processes and will only be run against observations for the 2015 calendar year:

epifx-scan --spawn 16 --year 2015 location_name

Observation model performance

To summarise the forecasting performance for an observation model scan, use epifx-summary:

epifx-summary location_name

By default, this will also convert the best forecast for each calendar year into a JSON file, for interactive viewing in a web browser.

Live forecasting

To generate forecasts from live data, use epifx-forecast. In the example below, the most recent observation is deemed to be “incomplete” (i.e., an underestimate of the real value) and an upper bound on the actual value is provided:

epifx-forecast --incomplete 1 --upper-bound 200 location_name

By default, each forecast will also be converted into a JSON file, for interactive viewing in a web browser.

Replay forecasts

To generate forecasts against existing data, where a given observation may change over time (e.g., gradual accumulation of count data), use epifx-replay to run forecasts against the data as they were provided at each forecasting date. The observations and forecasting dates will be read from an existing JSON file, as produced by epifx-forecast or epifx-json:

epifx-replay prev-forecasts.json location_name

By default, the forecasting location specified in the JSON file will also be used to generate the new forecasts, but a different forecasting location can be specified (as illustrated above).

Incomplete observations can also be adjusted to have “perfect” upper bound estimates (i.e., corresponding to the observed values in the most recent data snapshot):

epifx-replay --perfect-upper-bounds prev-forecasts.json location_name

Note

If the final reported value for an observation date is smaller than the reported value in an earlier snapshot, the original observation will be left unchanged.

This may be changed in a future version of epifx, but will require changing the semantics (and probably the name) of the observation 'upper_bound' field and the observation model log-likelihood functions.

Interactive forecast plots

Both epifx-summary and epifx-forecast will produce JSON files by default, but epifx-json can be used to convert any set of forecast output files into a JSON file:

epifx-json --location location_name --output forecasts.json *.hdf5

Comparing simulation outputs

Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.

epifx-cmp --help
epifx-cmp file1.hdf5 file2.hdf5

Note that simulation outputs have been observed to differ depending on the installed versions of the NumPy and SciPy packages, due to changes in the random number generator.

Worked Example

This page assumes that you have already installed the epifx package, and shows how to generate forecasts using an SEIR model and the epifx-forecast command.

Note

You can replace the SEIR model with any other model, as long as it provides the necessary methods.

Note

You can download the completed files at the bottom of this page:

This page also assumes that you have a time series that characterises disease activity. For example:

year date       cases
2017 2017-01-08 3
2017 2017-01-15 4
2017 2017-01-22 2
...

Major decisions

  1. Choose an infection model; here we will use the SEIR model provided by epifx.
  2. Choose an appropriate observation model; here we will use the PopnCounts observation model, which relates disease incidence to count data where the denominator is assumed or known to be the population size.
  3. Define the observation model parameters; in this case, they are:
    • bg_obs – The background signal in the data.
    • pr_obs – The probability of observing an infected individual.
    • disp – The dispersion parameter.
    • bg_var – The variance of the background signal.
  4. Define the simulation parameters, such as the population size and the number of particles.

Note

All of these settings must be defined in epifxlocns.py (located in the working directory) in order to use the epifx commands.

Defining the forecast settings

Each combination of infection model, observation model, and parameters, is referred to as a “location” and is identified by a unique name. The epifxlocns.py file will define all of these settings, for every location that you define.

Here is a minimal example that defines no locations:

"""Local settings for epidemic forecasting."""

def local_settings(locn_id=None):
    """Return the forecast settings for a particular location."""
    # This dictionary will contain all of the forecast settings.
    locations = {}

    # Ensure each location includes its own unique ID.
    for locn in locations:
        locations[locn]['id'] = locn

    if locn_id is None:
        # Return a list of forecast locations.
        return(locations.keys())
    elif locn_id in locations:
        # Return the settings for this location.
        return(locations[locn_id])
    else:
        raise ValueError("Invalid location '{}'".format(locn_id))

The rest of this page will demonstrate how to expand on the above template in order to run epidemic forecasts against a single data file.

Creating a new location

This is as simple as adding a new key to the locations dictionary, which defines the unique name for this location:

locations = {
    'some-city': {}
}

Note

This unique name should only contain alpha-numeric characters and hyphens. It should not contain any spaces.

As shown above, the settings for a forecast location are collected into a dictionary. In the example above, this dictionary is empty. We will now demonstrate how to define all of the necessary forecast settings for this location within this dictionary.

Basic settings

Each location should have a name — an arbitrary string that may contain spaces and will be used as a title for forecast outputs.

Other common settings (such as the output directory) and fixed model parameters (such as the population size) may also be defined here. The next section will demonstrate how these values will be added to the simulation parameters.

locations = {
    'some-city': {
        'name': 'Some City',
        'popn': 1000000,
        'out_dir': '.',
        'tmp_dir': './tmp',
        'get_params': get_locn_params,
    }
}

Note

Simulation parameters are not provided directly. Instead, a function that returns the appropriate parameters is stored under the 'get_params' key.

See the following section for an example of such a function.

Simulation parameters

Simulation parameters are defined by a separate function, which should also be contained in the file epifxlocns.py. This function takes one argument: the settings dictionary for the forecast location.

import epifx
import pypfilt

import errno
import os.path

def get_locn_params(locn_settings):
    """Return the parameters dictionary for a forecast location."""
    model = epifx.SEIR()
    time = pypfilt.Datetime()
    popn_size = locn_settings['popn']
    # The number of particles.
    px_count = 2000
    # The seed for the pseudo-random number generator.
    prng_seed = 3001

    params = epifx.default_params(px_count, model, time, popn_size, prng_seed)

    # Customise the default parameters as desired.
    params['resample']['threshold'] = 0.25
    params['resample']['regularisation'] = True
    params['epifx']['stoch'] = False

    # Set the output and temporary directories.
    out_dir = locn_settings['out_dir']
    tmp_dir = locn_settings['tmp_dir']
    for reqd_dir in [out_dir, tmp_dir]:
        if not os.path.isdir(reqd_dir):
            # Create the directory (and missing parents) with mode -rwxr-x---.
            try:
                os.makedirs(reqd_dir, mode=0o750)
            except OSError as e:
                # Potential race condition with multiple script instances.
                if e.errno != errno.EEXIST:
                    print("Warning: could not create {}".format(reqd_dir))
                    print(e)

    params['out_dir'] = out_dir
    params['tmp_dir'] = tmp_dir

    return params

Note

In this example, the population size is determined by inspecting the settings dictionary for the location: locn_settings['popn']. Note that this was defined in the previous section (“Basic settings”). The same is also true for the output and temporary directories.

Adjusting model priors

You probably also want to modify the default model priors.

def get_locn_params(locn_settings):
    """Return the parameters dictionary for a forecast location."""

    # Other lines omitted ...

    # Restrict R0 to values between 1.3 and 1.5.
    params['param_min'][4] = 1.3
    params['param_max'][4] = 1.5
    # Keep eta fixed at 1 (i.e., enforce homogeneous mixing).
    params['param_min'][7] = 1.0
    params['param_max'][7] = 1.0
    # Introduce the first infection after 12-16 weeks.
    params['param_min'][9] = 12 * 7
    params['param_max'][9] = 16 * 7
    # Update the model priors.
    params['prior'] = params['model'].priors(params)

    # Other lines omitted ...

    return params
Observation model

The observation model is also stored in the location settings:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'obs_model': epifx.obs.PopnCounts('Weekly Cases', 7),
    }
}

Observations are read from an external file using the observation model’s from_file() method. The file name, and additional arguments for the from_file() method, are defined as follows:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'obs_file': 'case-counts.ssv',
        'from_file_args': {
            'date_col': 'date',
            'value_col': 'cases',
        },
    }
}

Observation model parameters for the forecasts are defined in a dictionary stored under the 'forecast' key. Each parameter is identified by name, and values can take the form of (a) a single numeric value; (b) a list of numeric values; or (c) a dictionary that maps years to a single numerical value or to a list of numeric values.

All of these forms are shown in the example below:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'forecast': {
            'bg_obs': {2016: [6, 8], 2017: 8},
            'bg_var': 6,
            'pr_obs': [0.01, 0.02],
            'disp': 100,
        },
    }
}

Finally, the format strings that are used to construct each output file name must be defined. This has two parts: the 'om_name' dictionary defines a string to use for each observation parameter name, and the 'om_format' dictionary defines the numeric format for each observation parameter value:

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'om_format': {
            'bg_obs': '02.0f',
            'bg_var': '02.0f',
            'pr_obs': '0.2f',
            'disp': '03.0f',
        },
        'om_name': {
            'bg_obs': 'bg',
            'bg_var': 'bgvar',
            'pr_obs': 'pr',
            'disp': 'disp',
        },
    }
}

Note

Given the above definitions, a forecast for all data up to 2017-01-08 and using an observation probability of 0.02 would produce the following output file:

some-city-2017-01-08-bg-08-bgvar-06-disp-100-pr-0.02.hdf5

Simulation period

The default simulation period for forecasts is March to October (inclusive). These defaults can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            'start': get_start_date,
            'until': get_until_date,
        },
    }
}

Note that these dates are defined by separate functions, which should also be contained in the file epifxlocns.py. These functions takes one argument: the forecasting year.

import datetime

def get_start_date(year):
    """Begin forecasts on the first of January."""
    return datetime.datetime(year, 1, 1)

def get_until_date(year):
    """End forecasts on the last day of December."""
    return datetime.datetime(year, 12, 31)

Note

These functions must return datetime.datetime instances (i.e., not datetime.date instances) if you are using the pypfilt.Datetime time scale.

Forecasting dates

By default, the epifx-forecast command will generate a forecast for the most recent Sunday (we have typically used weeks ending on Sunday as our observation interval). This can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            # Other lines omitted ...
            'live_fs_dates': get_live_fs_dates,
        },
    }
}

Note that the forecasting date(s) are defined by a separate function, which should also be contained in the file epifxlocns.py. This function takes three arguments:

  • The forecasting year;
  • The list of all available observations; and
  • The date from which to begin forecasting (may be None, meaning that only the most recent forecasting date should be returned).
def get_live_fs_dates(year, all_obs, fs_from):
    """Generate a forecast at the time of each observation."""
    fs_dates = sorted(o['date'] for o in all_obs)
    if fs_from is None:
        # Return only the most recent forecasting date.
        # Important: *must* be a list.
        return [fs_dates[-1]]
    else:
        return [d for d in fs_dates if d >= fs_from]

Note

This function must return a list of datetime.datetime instances (i.e., not datetime.date instances) if you are using the pypfilt.Datetime time scale.

Summary tables

You may wish to override the collection of summary tables that are generated for each forecast. This can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            # Other lines omitted ...
            'make_summary': make_summary,
        },
    }
}

Note that this is defined by a separate function, which should also be contained in the file epifxlocns.py. This function must accept the same arguments as epifx.summary.make().

import epifx.summary
import pypfilt.summary

def make_summary(params, all_obs, **kwargs):
    """Return a summary object for the forecast simulations."""
    meta_builder = epifx.summary.Metadata()
    meta = meta_builder.build(params)

    # Create a summary object with no tables, can now choose what tables
    # to add. Alternatively, can call epifx.summary.make().
    summary = pypfilt.summary.HDF5(params, all_obs, meta=meta, **kwargs)

    peaks = epifx.summary.PeakMonitor()
    # The summary tables to add.
    tables = [
        pypfilt.summary.ModelCIs(),
        epifx.summary.ObsLikelihood(),
        # The following tables are needed for the interactive forecast plots.
        epifx.summary.PredictiveCIs(peaks),
        epifx.summary.PeakForecastCIs(peaks),
    ]
    # Also add a summary table for each set of observations.
    # These tables are needed for the interactive forecast plots.
    for obs_unit in params['obs']:
        tables.append(epifx.summary.Obs(obs_unit))

    summary.add_tables(*tables)

    return summary
Forecast plots

Both the epifx-forecast and epifx-summary commands will produce JSON files by default (in addition to the standard HDF5 output files). The results stored in these JSON files can then be plotted and viewed interactively in a web browser.

Here we define:

  • The directory that will contain the JSON forecast outputs ('json_dir');
  • The y-axis label ('obs_axis_lbl');
  • The decimal precision of the y-axis ticks ('obs_axis_prec');
  • The tool-tip label for individual observations ('obs_datum_lbl'); and
  • The tool-tip decimal precision ('obs_datum_prec').
locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'json_dir': './www/data',
        'obs_axis_lbl': 'Weekly Cases',
        'obs_axis_prec': 0,
        'obs_datum_lbl': 'cases/week',
        'obs_datum_prec': 0,
    }
}

Running the forecasts

First, ensure that the epifxlocns.py is correctly detected and the forecast location is identified by the epifx-locn command:

$ epifx-locn
some-city
$

Then use the epifx-forecast command to generate forecasts for all combinations of the observation model parameter values:

$ epifx-forecast --year 2017 --from 2017-03-10 some-city
INFO:epifx.cmd.forecast:Forecasting from 2017-03-12 00:00:00 ...

Visualising the forecasts

You can use the epifx-template command to generate the necessary files to view the interactive forecast plots in your browser.

$ epifx-template

You will also need to download the d3 JavaScript library.

$ wget https://github.com/d3/d3/releases/download/v3.5.17/d3.zip
$ unzip d3.zip
$ mv d3.min.js www/

Finally, edit www/index.html and add each of the forecast JSON files to the drop-down list, replacing the template values.

<div class="params">
  <form id="data-file-form">
    <label for="data_file">Forecast:</label>
    <select id="data_file">
      <option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.01.json" selected>pr_obs = 0.01</option>
      <option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.02.json">pr_obs = 0.02</option>
    </select>
  </form>
</div>

Then open www/index.html in your browser.

Retrospective forecasts

The epifx-scan command is similar to epifx-forecast, except that it generates forecasts from retrospective data. It is intended to explore broad ranges of observation model parameter values, so that appropriate values for these parameters can be identified.

Note

The same data file (stored under the 'obs_file' key) will be used for live forecasts (epifx-forecast) and for retrospective forecasts (epifx-scan).

Observation model parameters

Observation model parameters for the retrospective forecasts are defined in a dictionary stored under the 'scan' key. Each parameter is identified by name, and values can take the form of (a) a single numeric value; (b) a list of numeric values; or (c) a dictionary that maps years to a single numerical value or to a list of numeric values.

The list of years for which retrospective forecasts should be generated is stored under the 'scan_years' key.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'scan': {
            'bg_obs': [6, 8],
            'bg_var': 6,
            'pr_obs': [0.005, 0.010, 0.015, 0.020, 0.025],
            'disp': [10, 100, 1000],
        },
        'scan_years': [2015, 2016],
    }
}
Retrospective forecasting dates

By default, the epifx-scan command will generate forecasts from mid-March until the observed epidemic peak in each year. This can be overridden in the 'extra_args' dictionary.

locations = {
    'some-city': {
        'name': 'Some City',
        # Other lines omitted ...
        'extra_args': {
            # Other lines omitted ...
            'scan_fs_dates': get_scan_fs_dates,
        },
    }
}

Note that the retrospective forecasting date(s) are defined by a separate function, which should also be contained in the file epifxlocns.py. This function takes only two arguments (i.e., not the same as the 'live_fs_dates' function):

  • The forecasting year; and
  • The list of all available observations.
def get_scan_fs_dates(year, all_obs):
    """Generate a forecast at the time of each observation."""
    return sorted(o['date'] for o in all_obs)

Note

This function must return a list of datetime.datetime instances (i.e., not datetime.date instances) if you are using the pypfilt.Datetime time scale.

Running the retrospective forecasts

Use the epifx-scan command to generate retrospective forecasts for all combinations of the observation model parameter values:

$ epifx-scan some-city

Download the complete files

Note

Run epifx-template to generate www/plot.js and www/style.css, as no modifications to these files are required.

Change Log

0.5.8 (2020-04-03)

  • Enhancement: epifx.obs.PopnCount now supports incomplete observations that comprise an incomplete count and an estimated upper bound for the true value.

0.5.7 (2020-03-28)

  • Enhancement: add an SEEIIR model, which has the same parameters as the SEIR model.

0.5.6 (2018-06-07)

  • Enhancement: observation upper bounds can now be treated as point estimates by setting params['epifx']['upper_bounds_as_obs'] = True.
  • Enhancement: epifx-replay can now define “perfect” upper bound estimates (corresponding to the observed values in the most recent data snapshot) by using the --perfect-upper-bounds argument.
  • Enhancement: epifx.cmd.run_in_parallel now returns the job completion status (Boolean value).
  • Test cases are now run against Python 3.6 rather than Python 3.5, since Debian Testing has migrated to Python 3.6.

0.5.5 (2017-10-26)

  • This release requires pypfilt >= 0.5.4.
  • Bug fix: correctly calculate the time of first infection relative to the true start of the simulation period (params['epoch']) rather than, e.g., the start of the forecasting run.
  • Bug fix: ignore duplicate table rows when generating JSON files.
  • Bug fix: additional data validation checks when generating JSON files.
  • Enhancement: add a new command (epifx-replay) that replays the observations from an existing set of forecasts, so that new forecasts can be generated while accounting for, e.g., incomplete observations.
  • Enhancement: add a worked example of generating forecasts. See the “Worked Example” page in the online documentation.

0.5.4 (2017-08-17)

  • Bug fix: when processing forecast files that only contain estimation runs, the epifx-json command will ignore forecast credible intervals prior to the date of the most recent observation.

0.5.3 (2017-08-16)

  • Bug fix: ensure epifx-json uses native strings for dtype field names.
  • Bug fix: remove invalid import statements that rely on an as-yet unreleased version of pypfilt.

0.5.2 (2017-08-16)

  • Enhancement: add a new monitor, epifx.summary.ThresholdMonitor, and a corresponding table, epifx.summary.ExceedThreshold, for detecting when the expected observations for each particle exceed a specific threshold. Note that this table is not included in the default summary object returned by epifx.summary.make.
  • Enhancement: the epifx-json command no longer aborts if a forecast file only contains forecast credible intervals (the forecasts table) but not the peak timing credible intervals (the peak_cints table).
  • Enhancement: the epifx-json command will generate output from estimation runs if a forecast file only contains an estimation run (which can occur, e.g., when directly using pypfilt.run to generate forecasts rather than using pypfilt.forecast).

0.5.1 (2017-06-15)

  • Bug fix: the accept-reject sampler now resets particle weights after each iteration. This only affects summary tables that require the weights to sum to unity, it has no effect on the particle selection.
  • Bug fix: the daily forcing signal now correctly returns datetime instances.
  • Bug fix: add a missing function argument when processing incomplete observations.
  • Enhancement: the epifx-forecast command now accepts a new argument, --all (or -a), which generates forecasts for all defined locations.

0.5.0 (2017-05-10)

  • This release requires pypfilt >= 0.5.1.
  • Breaking change: epifx.default_params makes fewer assumptions, and now takes more positional arguments.
  • Breaking change: the SEIR model is now defined in terms of intrinsic parameters (e.g., R0 rather than the daily force of infection) and the time of initial exposure is now just another model parameter.
  • Breaking change: record predictive CIs in the /data/forecasts table; by default, the median and the 50% and 95% credible intervals are recorded. The expected observation CIs are now stored in the /data/expected_obs table.
  • Breaking change: record observations in tables grouped by the observation unit; these tables are now located in HDF5 tables /data/obs/obs_unit.
  • Bug fix: ensure that epifx.summary.ObsLikelihood correctly encodes the observation source and units.
  • Bug fix: robustly handle near-zero probability mass.
  • Enhancement: arbitrary model priors are supported via the accept-reject sampler provided by the epifx.select module.
  • Enhancement: a suite of commands for performing retrospective observation model scans and live forecasts are provided by the epifx.cmd module. See the documentation for details.
  • Enhancement: an example template is provided (see the epifx.example module and the epifx-template command) that includes Australian Google Flu Trends data. Observations can be loaded with epifx.example.gft_obs.
  • Enhancement: epifx.summary.PeakMonitor now surveys the entire particle trajectory (including both the estimation and forecasting runs) so that peaks that occurred prior to the forecasting date are reported correctly.
  • Enhancement: the epifx.summary.ObsLikelihood table can now record the likelihood of arbitrary observations (i.e., observations that were not included in the filtering process).
  • Enhancement: the default summary tables provided by epifx.summary.make can be suppressed as needed.
  • Enhancement: custom simulation time scales are supported.
  • Enhancement: add quantile and probability mass sum functions to the observation models.
  • Enhancement: test cases for several modules are now provided in ./tests and can be run with tox.
  • Enhancement: document the release process and provide instructions for uploading packages to PyPI.

0.4.3 (2016-09-16)

  • This release requires pypfilt >= 0.4.3.
  • Breaking change: the epifx.obs.SampleCounts observation model now uses a Beta-binomial distribution rather than a Beta distribution. Parameter names and definitions have been changed accordingly.
  • Enhancement: consistently separate Unicode strings from bytes, and automatically convert NumPy field names into native strings.
  • Enhancement: add support for incomplete data for which there may or may not be an upper bound (whether known in fact or estimated).
  • Enhancement: record the likelihood of each observation according to each particle (see the epifx.summary.ObsLikelihood class).

0.4.2 (2016-06-16)

  • Breaking change: replace the observation models added in epifx 0.4.1 with observations models for:
    • Count data where the denominator is known or assumed to be the entire population (epifx.obs.PopnCounts); and
    • Count data where the denominator is reported and may vary, and where the background signal is a fixed proportion (epifx.obs.SampleCounts).

0.4.1 (2016-04-22)

  • Enhancement: provide generic negative binomial observation models for count data and for fractional/percentage data in epifx.obs.

0.4.0 (2016-04-22)

  • This release requires pypfilt >= 0.4.0.

  • Breaking change: models must define default parameter bounds by implementing the param_bounds method.

  • Breaking change: model expectation functions now receive the previous and current state vectors, in addition to the infection probability vector. This means that expectation functions will need to change from:

    expect(params, unit, period, pr_inf)
    

    to:

    expect(params, unit, period, pr_inf, prev, curr)
    
  • Enhancement: epifx.summary.make now passes additional keyword arguments to the pypfilt.summary.HDF5 constructor, allowing users to override default settings, such as first_day=False.

  • Bug fix: ensure that infection probabilities are strictly non-negative.

  • Bug fix: ensure that population invariants are enforced correctly.

  • Bug fix: correctly scale the daily seeding probability.

  • Add instructions for installing epifx in a virtual environment.

0.3.1 (2016-02-25)

  • Bug fix: prevent a runtime error with params['epifx']['one_prng'] = True by correctly retrieving the pypfilt PRNG (params['resample']['rnd']).

0.3.0 (2016-02-23)

  • This release requires pypfilt >= 0.3.0.
  • Provide each summary statistic as a separate class.
  • Inherit from the pypfilt simulation model base class.
  • Host the documentation at Read The Docs.

0.2.0 (2015-11-17)

  • Use an independent PRNG instance for model stochasticity, as distinct from the pypfilt PRNG instance (used for resampling). Note that this breaks backwards compatibility (in terms of producing identical outputs) and can be overridden by setting params['epifx']['one_prng'] = True.
  • This release requires pypfilt >= 0.2.0.

0.1.10 (2015-07-09)

  • Fix a bug where temporal forcing could cause a negative force of infection.

0.1.9 (2015-07-06)

  • Add support for temporal forcing, modulated by the new parameter sigma.

0.1.8 (2015-06-18)

  • Update the model parameter invariants for alpha, based on the priors for R0 and gamma.

0.1.7 (2015-06-18)

  • Sample R0 and calculate alpha, rather than sampling alpha directly.

0.1.6 (2015-06-08)

  • Avoid error messages if no logging handler is configured by the application.
  • Default to comparing only the simulation outputs and ignore the metadata; this can be overridden by the --meta-data (-m) option.
  • Build a universal wheel via python setup.py bdist_wheel, which supports both Python 2 and Python 3.
  • This release requires pypfilt >= 0.1.2.

0.1.5 (2015-06-04)

  • Record credible intervals for state variables (/data/state).

0.1.4 (2015-06-03)

  • Reduce the minimum latent period to half a day.
  • No longer require the simulation period to be included in the simulation parameters dictionary.
  • Hide stderr output from spawned processes when obtaining git metadata, since the error messages have no relevance to the user (they only serve to indicate that the working directory is not part of a git repository).

0.1.3 (2015-06-01)

  • Obtain git metadata from the working directory, if it is contained within a repository. This requires pypfilt >= 0.1.1.

    Note that sensible metadata will only be obtained if the working directory is not manipulated by program code (e.g., os.chdir).

0.1.2 (2015-06-01)

  • Record the enforced limits on model parameters, so that output files include sufficient information to independently produce identical results.

    The default limits on beta and gamma are now identical to the domain of their default priors (1 to 3 days).

  • Added options --verbose and --data-only to the cmp-output script.

  • Ignore the command line (/meta/sim/cmdline) when comparing output files.

0.1.1 (2015-05-29)

  • Added a script (cmp-output) that compares output files for identical simulation outputs and metadata.

0.1.0 (2015-05-29)

  • Initial release.

Contributing to epifx

As an open source project, epifx welcomes contributions of many forms.

Examples of contributions include:

  • Code patches
  • Documentation improvements
  • Bug reports and patch reviews

Licensing

All contributions should be licensed under the same terms as epifx (see README.rst).

Testing with tox

The epifx testing suite uses the pytest framework, and uses the tox automation tool to run the tests under Python 2 and Python 3. The test cases are contained in the ./tests directory.

To run all tests using all of the Python versions defined in tox.ini, run:

tox

The tox.ini contents are shown below, and include targets that check whether the documentation in ./doc builds correctly with Python 2 and with Python 3.

#
# Configuration file for tox, used to automate test activities.
#
# https://tox.readthedocs.io/en/latest/
#
# This configuration file defines four test environments:
#
#   py27-test: Run the test cases in ./tests/ using Python 2.7.
#   py36-test: Run the test cases in ./tests/ using Python 3.6.
#   py27-docs: Build the package documentation using Python 2.7.
#   py36-docs: Build the package documentation using Python 3.6.
#
# To perform each of these test activities, run:
#
#   tox
#
[tox]
envlist = py{27,36}-{test,docs}

#
# Define common settings.
#
# * Cache installed wheels to accelerate environment creation.
# * Ensure tests are run against the installed package.
# * Add test-specific package dependencies.
#
[base]
pkg = epifx
wheels = {homedir}/.cache/pip/wheels
pytest = {envbindir}/py.test --cov={envsitepackagesdir}/{[base]pkg} --capture=no
install_command=pip install -f {[base]wheels} {opts} {packages}
deps =
    wheel>=0.29
    pytest
    pytest-cov

#
# Define environment-specific settings.
#
# * The documentation builds are performed in the ./doc directory.
# * The documentation builds depend on Sphinx and associated packages.
# * The test cases depend on the testing packages defined in [base].
# * All environments depend on the most recent pypfilt version that has been
#   built locally with tox ({distshare}/pypfilt-*.zip).
#   See http://tox.readthedocs.io/en/latest/example/general.html for details.
# * Python 3.6 tests issue errors about comparing bytes and strings (-bb).
#
[testenv]
changedir =
    docs: doc
deps =
    {distshare}/pypfilt-*.zip
    test: {[base]deps}
    test: matplotlib>=1.5
    docs: sphinx>=1.4
    docs: sphinx-rtd-theme>=0.1.9
    docs: sphinxcontrib-inlinesyntaxhighlight>=0.2
commands =
    py27-test: {envpython} {[base]pytest} {posargs}
    py36-test: {envpython} -bb {[base]pytest} {posargs}
    docs: sphinx-build -W -b html -d {envtmpdir}/doctrees . {envtmpdir}/html

Release process

Feature development takes places on the “master” branch. Periodically, a release is created by increasing the version number and tagging the relevant commit with the new version number.

  • Check that the release passes all of the tests:

    tox
    
  • Update the version number according to the versioning scheme.

    • Update the version number in doc/conf.py. The full version must always be updated, the short (X.Y) version does not need to be updated if the version number is being increased from X.Y.Z to X.Y.Z+1.
    • Update the version number in epifx/version.py.
    • Update the version number in setup.py.
  • Describe the changes at the top of NEWS.rst under a heading of the form X.Y.Z (YYYY-MM-DD), which identifies the new version number and the date on which this version was released.

  • Commit these changes; set the commit message to Release epifx X.Y.Z.

    git add NEWS.rst doc/conf.py epifx/version.py setup.py
    git commit -m "Release epifx X.Y.Z"
    
  • Tag this commit X.Y.Z.

    git tag -a X.Y.Z -m "epifx X.Y.Z"
    
  • Push this commit and the new tag upstream.

    git push --follow-tags
    

Publishing to PyPI

These instructions are based on the Python Packaging User Guide.

Ensure that twine is installed:

pip install twine

Define the PyPI server(s) in .pypirc:

[distutils]
index-servers =
  pypi
  pypitest

Ensure that all uncommitted changes are stashed, or they will be packaged!

git stash

Build the wheel ./dist/epifx-X.Y.Z-py2.py3-none-any.whl:

python setup.py bdist_wheel

Upload this wheel to the PyPI test server, so that any problems can be identified and fixed:

twine upload -r pypitest dist/epifx-X.Y.Z-py2.py3-none-any.whl

Then upload this wheel to PyPI:

twine upload dist/epifx-X.Y.Z-py2.py3-none-any.whl