Epidemic forecasting in Python¶
Welcome to the epifx documentation. This package uses a bootstrap particle filter to generate influenza epidemic forecasts from mechanistic infection models.
Installation¶
You can install epifx using pip
:
# Install pypfilt (with plotting support) and epifx. pip install pypfilt[plot] epifx
See the pypfilt documentation for further details.
Getting Started¶
This page assumes that you have already installed the
epifx
package, and shows how to generate forecasts using an SEIR model.
Simulation parameters¶
Default values for the particle filter parameters are provided:
-
epifx.
default_params
(px_count, model, time, popn_size, prng_seed=None)¶ The default simulation parameters.
Parameters: - px_count – The number of particles.
- model – The infection model.
- time – The simulation time scale.
- popn_size – The population size.
- prng_seed – The seed for both the
epifx
andpypfilt
pseudo-random number generators.
Temporal forcing¶
The force of infection can be subject to temporal forcing (e.g., by climatic
factors such as absolute humidity, or social factors such as media coverage)
as mediated by the model parameter \(\sigma\) (see Infection models).
This requires the simulation parameters to include a function that maps
datetime.datetime
instances to scalar values:
-
epifx.
daily_forcing
(filename, date_fmt='%Y-%m-%d')¶ Return a temporal forcing look-up function, which should be stored in
params['epifx']['forcing']
in order to enable temporal forcing.Parameters: - filename – A file that contains two columns separated by whitespace, the column first being the date and the second being the force of the temporal modulation. Note that the first line of this file is assumed to contain column headings and will be ignored.
- date_fmt – The format in which dates are stored.
Infection models¶
-
class
epifx.
SEIR
¶ An SEIR compartment model for a single circulating influenza strain, under the assumption that recovered individuals are completely protected against reinfection.
\[\begin{split}\frac{dS}{dt} &= - \beta S^\eta I \\[0.5em] \frac{dE}{dt} &= \beta S^\eta I - \sigma E \\[0.5em] \frac{dI}{dt} &= \sigma E - \gamma I \\[0.5em] \frac{dR}{dt} &= \gamma I \\[0.5em] \beta &= R_0 \cdot \gamma\end{split}\]Parameter Meaning \(R_0\) Basic reproduction number \(\sigma\) Inverse of the incubation period (day -1) \(\gamma\) Inverse of the infectious period (day -1) \(\eta\) Inhomogeneous social mixing coefficient \(\alpha\) Temporal forcing coefficient The force of infection can be subject to temporal forcing \(F(t)\), as mediated by \(\alpha\):
\[\beta(t) = \beta \cdot \left[1 + \alpha \cdot F(t)\right]\]Note that this requires the forcing function \(F(t)\) to be defined in the simulation parameters (see the Temporal Forcing documentation).
Initialise the model instance.
-
init
(params, vec)¶ Initialise a state vector.
Parameters: - params – Simulation parameters.
- vec – An uninitialised state vector of correct dimensions (see
state_size()
).
-
priors
(params)¶ Return a dictionary of model parameter priors.
Parameters: params – Simulation parameters.
-
set_params
(vec)¶ Read the model parameters from the provided matrix.
-
load_params
(sample_file)¶ Load the model parameters from a file.
-
save_params
(vec, sample_file)¶ Save the model parameters to a file.
-
update
(params, step_date, dt, is_fs, prev, curr)¶ Perform a single time-step.
Parameters: - params – Simulation parameters.
- step_date – The date and time of the current time-step.
- dt – The time-step size (days).
- is_fs – Indicates whether this is a forecasting simulation.
- prev – The state before the time-step.
- curr – The state after the time-step (destructively updated).
-
-
class
epifx.
SEEIIR
(extra_params=None)¶ An SEEIIR compartment model for a single circulating influenza strain, under the assumption that recovered individuals are completely protected against reinfection.
\[\begin{split}\frac{dS}{dt} &= - \beta S^\eta (I_1 + I_2) \\[0.5em] \frac{dE_1}{dt} &= \beta S^\eta (I_1 + I_2) - 2 \sigma E_1 \\[0.5em] \frac{dE_2}{dt} &= 2 \sigma E_1 - 2 \sigma E_2 \\[0.5em] \frac{dI_1}{dt} &= 2 \sigma E_2 - 2 \gamma I_1 \\[0.5em] \frac{dI_2}{dt} &= 2 \gamma I_1 - 2 \gamma I_2 \\[0.5em] \frac{dR}{dt} &= 2 \gamma I_2 \\[0.5em] \beta &= R_0 \cdot \gamma\end{split}\]Parameter Meaning \(R_0\) Basic reproduction number \(\sigma\) Inverse of the incubation period (day -1) \(\gamma\) Inverse of the infectious period (day -1) \(\eta\) Inhomogeneous social mixing coefficient \(\alpha\) Temporal forcing coefficient The force of infection can be subject to temporal forcing \(F(t)\), as mediated by \(\alpha\):
\[\beta(t) = \beta \cdot \left[1 + \alpha \cdot F(t)\right]\]Note that this requires the forcing function \(F(t)\) to be defined in the simulation parameters (see the Temporal Forcing documentation).
-
init
(params, vec)¶ Initialise a state vector.
Parameters: - params – Simulation parameters.
- vec – An uninitialised state vector of correct dimensions (see
state_size()
).
-
priors
(params)¶ Return a dictionary of model parameter priors.
Parameters: params – Simulation parameters.
-
set_params
(vec)¶ Read the model parameters from the provided matrix.
-
load_params
(sample_file)¶ Load the model parameters from a file.
-
save_params
(vec, sample_file)¶ Save the model parameters to a file.
-
update
(params, step_date, dt, is_fs, prev, curr)¶ Perform a single time-step.
Parameters: - params – Simulation parameters.
- step_date – The date and time of the current time-step.
- dt – The time-step size (days).
- is_fs – Indicates whether this is a forecasting simulation.
- prev – The state before the time-step.
- curr – The state after the time-step (destructively updated).
-
Arbitrary model priors¶
In addition to sampling each model parameter independently, the
epifx.select
module provides support for sampling particles according to
arbitrary target distributions, using an accept-reject sampler.
-
epifx.select.
select
(params, start, end, proposal, target, seed, info=False)¶ Select particles according to a target distribution.
Parameters: - params – The simulation parameters (note: the parameter dictionary will be emptied once the particles have been selected).
- start – The start of the simulation period.
- end – The end of the simulation period.
- proposal – The proposal distribution.
- target – The target distribution.
- seed – The PRNG seed used for sampling and accepting particles.
- info – Whether to return additional information about the particles.
Returns: If
info
isFalse
, returns the initial state vector for each accepted particle. Ifinfo
isTrue
, returns a tuple that contains the initial state vectors, a boolean array that indicates which of the proposed particles were accepted, and the summary tables for all proposed particles.
# Save the accepted particles to disk.
vec = epifx.select.select(params, start, end, proposal, target, seed)
# Load the accepted particles for use in a simulation.
model = epifx.SEIR()
model.set_params(vec)
params = epifx.default_params(px_count, model, time, popn_size)
Any proposal distribution can be used with this sampler, including the default model prior:
-
class
epifx.select.
Proposal
¶ The base class for proposal particle distributions.
-
sample
(params, hist, prng)¶ Draw particle samples from the proposal distribution.
Parameters: - params – The simulation parameters.
- hist – The particle history matrix into which the samples should be written.
- prng – The PRNG instance to use for any random sampling.
-
-
class
epifx.select.
DefaultProposal
¶ A proposal distribution that independently samples each parameter from the prior distributions provided in the simulation parameters.
Any target distribution for which a probability density can be defined can be used with this sampler:
-
class
epifx.select.
Target
¶ The base class for target particle distributions.
-
prepare_summary
(summary)¶ Add the necessary tables to a summary object so that required summary statistics are recorded.
Parameters: summary – The summary object to which the required tables should be added.
-
logpdf
(output)¶ Return the log of the target probability density for each particle.
Parameters: output – The state object returned by pypfilt.run
; summary tables are located atoutput['summary'][table_name]
.
-
Two target distributions are provided by this module.
The TargetAny
distribution accepts all particles with equal
likelihood, for the case where the proposal distribution is identical to the
desired target distribution:
-
class
epifx.select.
TargetAny
¶ A distribution that accepts all proposals with equal likelihood.
The TargetPeakMVN
distribution is a multivariate normal distribution
for the peak timing and size, as defined by previously-observed peaks:
-
class
epifx.select.
TargetPeakMVN
(peak_sizes, peak_times)¶ A multivariate normal distribution for the peak timing and size.
Parameters: - peak_sizes – An array of previously-observed peak sizes.
- peak_time – An array of previously-observed peak times.
Observation models¶
The epifx.obs
module provides generic
observation models for count data with known or
unknown denominators, as well as functions for reading observations from disk
and a base class for custom observation models.
Each observation model must have a unique unit
, and is used to calculate
likelihoods for all observations that share this same unit
.
import epifx.obs
# Create the simulation parameters.
params = ...
# Create an observation model for weekly data (with a period of 7 days),
# that pertains to all observations whose unit is "obs_unit".
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)
Forecast summaries¶
-
epifx.summary.
make
(params, all_obs, default=True, extra_tbls=None, pkgs=None, **kwargs)¶ A convenience function that collects all of the summary statistics defined in the
pypfilt.summary
andepifx.summary
modules.Parameters: - params – The simulation parameters.
- all_obs – A list of all observations.
- default – Whether to add all of the tables defined in the
pypfilt.summary
andepifx.summary
modules. - extra_tbls – A list of extra summary statistic tables to include.
- pkgs – A dictionary of python modules whose versions should be
recorded in the simulation metadata. By default, all of the modules
recorded by
pypfilt.summary.Metadata
are included, as is theepifx
package itself. - **kwargs – Extra arguments to pass to
pypfilt.summary.HDF5
.
For example:
from epifx.summary import make params = ... all_obs = ... stats = make(params, all_obs, first_day=True, only_fs=True)
-
class
epifx.summary.
Metadata
¶ Document the simulation parameters and system environment for a set of simulations. This extends the default
pypfilt
black-list to ignore parameters specific toepifx
.-
build
(params, pkgs=None)¶ Construct a metadata dictionary that documents the simulation parameters and system environment.
Parameters: - params – The simulation parameters.
- pkgs – A dictionary that maps package names to modules that
define appropriate
__version__
attributes, used to record the versions of additional relevant packages.
-
-
class
epifx.summary.
PrOutbreak
(name='pr_epi')¶ Record the daily outbreak probability, defined as the sum of the weights of all particles in which an outbreak has been seeded.
Parameters: name – the name of the table in the output file.
-
class
epifx.summary.
PeakMonitor
¶ Record epidemic peak forecasts, for use with other statistics.
-
peak_size
= None¶ A dictionary that maps observation systems to the size of each particle’s peak with respect to that system:
peak_size[(unit, period)]
.Note that this is only valid for tables to inspect in the
finished()
method, and not in theadd_rows()
method.
-
peak_date
= None¶ A dictionary that maps observation systems to the date of each particle’s peak with respect to that system:
peak_date[(unit, period)]
.Note that this is only valid for tables to inspect in the
finished()
method, and not in theadd_rows()
method.
-
peak_time
= None¶ A dictionary that maps observation systems to the time of each particle’s peak with respect to that system, measured in (fractional) days from the start of the forecasting period:
peak_time[(unit, period)]
.Note that this is only valid for tables to inspect in the
finished()
method, and not in theadd_rows()
method.
-
peak_weight
= None¶ A dictionary that maps observation systems to the weight of each particle at the time that its peak occurs:
peak_weight[(unit, period)]
.Note that this is only valid for tables to inspect in the
finished()
method, and not in theadd_rows()
method.
-
expected_obs
= None¶ The expected observation for each particle for the duration of the current simulation window.
Note that this is only valid for tables to inspect in each call to
add_rows()
, and not in a call tofinished()
.
-
days_to
(date)¶ Convert a date to the (fractional) number of days from the start of the forecasting period.
-
-
class
epifx.summary.
PeakForecastEnsembles
(peak_monitor, name='peak_ensemble', fs_only=True)¶ Record the weighted ensemble of peak size and time predictions for each forecasting simulation.
Parameters: - peak_monitor – an instance of
PeakMonitor
. - name – the name of the table in the output file.
- peak_monitor – an instance of
-
class
epifx.summary.
PeakForecastCIs
(peak_monitor, probs=None, name='peak_cints')¶ Record fixed-probability central credible intervals for the peak size and time predictions.
Parameters: - peak_monitor – an instance of
PeakMonitor
. - probs – an array of probabilities that define the size of each
central credible interval.
The default value is
numpy.uint8([0, 50, 90, 95, 99, 100])
. - name – the name of the table in the output file.
- peak_monitor – an instance of
-
class
epifx.summary.
PeakSizeAccuracy
(peak_monitor, name='peak_size_acc', toln=None)¶ Record the accuracy of the peak size predictions against multiple accuracy tolerances.
Parameters: - peak_monitor – an instance of
PeakMonitor
. - name – the name of the table in the output file.
- toln – The accuracy thresholds for peak size predictions, expressed
as percentages of the true size.
The default is
np.array([10, 20, 25, 33])
.
- peak_monitor – an instance of
-
class
epifx.summary.
PeakTimeAccuracy
(peak_monitor, name='peak_time_acc', toln=None)¶ Record the accuracy of the peak time predictions against multiple accuracy tolerances.
Parameters: - peak_monitor – an instance of
PeakMonitor
. - name – the name of the table in the output file.
- toln – The accuracy thresholds for peak time predictions, expressed
as numbers of days. The default is
np.array([7, 10, 14])
.
- peak_monitor – an instance of
-
class
epifx.summary.
ExpectedObs
(peak_monitor, probs=None, name='expected_obs')¶ Record fixed-probability central credible intervals for the expected observations.
Parameters: - peak_monitor – an instance of
PeakMonitor
. - probs – an array of probabilities that define the size of each
central credible interval.
The default value is
numpy.uint8([0, 50, 90, 95, 99, 100])
. - name – the name of the table in the output file.
- peak_monitor – an instance of
-
class
epifx.summary.
PredictiveCIs
(peak_monitor, probs=None, name='forecasts')¶ Record fixed-probability central credible intervals for the observations.
Parameters: - peak_monitor – an instance of
PeakMonitor
. - probs – an array of probabilities that define the size of each
central credible interval.
The default value is
numpy.uint8([0, 50, 95])
. - name – the name of the table in the output file.
- peak_monitor – an instance of
-
class
epifx.summary.
ObsLikelihood
(name='obs_llhd', extra_obs=None)¶ Record the likelihood of each observation according to each particle. Note that this table registers its
record_obs_llhd
method in the parameter dictionary so that it can obtain the observation likelihoods.Parameters: - name – the name of the table in the output file.
- extra_obs – Observations that will not be used in the filtering process, but whose likelihoods are of interest.
-
class
epifx.summary.
ThresholdMonitor
(threshold)¶ Record when expected observations exceed a specific threshold.
Parameters: threshold – The threshold observation value. -
exceed_date
= None¶ A dictionary that maps observation systems to the date when each particle exceeded the specific threshold:
exceed_date[(unit, period)]
.Note that this is only valid for tables to inspect in the
finished()
method, and not in theadd_rows()
method.
-
exceed_weight
= None¶ A dictionary that maps observation systems to the final weight of each particle:
exceed_weight
.Note that this is only valid for tables to inspect in the
finished()
method, and not in theadd_rows()
method.
-
exceed_mask
= None¶ A dictionary that maps observation systems to Boolean arrays that indicate which particles have exceeded the threshold:
exceed_mask[(unit, period)]
.Note that this is only valid for tables to inspect in the
finished()
method, and not in theadd_rows()
method.
-
-
class
epifx.summary.
ExceedThreshold
(thresh_monitor, name='exceed_threshold', fs_only=True)¶ Record when expected observations exceed a specific threshold.
Parameters: - thresh_monitor – an instance of
ThresholdMonitor
. - name – the name of the table in the output file.
-
define_bins
(params, start, end, bin_width)¶ Divide the time scale into a finite number of bins.
This table will record the (weighted) proportion of particles that first exceeded the threshold in each of these bins. Note that the bins must be defined before this table can be used.
Parameters: - params – The simulation parameters.
- start – The time that marks the start of the first bin.
- end – The time that marks the end of the last bin.
- bin_width – The scalar bin width.
- thresh_monitor – an instance of
Generating forecasts¶
import datetime
import epifx
import epifx.obs
# Simulation parameters
num_px = 3600
model = epifx.SEIR()
time = pypfilt.Datetime()
popn = 4000000
prng_seed = 42
params = epifx.default_params(num_px, model, time, popn_size, prng_seed)
# Simulate from the 1st of May to the 31st of October, 2015.
start = datetime.datetime(2015, 5, 1)
until = start + datetime.timedelta(days=183)
# Load the relevant observations.
obs_list = ...
# Create an observation model for weekly data (with a period of 7 days).
obs_model = epifx.obs.PopnCounts("obs_unit", obs_period=7)
# Define the observation model parameters.
obs_model.define_params(params, bg_obs=300, pr_obs=0.01, disp=100)
# Generate weekly forecasts for the first 9 weeks (and the start date).
fs_dates = [start + datetime.timedelta(days=week * 7)
for week in range(10)]
# Summary statistics and output file.
summary = epifx.summary.make(params, obs_list))
out = "forecasts.hdf5"
# Generate the forecasts and save them to disk.
pypfilt.forecast(params, start, until, [obs_list], fs_dates, summary, out)
Comparing forecast outputs¶
Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.
-
epifx.cmd.cmp.
files
(path1, path2, verbose=True, examine=None)¶ Compare two HDF5 files for identical simulation outputs.
Parameters: - path1 – The filename of the first HDF5 file.
- path2 – The filename of the second HDF5 file.
- verbose – Whether to report successful matches.
- examine – The data groups to examine for equality; the default is to
examine the simulation outputs (
'/data'
) and ignore the associated metadata ('/meta'
).
Returns: True
if the files contain identical simulation outputs, otherwiseFalse
.
This functionality is also provided as a command-line script.
Observation Models¶
Two observation models are provided, both of which related disease incidence in the simulation model to count data, but differ in how they treat the denominator.
Observations from the entire population¶
The PopnCounts
class provides a generic observation model
for relating disease incidence to count data where the denominator is assumed
to be the population size \(N\) (i.e., the denominator is assumed constant
and is either not known or is known to be the population size \(N\)).
This observation model assumes that the relationship between observed case counts \(y_t\) and disease incidence in particle \(x_t\) follow a negative binomial distribution with mean \(\mathbb{E}[y_t]\) and dispersion parameter \(k\).
\[ \begin{align}\begin{aligned}\mathcal{L}(y_t \mid x_t) &\sim NB(\mathbb{E}[y_t], k)\\\mathbb{E}[y_t] &= (1 - p_\mathrm{inf}) \cdot bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\\\operatorname{Var}[y_t] &= \mathbb{E}[y_t] + \frac{\left(\mathbb{E}[y_t]\right)^2}{k} \ge bg_\mathrm{var}\end{aligned}\end{align} \]
The observation model parameters comprise:
- The background observation rate \(bg_\mathrm{obs}\);
- The variance in the background signal \(bg_\mathrm{var}\) (the minimum variance);
- The probability of observing an infected individual \(p_\mathrm{obs}\); and
- The dispersion parameter \(k\), which controls the relationship between the mean \((\mathbb{E}[y_t])\) and the variance; as \(k \to \infty\) the distribution approaches the Poisson, as \(k \to 0\) the distribution becomes more and more over-dispersed with respect to the Poisson.

Probability mass functions for the PopnCounts
observation model with different values of the dispersion parameter
\(k\), for the expected value \(\mathbb{E}[y_t] = 200\) (vertical
dashed line).
Observations from population samples¶
The SampleCounts
class provides a generic observation
model for relating disease incidence to count data where the denominator is
reported and may vary (e.g., weekly counts of all patients and the number that
presented with influenza-like illness), and where the background signal is not
a fixed value but rather a fixed proportion.
For this observation model, each observed value \((y_t)\) is the fraction of all patients \((N_t)\) that were identified as cases \((c_t)\). This observation model assumes that the relationship between the observed fraction \(y_t\), the denominator \(N_t\), and disease incidence in particle \(x_t\) follows a Beta-binomial distribution with probability of success \(\mathbb{E}[y_t]\) and dispersion parameter \(k\).
\[ \begin{align}\begin{aligned}y_t &= \frac{c_t}{N_t}\\\mathcal{L}(y_t \mid x_t) &= \mathcal{L}(c_t \mid N_t, x_t)\\\mathbb{E}[y_t] &= bg_\mathrm{obs} + p_\mathrm{inf} \cdot \kappa_\mathrm{obs}\\\operatorname{Var}[y_t] &= \frac{p \cdot (1 - p)}{N_t} \cdot \left[ 1 + \frac{N_t - 1}{k + 1} \right] \ge bg_\mathrm{var}\\\mathcal{L}(c_t \mid N_t, x_t) &\sim \operatorname{BetaBin}(N_t, \mathbb{E}[y_t], k)\\\mathbb{E}[c_t] &= N_t \cdot \mathbb{E}[y_t]\\\operatorname{Var}[c_t] &= N_t \cdot p \cdot (1 - p) \cdot \left[ 1 + \frac{N_t - 1}{k + 1} \right]\end{aligned}\end{align} \]
The shape parameters \(\alpha\) and \(\beta\) satisfy:
\[ \begin{align}\begin{aligned}p &= \mathbb{E}[y_t] = \frac{\alpha}{\alpha + \beta}\\k &= \alpha + \beta\\\implies \alpha &= p \cdot k\\\implies \beta &= (1 - p) \cdot k\end{aligned}\end{align} \]
For details, see “Estimation of parameters in the beta binomial model”, Tripathi et al., J Ann Inst Statist Math 46(2): 317–331, 1994 (DOI: 10.1007/BF01720588).
The observation model parameters comprise:
- The background case fraction \(bg_\mathrm{obs}\);
- The variance in the background case fraction \(bg_\mathrm{var}\) (the minimum variance);
- The slope \(\kappa_\mathrm{obs}\) of the relationship between disease incidence and the proportion of cases; and
- The dispersion parameter \(k\), which controls the relationship between the mean \((\mathbb{E}[y_t])\) and the variance; as \(k \to \infty\) the distribution approaches the binomial, as \(k \to 0\) the distribution becomes more and more over-dispersed with respect to the binomial.

Probability mass functions for the SampleCounts
observation model with different values of the dispersion parameter
\(k\), for the expected value \(\mathbb{E}[c_t] = 200\) (vertical
dashed line) where \(\mathbb{E}[y_t] = 0.02\) and
\(N_t = 10,\!000\).
The PopnCounts class¶
-
class
epifx.obs.
PopnCounts
(obs_unit, obs_period, pr_obs_ix=None)¶ Generic observation model for relating disease incidence to count data where the denominator is assumed or known to be the population size.
Parameters: - obs_unit – A descriptive name for the data.
- obs_period – The observation period (in days).
- pr_obs_ix – The index of the model parameter that defines the observation probability (\(p_\mathrm{obs}\)). By default, the value in the parameters dictionary is used.
-
expect
(params, op, time, period, pr_inf, prev, curr)¶ Calculate the expected observation value \(\mathbb{E}[y_t]\) for every particle \(x_t\).
-
log_llhd
(params, op, time, obs, pr_indiv, curr, hist)¶ Calculate the log-likelihood \(\mathcal{l}(y_t \mid x_t)\) for the observation \(y_t\) (
obs
) and every particle \(x_t\).If it is known (or suspected) that the observed value will increase in the future — when
obs['incomplete'] == True
— then the log-likehood \(\mathcal{l}(y > y_t \mid x_t)\) is calculated instead (i.e., the log of the survival function).If an upper bound to this increase is also known (or estimated) — when
obs['upper_bound']
is defined — then the log-likelihood \(\mathcal{l}(y_u \ge y > y_t \mid x_t)\) is calculated instead.The upper bound can also be treated as a point estimate by setting
params['epifx']['upper_bound_as_obs'] = True
— then the log-likelihood \(\mathcal{l}(y_u \mid x_t)\) is calculated.
-
llhd_in
(op, time, mu, wt, y0, y1)¶ Return the probability mass in \([y_0, y1)\).
Parameters: - op – The observation model parameters dictionary.
- time – The current simulation time, \(t\).
- mu – The expected case fraction for each particle.
- wt – The weight associated with each particle.
- y0 – The (inclusive) minimum fraction of cases (\(y_0\)).
- y1 – The (exclusive) maximum fraction of cases (\(y_0\)).
-
quantiles
(op, time, mu, wt, probs)¶ Return the observations \(y_i\) that satisfy:
\[y_i = \inf\left\{ y \in \mathbb{N} : p_i \le \sum_i w_i \cdot \mathcal{L}(y_t \le y \mid x_t^i)\right\}\]Parameters: - op – The observation model parameters dictionary.
- time – The current simulation time, \(t\).
- mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
- wt – The weight associated with each particle, \(w_i\).
- probs – The probabilities \(p_i\), which must be sorted in ascending order.
-
define_params
(params, bg_obs, pr_obs, disp, bg_var=None)¶ Add observation model parameters to the simulation parameters.
Parameters: - bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
- pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
- disp – The dispersion parameter (\(k\)).
- bg_var – The variance of the background signal (\(bg_\mathrm{var}\)).
-
from_file
(filename, year=None, date_col='to', value_col='count')¶ Load count data from a space-delimited text file with column headers defined in the first line.
Parameters: - filename – The file to read.
- year – Only returns observations for a specific year. The default behaviour is to return all recorded observations.
- date_col – The name of the observation date column.
- value_col – The name of the observation value column.
Returns: A list of observations, ordered as per the original file.
The SampleCounts class¶
-
class
epifx.obs.
SampleCounts
(obs_unit, obs_period, denom, k_obs_ix=None)¶ Generic observation model for relating disease incidence to count data where the denominator is reported.
Parameters: - obs_unit – A descriptive name for the data.
- obs_period – The observation period (in days).
- denom – The denominator to use when calculating likelihoods and quantiles in the absence of an actual observation.
- k_obs_ix – The index of the model parameter that defines the disease-related increase in observation rate (\(\kappa_\mathrm{obs}\)). By default, the value in the parameters dictionary is used.
-
expect
(params, op, time, period, pr_inf, prev, curr)¶ Calculate the expected observation value \(\mathbb{E}[y_t]\) for every particle \(x_t\).
-
log_llhd
(params, op, time, obs, pr_indiv, curr, hist)¶ Calculate the log-likelihood \(\mathcal{l}(y_t \mid x_t)\) for the observation \(y_t\) (
obs
) and every particle \(x_t\).If it is known (or suspected) that the observed value will increase in the future — when
obs['incomplete'] == True
— then the log-likehood \(\mathcal{l}(y > y_t \mid x_t)\) is calculated instead (i.e., the log of the survival function).If an upper bound to this increase is also known (or estimated) — when
obs['upper_bound']
is defined — then the log-likelihood \(\mathcal{l}(y_u \ge y > y_t \mid x_t)\) is calculated instead.The upper bound can also be treated as a point estimate by setting
params['epifx']['upper_bound_as_obs'] = True
— then the log-likelihood \(\mathcal{l}(y_u \mid x_t)\) is calculated.
-
llhd_in
(op, time, mu, wt, y0, y1)¶ Return the probability mass over the interval \([y_0, y1)\).
Parameters: - op – The observation model parameters dictionary.
- time – The current simulation time, \(t\).
- mu – The expected fraction of all patients that are cases.
- wt – The weight associated with each value of
mu
. - y0 – The (inclusive) minimum fraction of cases (\(y_0\)).
- y1 – The (exclusive) maximum fraction of cases (\(y_0\)).
-
quantiles
(op, time, mu, wt, probs)¶ Return the observations \(y_i\) that satisfy:
\[y_i = \inf\left\{ y \in \mathbb{N} : p_i \le \sum_i w_i \cdot \mathcal{L}(y_t \le y \mid x_t^i)\right\}\]Parameters: - op – The observation model parameters dictionary.
- time – The current simulation time, \(t\).
- mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
- wt – The weight associated with each particle, \(w_i\).
- probs – The probabilities \(p_i\), which must be sorted in ascending order.
-
define_params
(params, bg_obs, k_obs, disp, bg_var=None)¶ Add observation model parameters to the simulation parameters.
Parameters: - bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
- k_obs – The increase in observation rate due to infected individuals (\(\kappa_\mathrm{obs}\)).
- disp – The dispersion parameter (\(k\)).
- bg_var – The variance of the background signal (\(bg_\mathrm{var}\)).
-
from_file
(filename, year=None, date_col='to', value_col='cases', denom_col='patients')¶ Load count data from a space-delimited text file with column headers defined in the first line.
Note that returned observation values represent the fraction of patients that were counted as cases, not the absolute number of cases. The number of cases and the number of patients are recorded under the
'numerator'
and'denominator'
keys, respectively.Parameters: - filename – The file to read.
- year – Only returns observations for a specific year. The default behaviour is to return all recorded observations.
- date_col – The name of the observation date column.
- value_col – The name of the observation value column (reported as absolute values, not fractions).
- denom_col – The name of the observation denominator column.
Returns: A list of observations, ordered as per the original file.
Raises: ValueError – If a denominator or value is negative, or if the value exceeds the denominator.
Custom observation models¶
All observation models should derive from the following base class:
-
class
epifx.
Obs
¶ The base class of observation models, which defines the minimal set of methods that are required.
-
expect
(params, op, time, period, pr_inf, prev, curr)¶ Return the expected observation value \(\mathbb{E}[y_t]\) for every particle \(x_t\), at one or more times \(t\).
Parameters: - params – The observation model parameters.
- op – The observation model parameters dictionary.
- time – The simulation time(s), \(t\).
- period – The duration of the observation period (in days).
- pr_inf – The probability of an individual becoming infected during the observation period(s), \(p_\mathrm{inf}\).
- prev – The state vectors at the start of the observation period(s), \(x_t\).
- curr – The state vectors at the end of the observation period(s).
-
log_llhd
(params, op, time, obs, pr_indiv, curr, hist)¶ Return the log-likelihood \(\mathcal{l}(y_t \mid x_t)\) for the observation \(y_t\) and every particle \(x_t\).
Parameters: - params – The observation model parameters.
- op – The observation model parameters dictionary.
- time – The current simulation time, \(t\).
- obs – An observation for the current time-step, \(y_t\).
- pr_indiv – The probabilities of an individual not being infected and being infected during the observation period, \([1 - p_\mathrm{inf}, p_\mathrm{inf}]\).
- curr – The particle state vectors, \(x_t\).
- hist – The particle state histories, indexed by observation period.
-
llhd_in
(op, time, mu, wt, y0, y1)¶ Return the weighted likelihood that \(y_t \in [y_0, y_1)\):
\[\sum_i w_i \cdot \mathcal{L}(y_0 \le y_t < y_1 \mid x_t^i)\]Parameters: - op – The observation model parameters dictionary.
- time – The current simulation time, \(t\).
- mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
- wt – The weight associated with each particle, \(w_i\).
- y0 – The (inclusive) minimum fraction of cases, \(y_0\).
- y1 – The (exclusive) maximum fraction of cases, \(y_1\).
-
quantiles
(op, time, mu, wt, probs)¶ Return the observations \(y_i\) that satisfy:
\[y_i = \inf\left\{ y \in \mathbb{N} : p_i \le \sum_i w_i \cdot \mathcal{L}(y_t \le y \mid x_t^i)\right\}\]Parameters: - op – The observation model parameters dictionary.
- time – The current simulation time, \(t\).
- mu – The expected case fraction for each particle, \(\mathbb{E}(y_t)\).
- wt – The weight associated with each particle, \(w_i\).
- probs – The probabilities \(p_i\), which must be sorted in ascending order.
-
Reading observations from disk¶
The from_file()
methods provided by the observation models listed above
use read_table()
to read specific columns from data files.
This function is intended to be a flexible wrapper around numpy.loadtxt
,
and should be sufficient for reading almost any whitespace-delimited data file
with named columns.
-
epifx.obs.
read_table
(filename, columns, date_fmt=None, comment='#')¶ Load data from a space-delimited ASCII text file with column headers defined in the first non-comment line.
Parameters: - filename – The file to read.
- columns – The columns to read, represented as a list of
(name, type)
tuples;type
should either be a built-in NumPy scalar type, ordatetime.date
ordatetime.datetime
(in which case string values will be automatically converted todatetime.datetime
objects bydatetime_conv()
). - date_fmt – A format string for parsing date columns; see
datetime_conv()
for details and the default format string. - comment – Prefix string(s) that identify comment lines; either a single Unicode string or a tuple of Unicode strings.
Returns: A NumPy structured array.
Raises: ValueError – If the data file contains non-ASCII text (i.e., bytes greater than 127), because
numpy.loadtxt
cannot process non-ASCII data (e.g., see NumPy issues #3184, #4543, #4600, #4939).Example: The code below demonstrates how to read observations from a file that includes columns for the year, the observation date, the observed value, and free-text metadata (up to 20 characters in length).
import datetime import numpy as np import epifx.obs columns = [('year', np.int32), ('date', datetime.datetime), ('count', np.int32), ('metadata', 'S20')] df = epifx.obs.read_table('my-data-file.ssv', columns, date_fmt='%Y-%m-%d')
-
epifx.obs.
datetime_conv
(text, fmt='%Y-%m-%d', encoding='utf-8')¶ Convert date strings to datetime.datetime instances. This is a convenience function intended for use with, e.g.,
numpy.loadtxt
.Parameters: - text – A string (bytes or Unicode) containing a date or date-time.
- fmt – A format string that defines the textual representation.
See the Python
strptime
documentation for details. - encoding – The byte string encoding (default is UTF-8).
Commands¶
The following commands are part of epifx:
epifx-template
creates an example configuration, which serves as a starting point for defining data sources, observation models, and simulation settings.epifx-locns
lists the locations that are defined in the local configuration.epifx-scan
generates forecasts from retrospective data, over a range of parameter values for the observation model(s), so that optimal observation model settings can be identified.epifx-summary
summarises forecasting performance for a completed observation model scan.epifx-forecast
generates forecasts from live data (i.e., where the future observations are unknown).epifx-replay
will replay the observations from an existing set of forecasts, to produce a new set of forecasts (e.g., accounting for incomplete observations).epifx-json
converts forecasts into JSON files that can be plotted and viewed interactively in a web browser.epifx-cmp
compares simulation outputs between files, which is useful for ensuring that different systems produce identical results.
Local configuration¶
All of the epifx commands rely on the necessary simulation settings being
defined in a user-defined module, epifxlocns
.
Run the follow command to generate a basic template for this module in the
current directory:
epifx-template
The epifx-locns
command lists the locations that are defined in
this module:
epifx-locns
This module must define the following function:
-
local_settings
(locn_id=None)¶ Return the simulation settings for the user-defined location uniquely identified by
locn_id
or, iflocn_id is None
, return a list of the user-defined locations.
The settings for each location should be returned as a dictionary that contains the following keys:
'id'
: the unique identifier for this location (should not contain spaces or numbers).'name'
: a pretty-printed version of the location identifier (may contain spaces and numbers).'popn'
: the population size.'obs_model'
: an observation model instance.'obs_file'
: the name of the observations file.'obs_filter'
: a function use to remove outliers from the observations, if required (otherwise, this key is not required).'from_file_args'
: a dictionary of additional keyword arguments to pass to thefrom_file
method of the observation model (e.g.,'value_col'
forepifx.obs.PopnCounts.from_file()
).'scan_years'
: a list of the years for which observation model scans should be performed.'scan'
: a dictionary that maps observation model parameter names to one or more values for that parameter, to be covered by observation model scans. Values may be defined as scalar values, lists, or dictionaries that map years to either scalar values or lists.'forecast'
: a dictionary that maps observation model parameter names to one or more values for that parameter, to be used when forecasting. Values may be defined as scalar values, lists, or dictionaries that map years to either scalar values or lists.'om_format'
: a dictionary that maps observation model parameter names to format specifiers, which are used to include parameter values in output file names.'om_name'
: a dictionary that maps observation model parameter names to strings, which are used to identify these parameters in output file names.'out_dir'
: the directory in which simulation outputs will be written.'json_dir'
: the directory in which JSON forecast files will be written.'tmp_dir'
: the directory in which temporary files will be stored when performing simulations.'get_params'
: a function that accepts this dictionary as an argument and returns the simulation parameters dictionary.- JSON plot settings:
'obs_axis_lbl'
: the axis label for the observations.'obs_axis_prec'
: the decimal precision of axis ticks for the observations.'obs_datum_lbl'
: the label for individual observations.'obs_datum_prec'
: the decimal precision of individual observations.
'extra_args'
: used to apply custom settings arguments for observation model scans and for forecasts:'start'
: a function that takes one argument, the season, and returns the start of the simulation period.'until'
: a function that takes one argument, the season, and returns the end of the simulation period.'live_fs_dates'
: a function that takes three arguments: the season, a list of all observations, and an (optional) initial forecasting date, and returns a list of dates for which live forecasts should be generated.'scan_fs_dates'
: a function that takes two arguments, the season and the list of observations for that season, and returns the dates for which retrospective forecasts should be generated.'make_summary'
: a function that takes the same arguments asepifx.summary.make()
and returns a summary object.
epifx-template
Observation model scans¶
To run simulations for every combination of observation model parameters (as
defined in the local configuration) use epifx-scan
.
In the example below, the simulations will be run in 16 parallel processes and
will only be run against observations for the 2015 calendar year:
epifx-scan --spawn 16 --year 2015 location_name
Observation model performance¶
To summarise the forecasting performance for an observation model scan, use
epifx-summary
:
epifx-summary location_name
By default, this will also convert the best forecast for each calendar year into a JSON file, for interactive viewing in a web browser.
Live forecasting¶
To generate forecasts from live data, use epifx-forecast
.
In the example below, the most recent observation is deemed to be “incomplete”
(i.e., an underestimate of the real value) and an upper bound on the actual
value is provided:
epifx-forecast --incomplete 1 --upper-bound 200 location_name
By default, each forecast will also be converted into a JSON file, for interactive viewing in a web browser.
Replay forecasts¶
To generate forecasts against existing data, where a given observation may
change over time (e.g., gradual accumulation of count data), use
epifx-replay
to run forecasts against the data as they were
provided at each forecasting date.
The observations and forecasting dates will be read from an existing JSON
file, as produced by epifx-forecast or
epifx-json:
epifx-replay prev-forecasts.json location_name
By default, the forecasting location specified in the JSON file will also be used to generate the new forecasts, but a different forecasting location can be specified (as illustrated above).
Incomplete observations can also be adjusted to have “perfect” upper bound estimates (i.e., corresponding to the observed values in the most recent data snapshot):
epifx-replay --perfect-upper-bounds prev-forecasts.json location_name
Note
If the final reported value for an observation date is smaller than the reported value in an earlier snapshot, the original observation will be left unchanged.
This may be changed in a future version of epifx, but will require
changing the semantics (and probably the name) of the observation
'upper_bound'
field and the observation model log-likelihood functions.
Interactive forecast plots¶
Both epifx-summary
and epifx-forecast
will produce JSON
files by default, but epifx-json
can be used to convert any set
of forecast output files into a JSON file:
epifx-json --location location_name --output forecasts.json *.hdf5
Comparing simulation outputs¶
Output files can be compared for equality, which is useful for ensuring that different systems produce identical results.
epifx-cmp --help
epifx-cmp file1.hdf5 file2.hdf5
Note that simulation outputs have been observed to differ depending on the installed versions of the NumPy and SciPy packages, due to changes in the random number generator.
Worked Example¶
This page assumes that you have already installed the
epifx
package, and shows how to generate forecasts using an SEIR model and
the epifx-forecast command.
Note
You can replace the SEIR model with any other model, as long as it provides the necessary methods.
Note
You can download the completed files at the bottom of this page:
This page also assumes that you have a time series that characterises disease activity. For example:
year date cases
2017 2017-01-08 3
2017 2017-01-15 4
2017 2017-01-22 2
...
Major decisions¶
- Choose an infection model; here we will use the
SEIR model provided by
epifx
. - Choose an appropriate observation model; here we will
use the
PopnCounts
observation model, which relates disease incidence to count data where the denominator is assumed or known to be the population size. - Define the observation model parameters; in this case, they are:
- bg_obs – The background signal in the data.
- pr_obs – The probability of observing an infected individual.
- disp – The dispersion parameter.
- bg_var – The variance of the background signal.
- Define the simulation parameters, such as the population size and the number of particles.
Note
All of these settings must be defined in epifxlocns.py
(located in the
working directory) in order to use the epifx
commands.
Defining the forecast settings¶
Each combination of infection model, observation model, and parameters, is
referred to as a “location” and is identified by a unique name.
The epifxlocns.py
file will define all of these settings, for every
location that you define.
Here is a minimal example that defines no locations:
"""Local settings for epidemic forecasting."""
def local_settings(locn_id=None):
"""Return the forecast settings for a particular location."""
# This dictionary will contain all of the forecast settings.
locations = {}
# Ensure each location includes its own unique ID.
for locn in locations:
locations[locn]['id'] = locn
if locn_id is None:
# Return a list of forecast locations.
return(locations.keys())
elif locn_id in locations:
# Return the settings for this location.
return(locations[locn_id])
else:
raise ValueError("Invalid location '{}'".format(locn_id))
The rest of this page will demonstrate how to expand on the above template in order to run epidemic forecasts against a single data file.
Creating a new location¶
This is as simple as adding a new key to the locations
dictionary, which
defines the unique name for this location:
locations = {
'some-city': {}
}
Note
This unique name should only contain alpha-numeric characters and hyphens. It should not contain any spaces.
As shown above, the settings for a forecast location are collected into a dictionary. In the example above, this dictionary is empty. We will now demonstrate how to define all of the necessary forecast settings for this location within this dictionary.
Basic settings¶
Each location should have a name — an arbitrary string that may contain spaces and will be used as a title for forecast outputs.
Other common settings (such as the output directory) and fixed model parameters (such as the population size) may also be defined here. The next section will demonstrate how these values will be added to the simulation parameters.
locations = {
'some-city': {
'name': 'Some City',
'popn': 1000000,
'out_dir': '.',
'tmp_dir': './tmp',
'get_params': get_locn_params,
}
}
Note
Simulation parameters are not provided directly.
Instead, a function that returns the appropriate parameters is stored under
the 'get_params'
key.
See the following section for an example of such a function.
Simulation parameters¶
Simulation parameters are defined by a separate function, which should also be
contained in the file epifxlocns.py
.
This function takes one argument: the settings dictionary for the forecast
location.
import epifx
import pypfilt
import errno
import os.path
def get_locn_params(locn_settings):
"""Return the parameters dictionary for a forecast location."""
model = epifx.SEIR()
time = pypfilt.Datetime()
popn_size = locn_settings['popn']
# The number of particles.
px_count = 2000
# The seed for the pseudo-random number generator.
prng_seed = 3001
params = epifx.default_params(px_count, model, time, popn_size, prng_seed)
# Customise the default parameters as desired.
params['resample']['threshold'] = 0.25
params['resample']['regularisation'] = True
params['epifx']['stoch'] = False
# Set the output and temporary directories.
out_dir = locn_settings['out_dir']
tmp_dir = locn_settings['tmp_dir']
for reqd_dir in [out_dir, tmp_dir]:
if not os.path.isdir(reqd_dir):
# Create the directory (and missing parents) with mode -rwxr-x---.
try:
os.makedirs(reqd_dir, mode=0o750)
except OSError as e:
# Potential race condition with multiple script instances.
if e.errno != errno.EEXIST:
print("Warning: could not create {}".format(reqd_dir))
print(e)
params['out_dir'] = out_dir
params['tmp_dir'] = tmp_dir
return params
Note
In this example, the population size is determined by inspecting the
settings dictionary for the location: locn_settings['popn']
.
Note that this was defined in the previous section (“Basic settings”).
The same is also true for the output and temporary directories.
Adjusting model priors¶
You probably also want to modify the default model priors.
def get_locn_params(locn_settings):
"""Return the parameters dictionary for a forecast location."""
# Other lines omitted ...
# Restrict R0 to values between 1.3 and 1.5.
params['param_min'][4] = 1.3
params['param_max'][4] = 1.5
# Keep eta fixed at 1 (i.e., enforce homogeneous mixing).
params['param_min'][7] = 1.0
params['param_max'][7] = 1.0
# Introduce the first infection after 12-16 weeks.
params['param_min'][9] = 12 * 7
params['param_max'][9] = 16 * 7
# Update the model priors.
params['prior'] = params['model'].priors(params)
# Other lines omitted ...
return params
Observation model¶
The observation model is also stored in the location settings:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'obs_model': epifx.obs.PopnCounts('Weekly Cases', 7),
}
}
Observations are read from an external file using the observation model’s
from_file()
method.
The file name, and additional arguments for the from_file()
method, are
defined as follows:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'obs_file': 'case-counts.ssv',
'from_file_args': {
'date_col': 'date',
'value_col': 'cases',
},
}
}
Observation model parameters for the forecasts are defined in a dictionary
stored under the 'forecast'
key.
Each parameter is identified by name, and values can take the form of (a) a
single numeric value; (b) a list of numeric values; or (c) a dictionary that
maps years to a single numerical value or to a list of numeric values.
All of these forms are shown in the example below:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'forecast': {
'bg_obs': {2016: [6, 8], 2017: 8},
'bg_var': 6,
'pr_obs': [0.01, 0.02],
'disp': 100,
},
}
}
Finally, the format strings that are used to construct each output file name
must be defined.
This has two parts: the 'om_name'
dictionary defines a string to use for
each observation parameter name, and the 'om_format'
dictionary
defines the numeric format for each observation parameter value:
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'om_format': {
'bg_obs': '02.0f',
'bg_var': '02.0f',
'pr_obs': '0.2f',
'disp': '03.0f',
},
'om_name': {
'bg_obs': 'bg',
'bg_var': 'bgvar',
'pr_obs': 'pr',
'disp': 'disp',
},
}
}
Note
Given the above definitions, a forecast for all data up to 2017-01-08 and
using an observation probability of 0.02
would produce the following
output file:
some-city-2017-01-08-bg-08-bgvar-06-disp-100-pr-0.02.hdf5
Simulation period¶
The default simulation period for forecasts is March to October
(inclusive).
These defaults can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
'start': get_start_date,
'until': get_until_date,
},
}
}
Note that these dates are defined by separate functions, which should also be
contained in the file epifxlocns.py
.
These functions takes one argument: the forecasting year.
import datetime
def get_start_date(year):
"""Begin forecasts on the first of January."""
return datetime.datetime(year, 1, 1)
def get_until_date(year):
"""End forecasts on the last day of December."""
return datetime.datetime(year, 12, 31)
Note
These functions must return datetime.datetime
instances (i.e.,
not datetime.date
instances) if you are using the
pypfilt.Datetime
time scale.
Forecasting dates¶
By default, the epifx-forecast
command will generate a forecast for the
most recent Sunday (we have typically used weeks ending on Sunday as our
observation interval).
This can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
# Other lines omitted ...
'live_fs_dates': get_live_fs_dates,
},
}
}
Note that the forecasting date(s) are defined by a separate function, which
should also be contained in the file epifxlocns.py
.
This function takes three arguments:
- The forecasting year;
- The list of all available observations; and
- The date from which to begin forecasting (may be
None
, meaning that only the most recent forecasting date should be returned).
def get_live_fs_dates(year, all_obs, fs_from):
"""Generate a forecast at the time of each observation."""
fs_dates = sorted(o['date'] for o in all_obs)
if fs_from is None:
# Return only the most recent forecasting date.
# Important: *must* be a list.
return [fs_dates[-1]]
else:
return [d for d in fs_dates if d >= fs_from]
Note
This function must return a list of datetime.datetime
instances
(i.e., not datetime.date
instances) if you are using the
pypfilt.Datetime
time scale.
Summary tables¶
You may wish to override the collection of summary tables that are generated
for each forecast.
This can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
# Other lines omitted ...
'make_summary': make_summary,
},
}
}
Note that this is defined by a separate function, which should also be
contained in the file epifxlocns.py
.
This function must accept the same arguments as epifx.summary.make()
.
import epifx.summary
import pypfilt.summary
def make_summary(params, all_obs, **kwargs):
"""Return a summary object for the forecast simulations."""
meta_builder = epifx.summary.Metadata()
meta = meta_builder.build(params)
# Create a summary object with no tables, can now choose what tables
# to add. Alternatively, can call epifx.summary.make().
summary = pypfilt.summary.HDF5(params, all_obs, meta=meta, **kwargs)
peaks = epifx.summary.PeakMonitor()
# The summary tables to add.
tables = [
pypfilt.summary.ModelCIs(),
epifx.summary.ObsLikelihood(),
# The following tables are needed for the interactive forecast plots.
epifx.summary.PredictiveCIs(peaks),
epifx.summary.PeakForecastCIs(peaks),
]
# Also add a summary table for each set of observations.
# These tables are needed for the interactive forecast plots.
for obs_unit in params['obs']:
tables.append(epifx.summary.Obs(obs_unit))
summary.add_tables(*tables)
return summary
Forecast plots¶
Both the epifx-forecast
and epifx-summary
commands will produce JSON
files by default (in addition to the standard HDF5 output files).
The results stored in these JSON files can then be plotted and viewed
interactively in a web browser.
Here we define:
- The directory that will contain the JSON forecast outputs (
'json_dir'
); - The y-axis label (
'obs_axis_lbl'
); - The decimal precision of the y-axis ticks (
'obs_axis_prec'
); - The tool-tip label for individual observations (
'obs_datum_lbl'
); and - The tool-tip decimal precision (
'obs_datum_prec'
).
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'json_dir': './www/data',
'obs_axis_lbl': 'Weekly Cases',
'obs_axis_prec': 0,
'obs_datum_lbl': 'cases/week',
'obs_datum_prec': 0,
}
}
Running the forecasts¶
First, ensure that the epifxlocns.py
is correctly detected and the
forecast location is identified by the epifx-locn
command:
$ epifx-locn
some-city
$
Then use the epifx-forecast
command to generate forecasts for all
combinations of the observation model parameter values:
$ epifx-forecast --year 2017 --from 2017-03-10 some-city
INFO:epifx.cmd.forecast:Forecasting from 2017-03-12 00:00:00 ...
Visualising the forecasts¶
You can use the epifx-template
command to generate the necessary files to
view the interactive forecast plots in your browser.
$ epifx-template
You will also need to download the d3
JavaScript library.
$ wget https://github.com/d3/d3/releases/download/v3.5.17/d3.zip
$ unzip d3.zip
$ mv d3.min.js www/
Finally, edit www/index.html
and add each of the forecast JSON files
to the drop-down list, replacing the template values.
<div class="params">
<form id="data-file-form">
<label for="data_file">Forecast:</label>
<select id="data_file">
<option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.01.json" selected>pr_obs = 0.01</option>
<option value="data/some-city-2017-bg-08-bgvar-06-disp-100-pr-0.02.json">pr_obs = 0.02</option>
</select>
</form>
</div>
Then open www/index.html
in your browser.
Retrospective forecasts¶
The epifx-scan command is similar to epifx-forecast
, except
that it generates forecasts from retrospective data.
It is intended to explore broad ranges of observation model parameter values,
so that appropriate values for these parameters can be identified.
Note
The same data file (stored under the 'obs_file'
key) will be used for
live forecasts (epifx-forecast
) and for retrospective forecasts
(epifx-scan
).
Observation model parameters¶
Observation model parameters for the retrospective forecasts are defined in a
dictionary stored under the 'scan'
key.
Each parameter is identified by name, and values can take the form of (a) a
single numeric value; (b) a list of numeric values; or (c) a dictionary that
maps years to a single numerical value or to a list of numeric values.
The list of years for which retrospective forecasts should be generated is
stored under the 'scan_years'
key.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'scan': {
'bg_obs': [6, 8],
'bg_var': 6,
'pr_obs': [0.005, 0.010, 0.015, 0.020, 0.025],
'disp': [10, 100, 1000],
},
'scan_years': [2015, 2016],
}
}
Retrospective forecasting dates¶
By default, the epifx-scan
command will generate forecasts from mid-March
until the observed epidemic peak in each year.
This can be overridden in the 'extra_args'
dictionary.
locations = {
'some-city': {
'name': 'Some City',
# Other lines omitted ...
'extra_args': {
# Other lines omitted ...
'scan_fs_dates': get_scan_fs_dates,
},
}
}
Note that the retrospective forecasting date(s) are defined by a separate
function, which should also be contained in the file epifxlocns.py
.
This function takes only two arguments (i.e., not the same as the
'live_fs_dates'
function):
- The forecasting year; and
- The list of all available observations.
def get_scan_fs_dates(year, all_obs):
"""Generate a forecast at the time of each observation."""
return sorted(o['date'] for o in all_obs)
Note
This function must return a list of datetime.datetime
instances
(i.e., not datetime.date
instances) if you are using the
pypfilt.Datetime
time scale.
Running the retrospective forecasts¶
Use the epifx-scan
command to generate retrospective forecasts for all
combinations of the observation model parameter values:
$ epifx-scan some-city
Download the complete files¶
epifxlocns.py
case-counts.ssv
www/index.html
- www/d3.js version 3.5.17.
Note
Run epifx-template
to generate www/plot.js
and
www/style.css
, as no modifications to these files are required.
Change Log¶
0.5.6 (2018-06-07)¶
- Enhancement: observation upper bounds can now be treated as point estimates
by setting
params['epifx']['upper_bounds_as_obs'] = True
. - Enhancement:
epifx-replay
can now define “perfect” upper bound estimates (corresponding to the observed values in the most recent data snapshot) by using the--perfect-upper-bounds
argument. - Enhancement:
epifx.cmd.run_in_parallel
now returns the job completion status (Boolean value). - Test cases are now run against Python 3.6 rather than Python 3.5, since Debian Testing has migrated to Python 3.6.
0.5.5 (2017-10-26)¶
- This release requires pypfilt >= 0.5.4.
- Bug fix: correctly calculate the time of first infection relative to the
true start of the simulation period (
params['epoch']
) rather than, e.g., the start of the forecasting run. - Bug fix: ignore duplicate table rows when generating JSON files.
- Bug fix: additional data validation checks when generating JSON files.
- Enhancement: add a new command (
epifx-replay
) that replays the observations from an existing set of forecasts, so that new forecasts can be generated while accounting for, e.g., incomplete observations. - Enhancement: add a worked example of generating forecasts. See the “Worked Example” page in the online documentation.
0.5.4 (2017-08-17)¶
- Bug fix: when processing forecast files that only contain estimation runs,
the
epifx-json
command will ignore forecast credible intervals prior to the date of the most recent observation.
0.5.3 (2017-08-16)¶
- Bug fix: ensure
epifx-json
uses native strings for dtype field names. - Bug fix: remove invalid import statements that rely on an as-yet unreleased
version of
pypfilt
.
0.5.2 (2017-08-16)¶
- Enhancement: add a new monitor,
epifx.summary.ThresholdMonitor
, and a corresponding table,epifx.summary.ExceedThreshold
, for detecting when the expected observations for each particle exceed a specific threshold. Note that this table is not included in the default summary object returned byepifx.summary.make
. - Enhancement: the
epifx-json
command no longer aborts if a forecast file only contains forecast credible intervals (theforecasts
table) but not the peak timing credible intervals (thepeak_cints
table). - Enhancement: the
epifx-json
command will generate output from estimation runs if a forecast file only contains an estimation run (which can occur, e.g., when directly usingpypfilt.run
to generate forecasts rather than usingpypfilt.forecast
).
0.5.1 (2017-06-15)¶
- Bug fix: the accept-reject sampler now resets particle weights after each iteration. This only affects summary tables that require the weights to sum to unity, it has no effect on the particle selection.
- Bug fix: the daily forcing signal now correctly returns datetime instances.
- Bug fix: add a missing function argument when processing incomplete observations.
- Enhancement: the
epifx-forecast
command now accepts a new argument,--all
(or-a
), which generates forecasts for all defined locations.
0.5.0 (2017-05-10)¶
- This release requires pypfilt >= 0.5.1.
- Breaking change:
epifx.default_params
makes fewer assumptions, and now takes more positional arguments. - Breaking change: the SEIR model is now defined in terms of intrinsic parameters (e.g., R0 rather than the daily force of infection) and the time of initial exposure is now just another model parameter.
- Breaking change: record predictive CIs in the
/data/forecasts
table; by default, the median and the 50% and 95% credible intervals are recorded. The expected observation CIs are now stored in the/data/expected_obs
table. - Breaking change: record observations in tables grouped by the observation
unit; these tables are now located in HDF5 tables
/data/obs/obs_unit
. - Bug fix: ensure that
epifx.summary.ObsLikelihood
correctly encodes the observation source and units. - Bug fix: robustly handle near-zero probability mass.
- Enhancement: arbitrary model priors are supported via the accept-reject
sampler provided by the
epifx.select
module. - Enhancement: a suite of commands for performing retrospective observation
model scans and live forecasts are provided by the
epifx.cmd
module. See the documentation for details. - Enhancement: an example template is provided (see the
epifx.example
module and theepifx-template
command) that includes Australian Google Flu Trends data. Observations can be loaded withepifx.example.gft_obs
. - Enhancement:
epifx.summary.PeakMonitor
now surveys the entire particle trajectory (including both the estimation and forecasting runs) so that peaks that occurred prior to the forecasting date are reported correctly. - Enhancement: the
epifx.summary.ObsLikelihood
table can now record the likelihood of arbitrary observations (i.e., observations that were not included in the filtering process). - Enhancement: the default summary tables provided by
epifx.summary.make
can be suppressed as needed. - Enhancement: custom simulation time scales are supported.
- Enhancement: add quantile and probability mass sum functions to the observation models.
- Enhancement: test cases for several modules are now provided in
./tests
and can be run with tox. - Enhancement: document the release process and provide instructions for uploading packages to PyPI.
0.4.3 (2016-09-16)¶
- This release requires pypfilt >= 0.4.3.
- Breaking change: the
epifx.obs.SampleCounts
observation model now uses a Beta-binomial distribution rather than a Beta distribution. Parameter names and definitions have been changed accordingly. - Enhancement: consistently separate Unicode strings from bytes, and automatically convert NumPy field names into native strings.
- Enhancement: add support for incomplete data for which there may or may not be an upper bound (whether known in fact or estimated).
- Enhancement: record the likelihood of each observation according to each
particle (see the
epifx.summary.ObsLikelihood
class).
0.4.2 (2016-06-16)¶
- Breaking change: replace the observation models added in epifx 0.4.1 with
observations models for:
- Count data where the denominator is known or assumed to be the entire
population (
epifx.obs.PopnCounts
); and - Count data where the denominator is reported and may vary, and where the
background signal is a fixed proportion (
epifx.obs.SampleCounts
).
- Count data where the denominator is known or assumed to be the entire
population (
0.4.1 (2016-04-22)¶
- Enhancement: provide generic negative binomial observation models for count
data and for fractional/percentage data in
epifx.obs
.
0.4.0 (2016-04-22)¶
This release requires pypfilt >= 0.4.0.
Breaking change: models must define default parameter bounds by implementing the
param_bounds
method.Breaking change: model expectation functions now receive the previous and current state vectors, in addition to the infection probability vector. This means that expectation functions will need to change from:
expect(params, unit, period, pr_inf)
to:
expect(params, unit, period, pr_inf, prev, curr)
Enhancement:
epifx.summary.make
now passes additional keyword arguments to thepypfilt.summary.HDF5
constructor, allowing users to override default settings, such asfirst_day=False
.Bug fix: ensure that infection probabilities are strictly non-negative.
Bug fix: ensure that population invariants are enforced correctly.
Bug fix: correctly scale the daily seeding probability.
Add instructions for installing epifx in a virtual environment.
0.3.1 (2016-02-25)¶
- Bug fix: prevent a runtime error with
params['epifx']['one_prng'] = True
by correctly retrieving the pypfilt PRNG (params['resample']['rnd']
).
0.3.0 (2016-02-23)¶
- This release requires pypfilt >= 0.3.0.
- Provide each summary statistic as a separate class.
- Inherit from the pypfilt simulation model base class.
- Host the documentation at Read The Docs.
0.2.0 (2015-11-17)¶
- Use an independent PRNG instance for model stochasticity, as distinct from
the pypfilt PRNG instance (used for resampling). Note that this breaks
backwards compatibility (in terms of producing identical outputs) and can be
overridden by setting
params['epifx']['one_prng'] = True
. - This release requires pypfilt >= 0.2.0.
0.1.10 (2015-07-09)¶
- Fix a bug where temporal forcing could cause a negative force of infection.
0.1.9 (2015-07-06)¶
- Add support for temporal forcing, modulated by the new parameter sigma.
0.1.8 (2015-06-18)¶
- Update the model parameter invariants for alpha, based on the priors for R0 and gamma.
0.1.7 (2015-06-18)¶
- Sample R0 and calculate alpha, rather than sampling alpha directly.
0.1.6 (2015-06-08)¶
- Avoid error messages if no logging handler is configured by the application.
- Default to comparing only the simulation outputs and ignore the metadata;
this can be overridden by the
--meta-data
(-m
) option. - Build a universal wheel via
python setup.py bdist_wheel
, which supports both Python 2 and Python 3. - This release requires pypfilt >= 0.1.2.
0.1.5 (2015-06-04)¶
- Record credible intervals for state variables (
/data/state
).
0.1.4 (2015-06-03)¶
- Reduce the minimum latent period to half a day.
- No longer require the simulation period to be included in the simulation parameters dictionary.
- Hide stderr output from spawned processes when obtaining git metadata, since the error messages have no relevance to the user (they only serve to indicate that the working directory is not part of a git repository).
0.1.3 (2015-06-01)¶
Obtain git metadata from the working directory, if it is contained within a repository. This requires pypfilt >= 0.1.1.
Note that sensible metadata will only be obtained if the working directory is not manipulated by program code (e.g.,
os.chdir
).
0.1.2 (2015-06-01)¶
Record the enforced limits on model parameters, so that output files include sufficient information to independently produce identical results.
The default limits on
beta
andgamma
are now identical to the domain of their default priors (1 to 3 days).Added options
--verbose
and--data-only
to thecmp-output
script.Ignore the command line (
/meta/sim/cmdline
) when comparing output files.
0.1.1 (2015-05-29)¶
- Added a script (
cmp-output
) that compares output files for identical simulation outputs and metadata.
0.1.0 (2015-05-29)¶
- Initial release.
Contributing to epifx¶
As an open source project, epifx welcomes contributions of many forms.
Examples of contributions include:
- Code patches
- Documentation improvements
- Bug reports and patch reviews
Testing with tox¶
The epifx testing suite uses the pytest
framework, and uses the tox automation tool
to run the tests under Python 2 and Python 3.
The test cases are contained in the ./tests
directory.
To run all tests using all of the Python versions defined in tox.ini
, run:
tox
The tox.ini
contents are shown below, and include targets that check
whether the documentation in ./doc
builds correctly with Python 2 and with
Python 3.
#
# Configuration file for tox, used to automate test activities.
#
# https://tox.readthedocs.io/en/latest/
#
# This configuration file defines four test environments:
#
# py27-test: Run the test cases in ./tests/ using Python 2.7.
# py36-test: Run the test cases in ./tests/ using Python 3.6.
# py27-docs: Build the package documentation using Python 2.7.
# py36-docs: Build the package documentation using Python 3.6.
#
# To perform each of these test activities, run:
#
# tox
#
[tox]
envlist = py{27,36}-{test,docs}
#
# Define common settings.
#
# * Cache installed wheels to accelerate environment creation.
# * Ensure tests are run against the installed package.
# * Add test-specific package dependencies.
#
[base]
pkg = epifx
wheels = {homedir}/.cache/pip/wheels
pytest = {envbindir}/py.test --cov={envsitepackagesdir}/{[base]pkg} --capture=no
install_command=pip install -f {[base]wheels} {opts} {packages}
deps =
wheel>=0.29
pytest
pytest-cov
#
# Define environment-specific settings.
#
# * The documentation builds are performed in the ./doc directory.
# * The documentation builds depend on Sphinx and associated packages.
# * The test cases depend on the testing packages defined in [base].
# * All environments depend on the most recent pypfilt version that has been
# built locally with tox ({distshare}/pypfilt-*.zip).
# See http://tox.readthedocs.io/en/latest/example/general.html for details.
# * Python 3.6 tests issue errors about comparing bytes and strings (-bb).
#
[testenv]
changedir =
docs: doc
deps =
{distshare}/pypfilt-*.zip
test: {[base]deps}
test: matplotlib>=1.5
docs: sphinx>=1.4
docs: sphinx-rtd-theme>=0.1.9
docs: sphinxcontrib-inlinesyntaxhighlight>=0.2
commands =
py27-test: {envpython} {[base]pytest} {posargs}
py36-test: {envpython} -bb {[base]pytest} {posargs}
docs: sphinx-build -W -b html -d {envtmpdir}/doctrees . {envtmpdir}/html
Release process¶
Feature development takes places on the “master” branch. Periodically, a release is created by increasing the version number and tagging the relevant commit with the new version number.
Check that the release passes all of the tests:
tox
Update the version number according to the versioning scheme.
- Update the version number in
doc/conf.py
. The full version must always be updated, the short (X.Y) version does not need to be updated if the version number is being increased from X.Y.Z to X.Y.Z+1. - Update the version number in
epifx/version.py
. - Update the version number in
setup.py
.
- Update the version number in
Describe the changes at the top of
NEWS.rst
under a heading of the formX.Y.Z (YYYY-MM-DD)
, which identifies the new version number and the date on which this version was released.Commit these changes; set the commit message to
Release epifx X.Y.Z
.git add NEWS.rst doc/conf.py epifx/version.py setup.py git commit -m "Release epifx X.Y.Z"
Tag this commit
X.Y.Z
.git tag -a X.Y.Z -m "epifx X.Y.Z"
Push this commit and the new tag upstream.
git push --follow-tags
Publishing to PyPI¶
These instructions are based on the Python Packaging User Guide.
Ensure that twine
is installed:
pip install twine
Define the PyPI server(s) in .pypirc
:
[distutils]
index-servers =
pypi
pypitest
Ensure that all uncommitted changes are stashed, or they will be packaged!
git stash
Build the wheel ./dist/epifx-X.Y.Z-py2.py3-none-any.whl
:
python setup.py bdist_wheel
Upload this wheel to the PyPI test server, so that any problems can be identified and fixed:
twine upload -r pypitest dist/epifx-X.Y.Z-py2.py3-none-any.whl
Then upload this wheel to PyPI:
twine upload dist/epifx-X.Y.Z-py2.py3-none-any.whl