Observation Models

A generic base class for negative binomial observation models is provided, which assumes that the model is parameterised by a background signal (\(bg_\mathrm{obs}\)), an observation probability (\(p_\mathrm{obs}\)), and a dispersion parameter (\(k\)).

class epifx.obs.NB(obs_unit, obs_period)

The base class for generic observation models that assume the relationship between observed numerical quantities \(y_t\) and disease incidence in particle \(x_t\) follows a negative binomial distribution with mean \(\mathbb{E}[x]\) and dispersion parameter \(k\).

\[\mathcal{L}(y_t \mid x_t) \sim NB(\mathbb{E}[x], k)\]
Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
define_params(params, bg_obs, pr_obs, disp)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).
expect(params, op, period, pr_inf, prev, curr)

Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:

\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • period – The duration of the observation period (in days).
  • pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
  • prev – The state vectors at the start of the observation period.
  • curr – The state vectors at the end of the observation period.
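
The default expectation above is simple arithmetic and can be checked by hand. The sketch below is a stand-alone illustration, not the epifx implementation; the function name expected_obs and the argument pop_size (standing in for \(N\)) are assumptions made for this example, as are the numerical values.

```python
def expected_obs(bg_obs, pr_obs, pr_inf, pop_size):
    """Return bg_obs + pr_inf * pr_obs * pop_size (hypothetical helper)."""
    return bg_obs + pr_inf * pr_obs * pop_size


# Hypothetical values: a background signal of 10 per observation period,
# half of all infections observed, and 1% of a population of 1000
# becoming infected during the period.
mean_obs = expected_obs(bg_obs=10, pr_obs=0.5, pr_inf=0.01, pop_size=1000)
```

With these values the infection term contributes 0.01 x 0.5 x 1000 = 5 observations on top of the background signal.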
log_llhd(params, op, obs, pr_indiv, curr, hist)

Calculate the log-likelihood of obtaining an observation from each particle.

Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • obs – The observation that was made.
  • pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
  • curr – The particle state vectors.
  • hist – The particle state histories, indexed by observation period.
Raises:

NotImplementedError – Derived classes must implement this method.
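
To illustrate what a derived class might compute, the sketch below evaluates a negative binomial log-likelihood for one observation against the expected value of each particle. It assumes the common mean-dispersion parameterisation (mean \(\mu\), variance \(\mu + \mu^2 / k\)); the documentation above does not state which parameterisation epifx uses, so treat this as an assumption. The names nb_log_pmf and log_llhd_per_particle are hypothetical and are not part of epifx.

```python
import math


def nb_log_pmf(y, mu, k):
    """Log-probability of count y for mean mu > 0 and dispersion k > 0."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu))
            + y * math.log(mu / (k + mu)))


def log_llhd_per_particle(y, expected, k):
    """Evaluate one observation against each particle's expected value."""
    return [nb_log_pmf(y, mu, k) for mu in expected]


# Hypothetical expected values for three particles and an observed count.
log_weights = log_llhd_per_particle(y=12, expected=[8.0, 12.0, 30.0], k=10)
```

For a fixed dispersion, the particle whose expected value matches the observation receives the largest log-likelihood, which is what allows the observations to discriminate between particles.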

Two sub-classes are provided, which cater for count data and fractional data, respectively.

Count data

class epifx.obs.NBCounts(obs_unit, obs_period)

Generic negative binomial observation model for relating disease incidence to count data.

Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
from_file(filename, year=None, date_col='to', value_col='count')

Load count data from a space-delimited text file with column headers defined in the first line.

Parameters:
  • filename – The file to read.
  • year – Only return observations for the specified year. By default, all recorded observations are returned.
  • date_col – The name of the observation date column.
  • value_col – The name of the observation value column.
Returns:

A list of observations, ordered as per the original file.
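
The file layout described above can be illustrated with a stand-alone parser written against the standard library. This is a sketch only, not the epifx implementation: the function load_counts, the 'date' and 'value' dictionary keys, the date format, and the choice to filter on the year of the observation date are all assumptions made for this example.

```python
import datetime
import io


def load_counts(lines, year=None, date_col='to', value_col='count',
                date_fmt='%Y-%m-%d'):
    """Parse whitespace-delimited rows whose first line names the columns."""
    rows = iter(lines)
    header = next(rows).split()
    date_ix = header.index(date_col)
    value_ix = header.index(value_col)
    observations = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # Skip blank lines.
        date = datetime.datetime.strptime(fields[date_ix], date_fmt)
        if year is not None and date.year != year:
            continue  # Keep only the requested year.
        observations.append({'date': date, 'value': int(fields[value_ix])})
    return observations


# Hypothetical file contents, supplied in memory for illustration.
example = io.StringIO(
    "year to count\n"
    "2013 2013-12-29 4\n"
    "2014 2014-01-05 7\n"
    "2014 2014-01-12 11\n"
)
obs_2014 = load_counts(example, year=2014)
```

The returned list preserves the order of the rows in the file, matching the ordering described above.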

log_llhd(params, op, obs, pr_indiv, curr, hist)

Calculate the log-likelihood of obtaining an observation from each particle.

Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • obs – The observation that was made.
  • pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
  • curr – The particle state vectors.
  • hist – The particle state histories, indexed by observation period.
Raises:

NotImplementedError – Derived classes must implement this method.

define_params(params, bg_obs, pr_obs, disp)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).
expect(params, op, period, pr_inf, prev, curr)

Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:

\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • period – The duration of the observation period (in days).
  • pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
  • prev – The state vectors at the start of the observation period.
  • curr – The state vectors at the end of the observation period.

Fractional data

class epifx.obs.NBFractions(obs_unit, obs_period, percentages=False)

Generic negative binomial observation model for relating disease incidence to fractional or percentage data, where the denominator is known and reported.

Parameters:
  • obs_unit – A descriptive name for the data.
  • obs_period – The observation period (in days).
  • percentages – Indicates whether the data are fractional values (False, default) or percentages (True).
from_file(filename, year=None, date_col='to', value_col='value', denom_col='denominator')

Load fractional data from a space-delimited text file with column headers defined in the first line.

Parameters:
  • filename – The file to read.
  • year – Only return observations for the specified year. By default, all recorded observations are returned.
  • date_col – The name of the observation date column.
  • value_col – The name of the observation value column.
  • denom_col – The name of the observation denominator column.
Returns:

A list of observations, ordered as per the original file.

expect(params, op, period, pr_inf, prev, curr)

Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:

\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • period – The duration of the observation period (in days).
  • pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
  • prev – The state vectors at the start of the observation period.
  • curr – The state vectors at the end of the observation period.
log_llhd(params, op, obs, pr_indiv, curr, hist)

Calculate the log-likelihood of obtaining an observation from each particle.

Parameters:
  • params – The simulation parameters.
  • op – The observation model parameters.
  • obs – The observation that was made.
  • pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
  • curr – The particle state vectors.
  • hist – The particle state histories, indexed by observation period.
Raises:

NotImplementedError – Derived classes must implement this method.

define_params(params, bg_obs, pr_obs, disp)

Add observation model parameters to the simulation parameters.

Parameters:
  • bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
  • pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
  • disp – The dispersion parameter (\(k\)).

Reading observations from disk

The from_file() methods provided by NBCounts and NBFractions both use read_table() to read specific columns from data files. This function is intended to be a flexible wrapper around numpy.loadtxt, and should be sufficient for reading almost any whitespace-delimited data file with named columns.

epifx.obs.read_table(filename, columns, date_fmt=None)

Load data from a space-delimited text file with column headers defined in the first line.

Parameters:
  • filename – The file to read.
  • columns – The columns to read, represented as a list of (name, type) tuples. Each type should be a built-in NumPy scalar type, datetime.date, or datetime.datetime; for the two date types, string values are automatically converted to datetime.datetime objects by datetime_conv().
  • date_fmt – A format string for parsing date columns; see datetime_conv() for details and the default format string.
Returns:

A NumPy structured array.

Example:

The code below demonstrates how to read observations from a file that includes columns for the year, the observation date, the observed value, and free-text metadata (up to 20 characters in length).

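import datetime
import numpy as np
import epifx.obs

# Each column is a (name, type) tuple; 'S20' holds up to 20 bytes of text.
columns = [('year', np.int32), ('date', datetime.datetime),
           ('count', np.int32), ('metadata', 'S20')]
# Values in the date column are parsed using date_fmt.
df = epifx.obs.read_table('my-data-file.ssv', columns,
                          date_fmt='%Y-%m-%d')
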
epifx.obs.datetime_conv(text, fmt='%Y-%m-%d')

Convert date strings to datetime.datetime instances. This is a convenience function intended for use with, e.g., numpy.loadtxt.

Parameters:
  • text – A string containing a date or date-time value.
  • fmt – A format string that defines the textual representation. See the Python strptime documentation for details.
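
The conversion described above can be approximated with datetime.datetime.strptime from the standard library. The sketch below is not the epifx source; the function name parse_date is hypothetical, and the handling of bytes input is an assumption, included because some file readers pass raw bytes to converter functions.

```python
import datetime


def parse_date(text, fmt='%Y-%m-%d'):
    """Convert a date string (or ASCII bytes) to a datetime.datetime."""
    if isinstance(text, bytes):
        # Some readers hand converters raw bytes rather than str.
        text = text.decode('ascii')
    return datetime.datetime.strptime(text, fmt)


# The default format parses ISO-style dates; other layouts need their
# own format string.
when = parse_date('2014-03-05')
stamp = parse_date('05/03/2014 13:45', fmt='%d/%m/%Y %H:%M')
```

A value that does not match the format string causes strptime to raise ValueError, so malformed dates are reported rather than silently skipped.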