Observation Models
A generic base class for negative binomial observation models is provided, which assumes that the model is parameterised by a background signal (\(bg_\mathrm{obs}\)), an observation probability (\(p_\mathrm{obs}\)), and a dispersion parameter (\(k\)).
class epifx.obs.NB(obs_unit, obs_period)
The base class for generic observation models that assume the relationship between observed numerical quantities \(y_t\) and disease incidence in particle \(x_t\) follows a negative binomial distribution with mean \(\mathbb{E}[x]\) and dispersion parameter \(k\).
\[\mathcal{L}(y_t \mid x_t) \sim NB(\mathbb{E}[x], k)\]
Parameters: - obs_unit – A descriptive name for the data.
- obs_period – The observation period (in days).
define_params(params, bg_obs, pr_obs, disp)
Add observation model parameters to the simulation parameters.
Parameters: - bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
- pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
- disp – The dispersion parameter (\(k\)).
expect(params, op, period, pr_inf, prev, curr)
Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:
\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters: - params – The simulation parameters.
- op – The observation model parameters.
- period – The duration of the observation period (in days).
- pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
- prev – The state vectors at the start of the observation period.
- curr – The state vectors at the end of the observation period.
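As a rough illustration of the default expectation described above, the following sketch evaluates the formula directly. The helper name `expected_obs` and all numeric values are hypothetical and chosen only for illustration; this is not the library's implementation, which operates on particle state vectors.

```python
def expected_obs(bg_obs, pr_obs, pr_inf, pop_size):
    """Return bg_obs + pr_inf * pr_obs * N (hypothetical helper)."""
    return bg_obs + pr_inf * pr_obs * pop_size


# Illustrative values only: a background signal of 10, 1% of the population
# infected during the observation period, 20% of infections observed, and a
# population of 100,000.
mean = expected_obs(bg_obs=10, pr_obs=0.2, pr_inf=0.01, pop_size=100_000)
print(mean)
```

The background signal contributes a constant offset, so the expected value never falls below \(bg_\mathrm{obs}\), even when \(p_\mathrm{inf}\) is zero.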
log_llhd(params, op, obs, pr_indiv, curr, hist)
Calculate the log-likelihood of obtaining an observation from each particle.
Parameters: - params – The simulation parameters.
- op – The observation model parameters.
- obs – The observation that was made.
- pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
- curr – The particle state vectors.
- hist – The particle state histories, indexed by observation period.
Raises: NotImplementedError – Derived classes must implement this method.
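To make the negative binomial likelihood concrete, here is a minimal sketch of a per-particle log-likelihood calculation. It assumes the common mean/dispersion parameterisation, in which the variance is \(\mu + \mu^2 / k\); whether epifx uses exactly this parameterisation is an assumption, and the function name `nb_log_pmf` and the numbers are hypothetical.

```python
import math


def nb_log_pmf(y, mu, k):
    """Log-probability of count y under NB with mean mu and dispersion k.

    Assumes the mean/dispersion parameterisation, where the variance
    is mu + mu**2 / k (an assumption, not confirmed by the text).
    """
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu))
            + y * math.log(mu / (k + mu)))


# One expected value per particle (illustrative numbers only).
particle_means = [150.0, 210.0, 400.0]
observed = 200

# Log-likelihood of the single observation under each particle.
log_llhds = [nb_log_pmf(observed, mu, k=10) for mu in particle_means]

# Index of the particle whose expectation best explains the observation.
best = max(range(len(log_llhds)), key=log_llhds.__getitem__)
```

Under these assumptions, particles whose expected value lies closer to the observation receive a higher log-likelihood, and smaller values of \(k\) flatten the differences between particles.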
Two sub-classes are provided, which cater for count data and fractional data, respectively.
Count data
class epifx.obs.NBCounts(obs_unit, obs_period)
Generic negative binomial observation model for relating disease incidence to count data.
Parameters: - obs_unit – A descriptive name for the data.
- obs_period – The observation period (in days).
from_file(filename, year=None, date_col='to', value_col='count')
Load count data from a space-delimited text file with column headers defined in the first line.
Parameters: - filename – The file to read.
- year – Only return observations for the specified year. The default behaviour is to return all recorded observations.
- date_col – The name of the observation date column.
- value_col – The name of the observation value column.
Returns: A list of observations, ordered as per the original file.
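The file layout described above can be sketched with the standard library alone. This is a hypothetical stand-in rather than the library's `from_file()`: the helper name `read_counts`, the `YYYY-MM-DD` date format, and the representation of each observation as a `(date, count)` tuple are all assumptions made for illustration.

```python
import datetime
import io


def read_counts(stream, year=None, date_col='to', value_col='count'):
    """Read (date, count) pairs from a whitespace-delimited table.

    Hypothetical helper: the first line names the columns, and dates
    are assumed to be in YYYY-MM-DD format.
    """
    header = stream.readline().split()
    date_ix = header.index(date_col)
    value_ix = header.index(value_col)
    observations = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # Skip blank lines.
        date = datetime.datetime.strptime(fields[date_ix], '%Y-%m-%d')
        if year is not None and date.year != year:
            continue  # Keep only the requested year.
        observations.append((date, int(fields[value_ix])))
    return observations


# An in-memory file with a header line and three weekly observations.
example = io.StringIO(
    "year to count\n"
    "2014 2014-12-28 12\n"
    "2015 2015-01-04 17\n"
    "2015 2015-01-11 25\n"
)
obs_2015 = read_counts(example, year=2015)
```

Rows are returned in file order, which mirrors the documented behaviour of returning observations "ordered as per the original file".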
log_llhd(params, op, obs, pr_indiv, curr, hist)
Calculate the log-likelihood of obtaining an observation from each particle.
Parameters: - params – The simulation parameters.
- op – The observation model parameters.
- obs – The observation that was made.
- pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
- curr – The particle state vectors.
- hist – The particle state histories, indexed by observation period.
Raises: NotImplementedError – Derived classes must implement this method.
define_params(params, bg_obs, pr_obs, disp)
Add observation model parameters to the simulation parameters.
Parameters: - bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
- pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
- disp – The dispersion parameter (\(k\)).
expect(params, op, period, pr_inf, prev, curr)
Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:
\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters: - params – The simulation parameters.
- op – The observation model parameters.
- period – The duration of the observation period (in days).
- pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
- prev – The state vectors at the start of the observation period.
- curr – The state vectors at the end of the observation period.
Fractional data
class epifx.obs.NBFractions(obs_unit, obs_period, percentages=False)
Generic negative binomial observation model for relating disease incidence to fractional and percentage data, where the denominator is known and reported.
Parameters: - obs_unit – A descriptive name for the data.
- obs_period – The observation period (in days).
- percentages – Indicates whether the data are fractional values (False, default) or percentages (True).
from_file(filename, year=None, date_col='to', value_col='value', denom_col='denominator')
Load fractional data from a space-delimited text file with column headers defined in the first line.
Parameters: - filename – The file to read.
- year – Only return observations for the specified year. The default behaviour is to return all recorded observations.
- date_col – The name of the observation date column.
- value_col – The name of the observation value column.
- denom_col – The name of the observation denominator column.
Returns: A list of observations, ordered as per the original file.
expect(params, op, period, pr_inf, prev, curr)
Determine the expected value for a given infection probability. The default implementation, for a population of size \(N\), is:
\[\mathbb{E}[x] = bg_\mathrm{obs} + p_\mathrm{inf} \cdot p_\mathrm{obs} \cdot N\]
Parameters: - params – The simulation parameters.
- op – The observation model parameters.
- period – The duration of the observation period (in days).
- pr_inf – The probability of an individual becoming infected during the observation period (\(p_\mathrm{inf}\)).
- prev – The state vectors at the start of the observation period.
- curr – The state vectors at the end of the observation period.
log_llhd(params, op, obs, pr_indiv, curr, hist)
Calculate the log-likelihood of obtaining an observation from each particle.
Parameters: - params – The simulation parameters.
- op – The observation model parameters.
- obs – The observation that was made.
- pr_indiv – A 2 x N matrix (for N particles) of the probability of an individual being uninfected and infected, respectively.
- curr – The particle state vectors.
- hist – The particle state histories, indexed by observation period.
Raises: NotImplementedError – Derived classes must implement this method.
define_params(params, bg_obs, pr_obs, disp)
Add observation model parameters to the simulation parameters.
Parameters: - bg_obs – The background signal in the data (\(bg_\mathrm{obs}\)).
- pr_obs – The probability of observing an infected individual (\(p_\mathrm{obs}\)).
- disp – The dispersion parameter (\(k\)).
Reading observations from disk
The from_file() methods provided by NBCounts and NBFractions both use read_table() to read specific columns from data files.
This function is intended to be a flexible wrapper around numpy.loadtxt, and should be sufficient for reading almost any whitespace-delimited data file with named columns.
epifx.obs.read_table(filename, columns, date_fmt=None)
Load data from a space-delimited text file with column headers defined in the first line.
Parameters: - filename – The file to read.
- columns – The columns to read, represented as a list of (name, type) tuples; type should be either a built-in NumPy scalar type, or datetime.date or datetime.datetime, in which case string values will be automatically converted to datetime.datetime objects by datetime_conv().
- date_fmt – A format string for parsing date columns; see datetime_conv() for details and the default format string.
Returns: A NumPy structured array.
Example: The code below demonstrates how to read observations from a file that includes columns for the year, the observation date, the observed value, and free-text metadata (up to 20 characters in length).
    import datetime
    import numpy as np
    import epifx.obs
    columns = [('year', np.int32), ('date', datetime.datetime),
               ('count', np.int32), ('metadata', 'S20')]
    df = epifx.obs.read_table('my-data-file.ssv', columns,
                              date_fmt='%Y-%m-%d')
epifx.obs.datetime_conv(text, fmt='%Y-%m-%d')
Convert date strings to datetime.datetime instances. This is a convenience function intended for use with, e.g., numpy.loadtxt.
Parameters: - text – A string containing a date or date-time value.
- fmt – A format string that defines the textual representation. See the Python strptime documentation for details.
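The conversion described above can be approximated with the standard library's `strptime`. The sketch below is a hypothetical stand-in (the name `datetime_conv_sketch` is not part of epifx), intended only to show how the `fmt` argument controls parsing.

```python
import datetime


def datetime_conv_sketch(text, fmt='%Y-%m-%d'):
    """Parse a date string into a datetime.datetime (hypothetical helper)."""
    return datetime.datetime.strptime(text, fmt)


# The default format parses plain dates; the time defaults to midnight.
parsed = datetime_conv_sketch('2015-01-04')

# A custom format string handles date-time values.
parsed_time = datetime_conv_sketch('2015-01-04 13:30', fmt='%Y-%m-%d %H:%M')
```

A string that does not match the format string causes `strptime` to raise `ValueError`, which is a quick way to detect a mismatched `date_fmt` when reading a data file.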