Model Specification

This page provides a more comprehensive overview of how the model is quantitatively constructed and functions

Model specification

Following on from the contextual overview, an OptimaTB model essentially consists of a graph of compartments with transitions between them. For ease of construction, these compartments are specified by the user as 'cascade' compartments corresponding to disease states, and 'populations' specifying the groups of people for which each cascade should be replicated. Thus, the system of ODEs that is simulated contains compartments corresponding to one copy of the cascade for each population. For example, if the cascade had 3 states Susceptible->Infected->Recovered and there were three populations 15-64,65+,Prisoners then there would be 9 ODE compartments

15-64     : Susceptible
65+       : Susceptible
Prisoners : Susceptible
15-64     : Infected
65+       : Infected
Prisoners : Infected
15-64     : Recovered
65+       : Recovered
Prisoners : Recovered

Note that in the context of OptimaTB, a 'compartment' is typically taken to mean a state in the cascade e.g. susceptible, as opposed to a dynamic variable in the ODE system e.g. 15-64:Susceptible.

Special compartments

There are three types of special compartments

A birth compartment which corresponds to a source of people. Based on the birth rate, people are added to the simulation via transitions out of this compartment
A death compartment, when people are removed from the simulation they are moved into this compartment
A junction compartment which recieves input from several compartments, and then distributes its outputs proportionately across several compartments. For example, if the susceptible population is actually represented as two comparments e.g. unvaccinated and vaccinated, and infection is possible with two strains e.g. Infection A, Infection B, then without a junction compartment, there would have to be transitions from
```
  Unvaccinated -> Infection A
  Vaccinated   -> Infection A
  Unvaccinated -> Infection B
  Vaccinated   -> Infection B
```
Instead, a junction compartment can be used such that
```
  Unvaccinated \                  / Infection A
                ---- Junction ----
  Vaccinated   /                  \ Infection B
```
Then, the transitions from unvaccinated and vaccinated into the junction correspond to the rates of infection pooled over infection types, and the transition out of the junction can be framed in terms of the relative rate of new infections of types A and B. This can be useful in cases where the junction framework is a better match to the available empirical data e.g. if the proportion of strain A/strain B is known, but the specific rates Unvaccinated -> Infection A/Infection B are not known.

Note that the junction requires the outflows to be distributed in proportion format. Some older cascades and databooks contain parameters that map actual numbers to proportions, to handle the case where the number of people has been measured and to allow the databook to contain a record of this number (as opposed to computing and entering the corresponding portion). In some cases, where there were very few cases, the ratio could be highly sensitive to individuals e.g. ratio of MDR/XDR could depend on just a handful of cases, so there may be 0:0 or 1:0 or 0:1 which could have a strong effect on the simulation output.

Characteristics

Each compartment has a size in units of number of people. However, sometimes it is useful to construct quantities that represent groups of compartments. For example, the number of people alive is given by summing over all of the compartments. Similarly, it is also useful to construct quantities that correspond to normalizations of populations. For example, the disease prevalance is the number of infected people divided by the total number of people. These quantities are all derived from sums and divisions of the population counts, and are referred to as characteristics. These characteristics are defined in the cascade, and there is therefore one characteristic value for each population in the simulation e.g. The number of 'alive' people obtained by summing over the cascade states will be computed for each population.

Entry points and initialization

Characteristics also provide the interface for setting the initial population sizes are entered into the simulation. The compartments are defined in cascade.xlsx in the 'Compartments' worksheet. Each characteristic has a list of other compartments or characteristics that are summed together to determine the value of the characteristic. In addition, it has an 'Entry point' that corresponds to one compartment. Each compartment must appear as an entry point in one and only one characteristic. The 'entry point' indicates that the value of the characteristic provided by the user should be used to compute the initial value of the entry compartment given the value of all other included compartments. Consider the following example

label	databook_order	value	entry point	includes
`n_sus`	-1			s
`n_inf`		20	i	i
`n_rec`		50	r	r
`alive`		100	s	s, i, r

First, consider just the label and includes columns. The characteristics all correspond to numbers of people - the first three correspond to the number of people in each of the susceptible, infected, and recovered compartments. The final characteristic corresponds to the total number of people that are alive in the simulation, obtained by summing the three compartments together.

Turning to databook_order, we can see that the databook will contain fields for n_inf, n_rec, and alive - but not n_sus. So the user will be asked for the number of people who are infected and the number of people who have recovered, and the number of people who are alive. Of course, it can be trivially be seen that the number of susceptible people is simply the number of people who are alive, minus the number of people who are infected or recovered. For the numbers above, there should be 30 people in the susceptible compartment.

Now, n_inf and n_rec only include one compartment, which means that the value entered by the user directly maps to the compartment specified as the entry point. Thus i=20 and r=50. Then, alive=100 means that

100 = s + i + r

Because s is the entry point for alive, this equation is rearranged to solve for s

s = 100 - i - r

and then because values have now been provided for i and r, this expression can now be solved for s.

In practice, this calculation can be highly nontrivial as the user may be initializating several different compartments using overlapping sets of characteristics. A sophisticated set theory algorithm is used to perform the allocation in these cases. Errors arising in this part of the code are likely because some of the user-provided values are incompatible. For instance, if the number of alive people above was specified as 60, the number of people alive would be smaller than the number of infected and recovered people. This is clearly impossible, since compartment size is always positive.

Finally, one complication is what happens when the initial year of the simulation run doesn't match any of the years with data. In this case, the values entered by the user will be interpolated or extrapolated to match the starting year. There are two features here to keep in mind

Standard interpolation is used for times that are within the range provided by the user e.g. data points for 2010, 2012, 2013, and initial simulation year of 2011. However, extrapolation simply uses the closest value, with zero gradient. So for example, starting the simulation in 2001 or 2009 would use the value for 2010 as the initial condition. This can be problematic if there is a large discrepancy between the nearest data point and the initial year of the simulation
The interpolation is performed independently for each population. In regard to the set decomposition used to map the user-provided population sizes to the populations in the simulation, even if the data provides a valid decomposition for the years entered by the user, it is possible that the interpolated values do not satisfy required constraints. For example, a valid decomposition may be possible in 2010 and 2012, but the interpolated values in 2011 fail. This can cause initialization failures that vary with the start year of the simulation and not with the data.

Note - there are suggestions in code comments that a compartment that is not marked as an entry point is allowed, but is assumed to start with zero population

Transitions and Parameters

The cascade defines a set of directed edges that correspond to transitions between one cascade state to another. These edges correspond to transition rates. For instance, if we consider

Susceptible -(A)-> Infected -(B)-> Recovered

then the quantities A and B represent the rate at which susceptible people become infected, and the rate at which infected people recover, respectively. Another way of formulating these transitions is by considering a matrix formulation of the system. For example

[S']   [ -A  0  0] [S]
[I'] = [ +A -B  0]*[I]
[R']   [ 0  +B  0] [R]

This transition matrix is specified in precisely this format in cascade.xlsx. However, a slight complication is that these transition values may need to be computed in complex ways. Firstly, empirically measured quantities may not correspond directly to transition rates. For example, the infection rate A may depend on the strain of the disease (e.g. drug resistant or not) and the user may know the base infection rate and the relative infectiousness of the disease strains. Thus the concept of transitions is generalized to parameters which are quantities that are combined arbitrarily to compute the transition coefficients. cascade.xlsx contains definitions for both the transition matrix and parameters. For each unique entry in the transition matrix, there must be exactly one parameter that maps to it. The same parameter can appear multiple times in the transition matrix. In addition, the parameter list may contain other quantities that contribute. So for instance, the transition matrix may contain the labels

A - inf_rate
B - rec_rate

and the parameter list may contain

Parameter Name	Transition Tag	Databook Order	Formula
`base_infection_rate`
`strain_arel`
`strain_brel`
`total_infection_rate`	`inf_rate`	-1	`(base_infection_ratestrain_arel + base_infection_ratestrain_brel)/(strain_arel+strain_brel)`
`recovery_rate`	`rec_rate`

The column titles correspond to

The name of the parameter
Whether it maps to a transition tag
Whether the parameter appears in the databook (-1 means it is not entered by the user)
How to compute the parameter value

So going through this table, the databook will prompt the user to enter values for base_infection_rate,strain_arel, and strain_brel. However, none of these values are used directly because they are not associated with any transition tags. Instead, OptimaTB will derive the value for total_infection_rate based on these quantities. Because the databook order for total_infection_rate is -1, the user is not able to directly set this quantity. But the fact that total_infection_rate is associated with a transition tag indicates that this value is the one that directly correspond to one of the transition rates. In contrast, the user is prompted for recovery_rate and this quantity is directly used as a transition rate.

Births and deaths

Births and deaths are handled using special compartments. A birth compartment is a reservoir that transfers people into normal compartments at every timestep depending on the birth rate (which is introduced as a transition quantity).

At present, the birth rate must be specified as an absolute number of people, as opposed to a fraction of the population.

Cross-population effects

Thus far, we have looked at the dynamics of individuals through the disease cascade but within a single population. However, the purpose of including multiple populations in the simulation is because they can interact and affect each other. OptimaTB models two types of interactions - 'contacts' which correspond to encounters/interactions across populations, 'transfers' which correspond to moving people from one population to another.

Contacts

In general, parameter values are specified on a per-population basis. For instance, the model

Susceptible -(A)-> Infected -(B)-> Recovered

would have a different recovery rate B in every population e.g., if older people take longer to recover. A special tag avg_contacts_in can be added to a parameter to indicate that it should be computed as a weighted average over populations. This is typically used for the force-of-infection parameter to account for cross-population transmission. If the avg_contacts_in flag is provided, then the parameter value will be computed by

First, evaluating the parameter formula in each population
Taking a weighted average to compute the final parameter value in each population. Each population could have different weights. The weights are specified in the 'Population Contacts' worksheet of the databook

Consider a worked example with populations 0-10 and 11-60. The force of infection depends on the disease infectiousness inf and the prevalance in each each population. We believe that children are much more likely to pass on the disease than adults, and that adults are more likely to infect other adults than kids. So the implementation proceeds as follows:

A characteristic needs to be introduced to compute the prevalance. This characteristic is specified as

label denominator includes

alive s, i, r

prev alive i

That is, we first introduce a characteristic for the number of people who are alive, and then introduce a characteristic that expresses the number of people in the infected compartment as a fraction of the number of people who are alive
Next, we introduce the parameters for transmission

label databook order transition tag special rules formula

inf

foi -1 A avg_contacts_in inf*prev

That is, first we introduce a parameter inf which the user needs to provide in the databook. Then, we declare a parameter foi which maps to the transition rate A (the rate at which people move from susceptible to infected). We calculate this value based on the infectiousness and prevalance, and therefore do not ask the user to supply the value. We then supply the formula used to calculate the force of infection, as the prevalance multiplied by the infectiousness. Finally, the special rule avg_contacts_in is provided to indicate that this quantity needs to be calculated across populations
We then fill out the Population Contacts table in the databook which provides the weights for this sum. Because we believe that children are more infectious, we might fill it out as follows

Interaction Impact Weights 0-10 11-60

0-10 5 5

11-60 1 2

This table is read as weighting the (directed) interaction from row to column. So if we want to know the relative contribution of the populations to 11-60 we look at the second column, and then going down the table, we see that 0-10 has a weight 5 times larger than 11-60. Note that these weights are normalized in the weighted sum (i.e. divided by the columnwise sum). This quantity should be interpreted as the relative frequency of interactions.
Finally, to compute the foi parameter, we first compute inf*prev for both populations. Suppose inf=0.5 and prev=0.1 in population 0-10 and prev=0.2 in population 11-60. So we obtain the unweighted quantities for foi, 0.05 in 0-10 and 0.1 in 11-60.

Then, we need to take into account the weighting. First, the number of interactions depends on the size of the population. So the initial weights are the population sizes. These are then multiplied by the weights from the Population Contacts matrix. So if there were 100 people in the 0-10 age group and 1000 people in the 11-60 age group, the final quantities in each population will be
```
 foi (0-10)  = 0.05*(5/6)*(100/1100) + 0.1*(1/6)*(1000/1100) = 0.019
 foi (11-60) = 0.05*(5/6)*(100/1100) + 0.1*(2/6)*(1000/1100) = 0.034
```
In this way, we find that the infection rate is higher in the adult population than in the child population because children are more likely to infect adults than the other way round.

Transfers

The second mechanism for cross-population interactions involves movement of people from one population to another. Examples of this include

Aging, in which people move from one age compartment to the next
Membership of special groups, e.g. PLHIV or prisoners

A transfer is conservative - people are subtracted from a compartment in one population, and added to the corresponding compartment in a different population. The mapping from compartment to compartment is one to one, in order to preserve disease state. For example, a vaccinated individual moving into prison would remain vaccinated.

The transfer rate may be time dependent - for aging, this would correspond to a nonuniform distribution of ages within a range (e.g. if there were more 11-30 year olds than 30-64 year olds in the 11-64 compartment). For other transfers, such as for prisoners, a time dependent rate is more naturally interpreted as just a change in rate from year to year i.e. the incarceration rate would be expected to vary from year to year.

Note that transfers and interactions are complementary but distinct - a disease could spread from the prisoner population to the general population either by the prisoner meeting someone (e.g. prison visit) and passing the illness on, which would be an interaction, or if an infected prisoner is released and joins the general population, which would be a transfer. This also provides another illustration of usage of interaction impact weights - members of the general population may interact with each other a lot, and members of the prison population may interact with each other a lot, but the nature of prisons means that there is minimal contact between the two populations. Hence they may have high internal interaction weights and low cross-population interaction weights.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Model Specification