-
Notifications
You must be signed in to change notification settings - Fork 0
Model Specification
This page provides a more comprehensive overview of how the model is quantitatively constructed and functions
Following on from the contextual overview, an OptimaTB model essentially consists of a graph of compartments with transitions between them. For ease of construction, these compartments are specified by the user as 'cascade' compartments corresponding to disease states, and 'populations' specifying the groups of people for which each cascade should be replicated. Thus, the system of ODEs that is simulated contains compartments corresponding to one copy of the cascade for each population. For example, if the cascade had 3 states Susceptible->Infected->Recovered and there were three populations 15-64,65+,Prisoners then there would be 9 ODE compartments
15-64 : Susceptible
65+ : Susceptible
Prisoners : Susceptible
15-64 : Infected
65+ : Infected
Prisoners : Infected
15-64 : Recovered
65+ : Recovered
Prisoners : Recovered
Note that in the context of OptimaTB, a 'compartment' is typically taken to mean a state in the cascade e.g. susceptible, as opposed to a dynamic variable in the ODE system e.g. 15-64:Susceptible.
There are three types of special compartments
-
A
birthcompartment which corresponds to a source of people. Based on the birth rate, people are added to the simulation via transitions out of this compartment -
A
deathcompartment, when people are removed from the simulation they are moved into this compartment -
A
junctioncompartment which recieves input from several compartments, and then distributes its outputs proportionately across several compartments. For example, if the susceptible population is actually represented as two comparments e.g. unvaccinated and vaccinated, and infection is possible with two strains e.g. Infection A, Infection B, then without a junction compartment, there would have to be transitions fromUnvaccinated -> Infection A Vaccinated -> Infection A Unvaccinated -> Infection B Vaccinated -> Infection BInstead, a junction compartment can be used such that
Unvaccinated \ / Infection A ---- Junction ---- Vaccinated / \ Infection BThen, the transitions from unvaccinated and vaccinated into the junction correspond to the rates of infection pooled over infection types, and the transition out of the junction can be framed in terms of the relative rate of new infections of types A and B. This can be useful in cases where the junction framework is a better match to the available empirical data e.g. if the proportion of strain A/strain B is known, but the specific rates Unvaccinated -> Infection A/Infection B are not known.
Note that the junction requires the outflows to be distributed in proportion format. Some older cascades and databooks contain parameters that map actual numbers to proportions, to handle the case where the number of people has been measured and to allow the databook to contain a record of this number (as opposed to computing and entering the corresponding portion). In some cases, where there were very few cases, the ratio could be highly sensitive to individuals e.g. ratio of MDR/XDR could depend on just a handful of cases, so there may be 0:0 or 1:0 or 0:1 which could have a strong effect on the simulation output.
Each compartment has a size in units of number of people. However, sometimes it is useful to construct quantities that represent groups of compartments. For example, the number of people alive is given by summing over all of the compartments. Similarly, it is also useful to construct quantities that correspond to normalizations of populations. For example, the disease prevalance is the number of infected people divided by the total number of people. These quantities are all derived from sums and divisions of the population counts, and are referred to as characteristics. These characteristics are defined in the cascade, and there is therefore one characteristic value for each population in the simulation e.g. The number of 'alive' people obtained by summing over the cascade states will be computed for each population.
Characteristics also provide the interface for setting the initial population sizes are entered into the simulation. The compartments are defined in cascade.xlsx in the 'Compartments' worksheet. Each characteristic has a list of other compartments or characteristics that are summed together to determine the value of the characteristic. In addition, it has an 'Entry point' that corresponds to one compartment. Each compartment must appear as an entry point in one and only one characteristic. The 'entry point' indicates that the value of the characteristic provided by the user should be used to compute the initial value of the entry compartment given the value of all other included compartments. Consider the following example
| label | databook_order | value | entry point | includes |
|---|---|---|---|---|
n_sus |
-1 | s | ||
n_inf |
20 | i | i | |
n_rec |
50 | r | r | |
alive |
100 | s | s, i, r |
First, consider just the label and includes columns. The characteristics all correspond to numbers of people - the first three correspond to the number of people in each of the susceptible, infected, and recovered compartments. The final characteristic corresponds to the total number of people that are alive in the simulation, obtained by summing the three compartments together.
Turning to databook_order, we can see that the databook will contain fields for n_inf, n_rec, and alive - but not n_sus. So the user will be asked for the number of people who are infected and the number of people who have recovered, and the number of people who are alive. Of course, it can be trivially be seen that the number of susceptible people is simply the number of people who are alive, minus the number of people who are infected or recovered. For the numbers above, there should be 30 people in the susceptible compartment.
Now, n_inf and n_rec only include one compartment, which means that the value entered by the user directly maps to the compartment specified as the entry point. Thus i=20 and r=50. Then, alive=100 means that
100 = s + i + r
Because s is the entry point for alive, this equation is rearranged to solve for s
s = 100 - i - r
and then because values have now been provided for i and r, this expression can now be solved for s.
In practice, this calculation can be highly nontrivial as the user may be initializating several different compartments using overlapping sets of characteristics. A sophisticated set theory algorithm is used to perform the allocation in these cases. Errors arising in this part of the code are likely because some of the user-provided values are incompatible. For instance, if the number of alive people above was specified as 60, the number of people alive would be smaller than the number of infected and recovered people. This is clearly impossible, since compartment size is always positive.
Finally, one complication is what happens when the initial year of the simulation run doesn't match any of the years with data. In this case, the values entered by the user will be interpolated or extrapolated to match the starting year. There are two features here to keep in mind
- Standard interpolation is used for times that are within the range provided by the user e.g. data points for 2010, 2012, 2013, and initial simulation year of 2011. However, extrapolation simply uses the closest value, with zero gradient. So for example, starting the simulation in 2001 or 2009 would use the value for 2010 as the initial condition. This can be problematic if there is a large discrepancy between the nearest data point and the initial year of the simulation
- The interpolation is performed independently for each population. In regard to the set decomposition used to map the user-provided population sizes to the populations in the simulation, even if the data provides a valid decomposition for the years entered by the user, it is possible that the interpolated values do not satisfy required constraints. For example, a valid decomposition may be possible in 2010 and 2012, but the interpolated values in 2011 fail. This can cause initialization failures that vary with the start year of the simulation and not with the data.
Note - there are suggestions in code comments that a compartment that is not marked as an entry point is allowed, but is assumed to start with zero population
The cascade defines a set of directed edges that correspond to transitions between one cascade state to another. These edges correspond to transition rates. For instance, if we consider
Susceptible -(A)-> Infected -(B)-> Recovered
then the quantities A and B represent the rate at which susceptible people become infected, and the rate at which infected people recover, respectively. Another way of formulating these transitions is by considering a matrix formulation of the system. For example
[S'] [ -A 0 0] [S]
[I'] = [ +A -B 0]*[I]
[R'] [ 0 +B 0] [R]
This transition matrix is specified in precisely this format in cascade.xlsx. However, a slight complication is that these transition values may need to be computed in complex ways. Firstly, empirically measured quantities may not correspond directly to transition rates. For example, the infection rate A may depend on the strain of the disease (e.g. drug resistant or not) and the user may know the base infection rate and the relative infectiousness of the disease strains. Thus the concept of transitions is generalized to parameters which are quantities that are combined arbitrarily to compute the transition coefficients. cascade.xlsx contains definitions for both the transition matrix and parameters. For each unique entry in the transition matrix, there must be exactly one parameter that maps to it. The same parameter can appear multiple times in the transition matrix. In addition, the parameter list may contain other quantities that contribute. So for instance, the transition matrix may contain the labels
A - inf_rate
B - rec_rate
and the parameter list may contain
| Parameter Name | Transition Tag | Databook Order | Formula |
|---|---|---|---|
base_infection_rate |
|||
strain_arel |
|||
strain_brel |
|||
total_infection_rate |
inf_rate |
-1 | (base_infection_rate*strain_arel + base_infection_rate*strain_brel)/(strain_arel+strain_brel) |
recovery_rate |
rec_rate |
The column titles correspond to
- The name of the parameter
- Whether it maps to a transition tag
- Whether the parameter appears in the databook (
-1means it is not entered by the user) - How to compute the parameter value
So going through this table, the databook will prompt the user to enter values for base_infection_rate,strain_arel, and strain_brel. However, none of these values are used directly because they are not associated with any transition tags. Instead, OptimaTB will derive the value for total_infection_rate based on these quantities. Because the databook order for total_infection_rate is -1, the user is not able to directly set this quantity. But the fact that total_infection_rate is associated with a transition tag indicates that this value is the one that directly correspond to one of the transition rates. In contrast, the user is prompted for recovery_rate and this quantity is directly used as a transition rate.
Births and deaths are handled using special compartments. A birth compartment is a reservoir that transfers people into normal compartments at every timestep depending on the birth rate (which is introduced as a transition quantity).
At present, the birth rate must be specified as an absolute number of people, as opposed to a fraction of the population.
Thus far, we have looked at the dynamics of individuals through the disease cascade but within a single population. However, the purpose of including multiple populations in the simulation is because they can interact and affect each other. OptimaTB models two types of interactions - 'contacts' which correspond to encounters/interactions across populations, 'transfers' which correspond to moving people from one population to another.
In general, parameter values are specified on a per-population basis. For instance, the model
Susceptible -(A)-> Infected -(B)-> Recovered
would have a different recovery rate B in every population e.g., if older people take longer to recover. A special tag avg_contacts_in can be added to a parameter to indicate that it should be computed as a weighted average over populations. This is typically used for the force-of-infection parameter to account for cross-population transmission. If the avg_contacts_in flag is provided, then the parameter value will be computed by
- First, evaluating the parameter formula in each population
- Taking a weighted average to compute the final parameter value in each population. Each population could have different weights. The weights are specified in the 'Population Contacts' worksheet of the databook
Consider a worked example with populations 0-10 and 11-60. The force of infection depends on the disease infectiousness inf and the prevalance in each each population. We believe that children are much more likely to pass on the disease than adults, and that adults are more likely to infect other adults than kids. So the implementation proceeds as follows:
-
A characteristic needs to be introduced to compute the prevalance. This characteristic is specified as
label denominator includes alives, i, r prevalivei That is, we first introduce a characteristic for the number of people who are alive, and then introduce a characteristic that expresses the number of people in the infected compartment as a fraction of the number of people who are alive
-
Next, we introduce the parameters for transmission
label databook order transition tag special rules formula inffoi-1 Aavg_contacts_ininf*prevThat is, first we introduce a parameter
infwhich the user needs to provide in the databook. Then, we declare a parameterfoiwhich maps to the transition rateA(the rate at which people move from susceptible to infected). We calculate this value based on the infectiousness and prevalance, and therefore do not ask the user to supply the value. We then supply the formula used to calculate the force of infection, as the prevalance multiplied by the infectiousness. Finally, the special ruleavg_contacts_inis provided to indicate that this quantity needs to be calculated across populations -
We then fill out the
Population Contactstable in the databook which provides the weights for this sum. Because we believe that children are more infectious, we might fill it out as followsInteraction Impact Weights 0-10 11-60 0-10 5 5 11-60 1 2 This table is read as weighting the (directed) interaction from row to column. So if we want to know the relative contribution of the populations to
11-60we look at the second column, and then going down the table, we see that0-10has a weight 5 times larger than11-60. Note that these weights are normalized in the weighted sum (i.e. divided by the columnwise sum). This quantity should be interpreted as the relative frequency of interactions. -
Finally, to compute the
foiparameter, we first computeinf*prevfor both populations. Supposeinf=0.5andprev=0.1in population0-10andprev=0.2in population11-60. So we obtain the unweighted quantities forfoi,0.05in0-10and0.1in11-60.Then, we need to take into account the weighting. First, the number of interactions depends on the size of the population. So the initial weights are the population sizes. These are then multiplied by the weights from the
Population Contactsmatrix. So if there were 100 people in the0-10age group and 1000 people in the11-60age group, the final quantities in each population will befoi (0-10) = 0.05*(5/6)*(100/1100) + 0.1*(1/6)*(1000/1100) = 0.019 foi (11-60) = 0.05*(5/6)*(100/1100) + 0.1*(2/6)*(1000/1100) = 0.034In this way, we find that the infection rate is higher in the adult population than in the child population because children are more likely to infect adults than the other way round.
The second mechanism for cross-population interactions involves movement of people from one population to another. Examples of this include
- Aging, in which people move from one age compartment to the next
- Membership of special groups, e.g. PLHIV or prisoners
A transfer is conservative - people are subtracted from a compartment in one population, and added to the corresponding compartment in a different population. The mapping from compartment to compartment is one to one, in order to preserve disease state. For example, a vaccinated individual moving into prison would remain vaccinated.
The transfer rate may be time dependent - for aging, this would correspond to a nonuniform distribution of ages within a range (e.g. if there were more 11-30 year olds than 30-64 year olds in the 11-64 compartment). For other transfers, such as for prisoners, a time dependent rate is more naturally interpreted as just a change in rate from year to year i.e. the incarceration rate would be expected to vary from year to year.
Note that transfers and interactions are complementary but distinct - a disease could spread from the prisoner population to the general population either by the prisoner meeting someone (e.g. prison visit) and passing the illness on, which would be an interaction, or if an infected prisoner is released and joins the general population, which would be a transfer. This also provides another illustration of usage of interaction impact weights - members of the general population may interact with each other a lot, and members of the prison population may interact with each other a lot, but the nature of prisons means that there is minimal contact between the two populations. Hence they may have high internal interaction weights and low cross-population interaction weights.