GitHub - vhsvhs/simgenome: SimGenome is a quantitative model of gene transcription regulatory circuit evolution. SimGenome incorporates three major aspects of this process: (i) DNA-binding thermodynamics, (ii) non-equilibrium gene expression patterns, and (iii) population structure and evolution.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 192 Commits
docs		docs
saved_pops		saved_pops
segal2008		segal2008
src		src
tools		tools
README		README

Repository files navigation

SimGenome README

Victor Hanson-Smith
victorhansonsmith@gmail.com
2012

========
OVERVIEW

SimGenome is a quantitative model of gene transcription regulatory circuit evolution.  SimGenome incorporates three major
aspects of this process: (i) DNA-binding thermodynamics, (ii) non-equilibrium gene expression patterns, and (iii) population 
structure and evolution.

=================
REQUIRED SOFTWARE

1. Python
2. MPI
3. mpi4py
4. Numpy

========
MPI NOTES

SimGenome uses a master-slave architecture with MPI.  The master process is 0, and the slave processes are 1 through N-1.

SimGenome has two running modes: genetic alogorithm (GA) and knock-out (KO) analysis.

In GA, at the beginning of each generation, all slaves self identify a subset of the individuals in the population, and then proceed
to calculate the fitness of those individuals.  Upon completion of fitness calculations, each slave sends a hashtable (where key = individual ID,
value = fitness score) to the master.  The master incorporates these hashtables into a single hashtable.  The master then uses
this hashtable of fitnesses to selectively mate and reproduce the population.  The master also injects mutations into the new children, according
to the mutation parameters specified by the user.  The master then broadcasts the updated population to all the slaves, and the loop
then iterates to the next generation.

In KO, the analysis focuses on a single genome, specified using --poppath P and --ko G, where P is the population
and G is the ID of the target genome within P.  All slaves identify a subset of regulatory genes in G.  Each slave then calculates the
fitness of the WT individual, and the fitness of the individual with each regulatory gene KO'd individually.

==============
LOAD BALANCING
Keep in mind the master node only dispatches jobs; the slaves do the heavy-lifting (i.e. the fitness calculations).  This means that
best load-balancing results will be achieved using a population with M individuals on a computational cluster with N MPI processes, 
where M is a multiple of N-1.  For example, with a population size of 128, good values for N include 5, 9, 17, 33, 65, and 129.

=================================
VERBOSITY NOTES (use --verbose N)

N= 1:
. Prints basic notifications about population bu, loading files, etc.
. Prints generation stats: time to completion, max/min/mean/median/sd fitness.

N= 2:
. Pickles the population, writes population.gen.X.pickle.
. Prints ID fitnesses, elites, and F1 crosses.
. Writes the R script to plot_expression for each individual for each generation.

N= 3:

N= 4:
. Invokes the method 'print_configuration', which writes config.gen.X.gid.Y.txt
. Invokes R to plot_expression.

N= 5:
. Prints timeslice information to stdout.


N= 10:
. Plots the R script (functon plot_expression) for each individual for each generation.

N= 99:
. Prints some debugging information.  This is very verbose, and not recommended.

N= 100:
. Prints all debugging information.  This is outrageously verbose.


=============================================
NOTES ON GROWTH RATE AND DECAY RATE
The rate at which genes reach maximum expression, and the rate of their decay, are governed
by three parameters:
--growth_rate
--decay_rate
--pe_scalar

Genes will be turned on faster if you increase the value of growth rate and/or pe_scalar.
Genes will decay faster if you increase the value of decay_rate and/or pe_scalar.

It is important to avoid binding "saturation," i.e. the situation where a regulatory region bound at X bits
reaches maximum expression at the same rate as a gene bound by X+n bits.  In this saturation scenario,
there is no evolutionary advantage for the regulatory machinery to evolve additional specificity
for the URS.  To avoid this problem, turn down pe_scalar.

========================
TROUBLESHOOTING & F.A.Q.

* If SimGenome hangs at the end of a generation (usually the 0th generation), this may indicate that a slave node has crashed.

About

SimGenome is a quantitative model of gene transcription regulatory circuit evolution. SimGenome incorporates three major aspects of this process: (i) DNA-binding thermodynamics, (ii) non-equilibrium gene expression patterns, and (iii) population structure and evolution.

Readme