Molecular Dynamics and Monte Carlo
The true picture of a molecular system is far from the static, idealized image provided by molecular mechanics. The atoms and groups in molecules are in constant motion, and, especially in the case of biological macromolecules, these movements are concerted (correlated) and may be essential for biological function. The direct proof of this behavior is provided by X-ray and NMR experiments. Thermodynamic properties of molecules, especially when they form a complex or are immersed in a solvent, cannot be derived from the harmonic approximation, which inherently assumes small amplitude motions around a symmetric minimum. While molecular mechanics and the simulation methods described below share the same potential energy function, they differ conceptually. In molecular mechanics, the information is derived from a single geometry of the molecule. Simulation methods require thousands to millions of geometries to produce meaningful results. The biological action of molecules frequently involves large amplitude motions (biochemists often call them ``conformational transitions'' or ``changes''), resulting in abrupt changes in the geometry of the protein/ligand molecule as well as rearrangement of solvent molecules. The thermodynamic and energetic parameters of such systems can only be derived by realistic and explicit simulation, i.e., by leaving the potential energy minimum and probing the energy surface of the molecular system. Molecular simulations are a recent addition to molecular modeling systems and are still in active stages of development. They are also more complex conceptually than molecular mechanics. Recent monographs on this topic contain not only the physical and mathematical background necessary to use the method but also point to applications of the methods: Brooks et al. (1988), McCammon & Harvey (1987), and van Gunsteren & Weiner (1989).
There are essentially two approaches to performing molecular simulations: the stochastic and the deterministic. The stochastic approach, called Monte Carlo, is based on exploring the energy surface by randomly probing the geometry of the molecular system (or, in the language of statistical mechanics, its configuration space). The most popular realization of the Monte Carlo method (though not the only one!) for molecular systems is the Metropolis method, which is essentially composed of the following steps:
1. Specify the initial atom coordinates (e.g., from a molecular mechanics geometry optimization).
2. Select some atom i at random and move it by a random displacement: $\Delta x_i$, $\Delta y_i$, and $\Delta z_i$.
3. Calculate the change in potential energy, $\Delta V$, corresponding to this displacement.
4. If $\Delta V \le 0$, accept the new coordinates and go to step 2.
5. Otherwise, if $\Delta V > 0$, select a random number R in the range [0, 1] and:
   A.
   if $R \le \exp(-\Delta V/kT)$, accept the new coordinates and go to step 2,
   B.
   if $R > \exp(-\Delta V/kT)$, keep the original coordinates and go to step 2.
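For concreteness, a minimal sketch of one such move in Python is given below; the harmonic potential_energy function, the displacement size, and the unit conventions are placeholders of this illustration, not part of the original method description:

```python
import numpy as np

kB = 0.0019872  # Boltzmann constant, kcal/(mol K)

def potential_energy(coords):
    """Placeholder energy: a harmonic tether to the origin. A real
    application would evaluate the full molecular mechanics potential."""
    return 0.5 * np.sum(coords ** 2)

def metropolis_step(coords, temperature, max_disp=0.1, rng=None):
    """One Metropolis move on an (N, 3) coordinate array."""
    rng = rng or np.random.default_rng()
    i = rng.integers(len(coords))                    # step 2: random atom i
    trial = coords.copy()
    trial[i] += rng.uniform(-max_disp, max_disp, 3)  # random displacement
    dV = potential_energy(trial) - potential_energy(coords)  # step 3
    if dV <= 0.0:                                    # step 4: accept downhill
        return trial
    if rng.random() <= np.exp(-dV / (kB * temperature)):     # step 5A
        return trial
    return coords                                    # step 5B: keep old coords
```

Repeating metropolis_step many times produces a Boltzmann-weighted sequence of configurations from which ensemble averages can be computed.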
Note that the iterations are independent of one another (i.e., the system does not retain any ``memory''). Reverting to the previous state is as probable as choosing any other state. This condition has to be satisfied if the system is to behave as a Markov process, for which the methods of calculating ensemble (i.e., statistical) averages are known. Another important condition to be satisfied is the continuity of the potential energy function, in order for the system to be ergodic. In this context, ergodicity means that any state of the system can be reached from any other state. A clearly written introduction to these concepts is given by Heermann (1990).
The deterministic approach, called Molecular Dynamics (MD), actually simulates the time evolution of the molecular system and provides us with the actual trajectory of the system. The information generated from simulation methods can in principle be used to fully characterize the thermodynamic state of the system. In practice, the simulations are interrupted long before there is enough information to derive absolute values of thermodynamic functions; however, the differences between thermodynamic functions corresponding to different states of the system are usually computed quite reliably.
Based on the potential energy function V, we can find the components, $F_{x_i}$, $F_{y_i}$, $F_{z_i}$, of the force acting on atom i as:

$$F_{x_i} = -\frac{\partial V}{\partial x_i}, \qquad F_{y_i} = -\frac{\partial V}{\partial y_i}, \qquad F_{z_i} = -\frac{\partial V}{\partial z_i}$$
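These force components enter Newton's equations of motion, which MD integrates numerically in small time steps. As a rough illustration, here is one step of the velocity Verlet scheme in Python (a common integrator, though the text does not prescribe a particular one); the finite-difference forces are purely a convenience of this sketch:

```python
import numpy as np

def numerical_forces(coords, potential, h=1e-5):
    """F = -dV/dx for every cartesian component, by central differences.
    Production codes evaluate forces analytically instead."""
    forces = np.zeros_like(coords)
    flat = coords.ravel()      # view: edits below perturb coords directly
    f_flat = forces.ravel()
    for k in range(flat.size):
        x0 = flat[k]
        flat[k] = x0 + h
        v_plus = potential(coords)
        flat[k] = x0 - h
        v_minus = potential(coords)
        flat[k] = x0           # restore the coordinate
        f_flat[k] = -(v_plus - v_minus) / (2.0 * h)
    return forces

def velocity_verlet_step(coords, velocities, masses, potential, dt):
    """Advance positions and velocities by one time step dt."""
    acc = numerical_forces(coords, potential) / masses[:, None]
    coords = coords + velocities * dt + 0.5 * acc * dt ** 2
    acc_new = numerical_forces(coords, potential) / masses[:, None]
    velocities = velocities + 0.5 * (acc + acc_new) * dt
    return coords, velocities
```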
To start the MD simulation we need an initial set of atom positions (i.e., a geometry) and atom velocities. In practice, an acceptable starting state of the system is achieved by ``equilibration'' and ``heating'' runs prior to the ``production'' run. The initial positions of the atoms are most often taken from a prior geometry optimization with molecular mechanics. Formally, such positions correspond to the absolute zero of temperature. The velocities are assigned randomly to each atom from the Maxwell distribution for some low temperature (say 20 K). The random assignment does not allocate correct velocities, and the system is not at thermodynamic equilibrium. To approach equilibrium, an ``equilibration'' run is performed and the total kinetic energy (or temperature) of the system is monitored until it is constant. The velocities are then rescaled to correspond to some higher temperature, i.e., heating is performed. Then the next equilibration run
follows. The absolute temperature, T, and the atom velocities are related through the mean kinetic energy of the system:

$$\langle E_{kin} \rangle = \frac{1}{2}\sum_{i=1}^{N} m_i v_i^2 = \frac{3}{2}\,NkT$$

where N denotes the number of atoms in the system, $m_i$ represents the mass of the i-th atom, and k is the Boltzmann constant. By multiplying all velocities by the factor $\sqrt{T_{new}/T_{current}}$ we can effectively ``heat'' the system. Heating can also be realized by immersing the system in a ``heat bath'' which stochastically (i.e., randomly) accelerates the atoms of the molecular system. These cycles are repeated until the desired temperature is achieved, and at this point a ``production'' run can commence. In actual software, the ``heating'' and ``equilibration'' stages can be implemented more efficiently by assigning velocities in such a way that ``hot spots'' (i.e., spots in which neighboring atoms are all assigned high velocities) are avoided.
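A minimal sketch of this temperature monitoring and rescaling in Python follows; the choice of kcal/(mol K) for the Boltzmann constant and the (N, 3) array layout are assumptions of the example:

```python
import numpy as np

kB = 0.0019872  # kcal/(mol K); velocities and masses in matching units

def instantaneous_temperature(velocities, masses):
    """T from the kinetic energy: (1/2) sum m_i v_i^2 = (3/2) N k T."""
    e_kin = 0.5 * np.sum(masses[:, None] * velocities ** 2)
    return 2.0 * e_kin / (3.0 * len(masses) * kB)

def rescale_velocities(velocities, masses, t_target):
    """``Heat'' the system: multiply all velocities by
    sqrt(T_target / T_current)."""
    t_current = instantaneous_temperature(velocities, masses)
    return velocities * np.sqrt(t_target / t_current)
```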
Molecular dynamics for larger molecules, or for systems in which solvent molecules are explicitly taken into account, is a computationally intensive task even for the most
powerful supercomputers, and approximations are frequently made. The most popular is
the SHAKE method which in effect freezes vibrations along covalent bonds. This method is
also applied sometimes to valence angles. The major advantage of this method is not the
removal of a number of degrees of freedom (i.e., independent variables) from the system,
but the elimination of high frequency vibrations corresponding to ``hard'' bond stretching
interactions. In simulations of biological molecules, these modes are usually of least
interest, but their removal allows us to increase the size of the time step and, in effect, achieve a longer time range for simulations.
Figure 6.24: Periodic boundary conditions in molecular dynamics.
Another approximation is the united-atom approach, where hydrogen atoms which do not participate in hydrogen bonding are lumped together with the heavier atom to which they are bonded, forming a pseudo-atom of larger size and mass (e.g., a CH$_3$ group).
Even supercomputers have their limitations and there is always some practical limit on the
size (i.e., number of atoms) of the simulated system. For situations involving solvent, the
small volume of the box in which the macromolecule and solvent are contained introduces
undesirable boundary effects. In fact, the results may sometimes depend more on the size and shape of the box than on the molecules involved. To circumvent this limited box size
difficulty, periodic boundary conditions are used. This idea is represented in Fig. 6.24. In this
approach, the original box containing a solute and solvent molecules is surrounded with
identical images of itself, i.e., the positions and velocities of corresponding particles in all of
the boxes are identical. The common approach is to use a cubic or rectangular
parallelepiped box, but other shapes are also possible (e.g., truncated octahedron). By using
this approach, we obtain what is in effect an infinitely sized system. A particle (usually a solvent molecule) which escapes the box on the right side enters it on the left side, due to the periodicity. Since MD simulations are usually performed in the NVE (microcanonical) ensemble (i.e., at a constant number of particles, constant volume, and constant total energy) or the NVT (canonical) ensemble, the volume of the boxes does not change during the simulation, and the constancy of the number of particles is enforced by the periodicity of the lattice, e.g., a particle leaving the box on the left side enters it on the right side. There are also techniques for performing simulations in the NPT (isothermal-isobaric) and NPH (isobaric-isoenthalpic) ensembles, where the constancy of pressure during the simulation is achieved by squeezing or expanding the box. Constant temperature is usually maintained by ``coupling the system to a heat bath'', i.e., by adding dissipative forces (usually Langevin, friction-type forces) to the atoms of the system, which as a consequence
affects their velocities. However, each approximation has its price. In the case of periodic
boundary conditions we are actually simulating a crystal comprised of boxes with ideally
correlated atom movements. Longer simulations will be contaminated with these
artificially correlated motions. The maximum length for the simulation, before artifacts
start to show up, can be estimated by considering the speed of sound in water (about 15 Å/ps at normal conditions). This means that for a cubic cell with a side of 60 Å, simulations longer than 4 ps will incorporate artifacts due to the presence of images.
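As a sketch, the bookkeeping for a rectangular box can be written as below; the (N, 3) coordinate layout and the box given as a 3-vector of edge lengths are assumptions of this example (a truncated octahedron would need different logic):

```python
import numpy as np

def wrap_into_box(coords, box):
    """Fold (N, 3) coordinates back into the primary box: a particle that
    leaves through one face re-enters through the opposite face."""
    return coords - box * np.floor(coords / box)

def minimum_image(displacement, box):
    """Shortest periodic image of an interatomic displacement vector,
    used when evaluating distances and forces under periodicity."""
    return displacement - box * np.round(displacement / box)
```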
Figure 6.25: Stochastic boundary conditions in molecular dynamics.
In the other popular approach, stochastic boundary conditions allow us to reduce the size of the system by partitioning it into essentially two portions: a reaction zone and a reservoir region. The reaction zone is that portion of the system which we want to study,
and the reservoir region contains the portion which is inert and uninteresting. For
example, in the case of an enzyme, the reaction zone should include the proximity of the
active center, i.e., portions of protein, substrate, and molecules of solvent adjacent to the
active center. The reservoir region is excluded from molecular dynamics calculations and
is replaced by random forces whose mean corresponds to the temperature and pressure in
the system. The reaction zone is then subdivided into a reaction region and a buffer region.
The stochastic forces are only applied to atoms of the buffer region, in other words, the
buffer region acts as a ``heat buffer''. There are several other approximations whose
description can be found in the molecular dynamics monographs quoted here.
Since molecular dynamics is a very active field of research, new approaches are reported
frequently. To facilitate the incorporation of new techniques and approaches, many
molecular dynamics software packages are organized in a modular fashion. Modules
communicate their data and results via disk files. In this way, versatility and flexibility of the software are achieved, and new modules can easily be inserted between the old ones without the need to modify existing computer code.
Statistical mechanics concepts can supply even more information from molecular
dynamics runs, since they furnish prescriptions for how to calculate the macroscopic
properties of the system (e.g., thermodynamic functions like free energy and entropy
changes, heats of reaction, rate constants, etc.) from statistical averages of elementary steps along the molecular trajectory. They also allow us to uncover correlations between the motions of groups or fragments of the system, as well as to derive various distribution functions. Essentially, all expressions derived below apply to both MD and MC, with the only
difference being that the ``time step'' in MD should be changed to a ``new configuration''
for MC. MD was chosen here for illustration, since it is more often applied to chemical
problems.
Three quantities form a foundation for these calculations: the partition function, Z, the Boltzmann probability functional, P, and the ensemble average, $\langle A \rangle$, of a quantity A:

$$Z = \int\!\!\int \exp\left(-\frac{H(p, q)}{kT}\right) dp\, dq$$

$$P(p, q) = \frac{1}{Z}\exp\left(-\frac{H(p, q)}{kT}\right)$$

$$\langle A \rangle = \int\!\!\int A(p, q)\, P(p, q)\, dp\, dq$$

where p and q are all the generalized momenta and coordinates of the system (e.g., angular and linear velocities and internal coordinates); $H(p, q)$ is the Hamiltonian of the system; k denotes the Boltzmann constant ($k \approx 1.381 \times 10^{-23}$ J/K); and T is the temperature.
The integration extends over the whole range accessible to the momenta and coordinates. For the isochoric-isothermal system (constant N, V, T), the Helmholtz free energy, A = U - TS = G - PV (where U, G, S, and T are the internal energy, Gibbs free energy, entropy, and absolute temperature, respectively), for the system of unit volume is given as:

$$A = -kT \ln Z$$
The above equation simplifies significantly when cartesian coordinates and cartesian momenta are used, since the kinetic energy part of the total energy represented by the Hamiltonian in the expression for Z reduces to a constant, C, which depends only upon the size, temperature, and volume of the system:

$$Z = C \int \exp\left(-\frac{V(q)}{kT}\right) dq$$

where $V(q)$ is the potential energy function.
The situation is more optimistic for differences in the free energy between two closely related states, since it can be assumed that contributions from points far apart in the configurational space will be similar, and hence will cancel each other in this case. The free energy difference, $\Delta A$, between two states 0 and 1 is:

$$\Delta A = A_1 - A_0 = -kT \ln\frac{Z_1}{Z_0} = -kT \ln\frac{\int \exp\left(-V_1(q)/kT\right) dq}{\int \exp\left(-V_0(q)/kT\right) dq}$$

By multiplying the numerator of the fraction with unity written as:

$$1 = \exp\left(\frac{V_0(q)}{kT}\right)\exp\left(-\frac{V_0(q)}{kT}\right)$$

or, after including the integral from the denominator in the integral in the numerator, we obtain:

$$\Delta A = -kT \ln\left\langle \exp\left(-\frac{V_1(q) - V_0(q)}{kT}\right)\right\rangle_0 \qquad \mbox{(6.50)}$$

i.e., the ensemble average of the exponential function involving the difference between the potential energy functions for states 0 and 1, calculated at coordinates obtained for state 0. To clarify the importance of this statement, let us assume that we have two closely related states which are described by the potential functions $V_0(q)$ and $V_1(q)$, respectively. We perform the molecular dynamics simulation using the function for state 0, i.e., $V_0(q)$. At each time step, $t$, we take the current coordinates, $q_t$, obtained for system 0 and calculate the value not only of $V_0(q_t)$ but also of $V_1(q_t)$, i.e., we take atom coordinates derived for one state and calculate the potential energy function for this state and for the other state. Then the difference between these two values is calculated, and the exponential of this difference is taken, i.e., $\exp\left(-[V_1(q_t)-V_0(q_t)]/kT\right)$. We also calculate $\exp\left(-V_0(q_t)/kT\right)$, which is needed for the Boltzmann factor. At the end of the calculations we combine the values from all n time steps to calculate the ensemble average in eq. 6.50 as:

$$\left\langle \exp\left(-\frac{V_1 - V_0}{kT}\right)\right\rangle_0 \approx \frac{\sum_{t=1}^{n} \exp\left(-\frac{V_1(q_t)-V_0(q_t)}{kT}\right)\exp\left(-\frac{V_0(q_t)}{kT}\right)}{\sum_{t=1}^{n} \exp\left(-\frac{V_0(q_t)}{kT}\right)} \qquad \mbox{(6.51)}$$
The above formula is given here only for illustration, since in practical computations this averaging is done differently for the sake of efficiency. As was mentioned earlier, the difference between states 0 and 1 should be very small (not more than a few kcal/mol), i.e., a perturbation. For this reason the method is called free energy perturbation. If the difference between states 0 and 1 is larger, the process has to be divided into smaller steps. In practice, the change from state 0 to 1 is represented as a function of a suitably chosen coupling parameter, $\lambda$, such that $\lambda = 0$ for state 0 and $\lambda = 1$ for state 1. This parameter is incorporated into the potential energy function to allow a smooth transition between the two states. For example, if a fluorine atom in a C--F bond is being changed to a chlorine, the bond stretching term in the potential energy function appears as:

$$V_{str}(r) = \left[(1-\lambda)\,k_{CF} + \lambda\,k_{CCl}\right]\left(r - \left[(1-\lambda)\,r^0_{CF} + \lambda\,r^0_{CCl}\right]\right)^2 \qquad \mbox{(6.52)}$$
where $k_{CF}$ and $k_{CCl}$ are the stretching force constants for the C--F and C--Cl bonds, respectively, while $r^0_{CF}$ and $r^0_{CCl}$ denote the optimal lengths of these bonds. Similar expressions exist for all terms in the potential energy function and include provisions for creating/annihilating atoms by converting them from/to dummy atoms. Now, the path from $\lambda = 0$ to $\lambda = 1$ can easily be split into any number of substates spaced closely enough to satisfy the small perturbation requirement. Assuming that the process was divided into n equal subprocesses, we have $\lambda_i = i\,\Delta\lambda$, where $\Delta\lambda = 1/n$ and $i = 0, 1, \ldots, n$. However, in
actual calculations, the division is usually not uniform to account for the fact that the most
rapid changes in free energy occur often at the beginning and at the end of the path. The
simulation is run for each value of $\lambda_i$, providing free energy changes between substates i and i+1 as:

$$\Delta A_{i \to i+1} = -kT \ln\left\langle \exp\left(-\frac{V_{\lambda_{i+1}} - V_{\lambda_i}}{kT}\right)\right\rangle_{\lambda_i}$$

i.e., the coordinates calculated at the substate $\lambda_i$ are used to calculate $\Delta V$ between states $\lambda_i$ and $\lambda_{i+1}$. Then the total change of free energy is calculated as a sum of the partial contributions:

$$\Delta A_{0 \to 1} = \sum_{i=0}^{n-1} \Delta A_{i \to i+1}$$
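The windowed accumulation can be sketched in Python as follows. The simple sample mean assumes that the configurations at each $\lambda_i$ are already Boltzmann-sampled by the simulation; as noted above for eq. 6.51, practical codes organize this computation differently:

```python
import numpy as np

kB = 0.0019872  # kcal/(mol K)

def window_delta_a(dV_samples, temperature):
    """dA(i -> i+1) = -kT ln < exp(-dV/kT) >_i for one lambda window.
    dV_samples holds V(lambda_{i+1}) - V(lambda_i) evaluated on
    configurations sampled at lambda_i."""
    kT = kB * temperature
    return -kT * np.log(np.mean(np.exp(-np.asarray(dV_samples) / kT)))

def total_delta_a(windows, temperature):
    """dA(0 -> 1) as the sum of the per-window contributions."""
    return sum(window_delta_a(w, temperature) for w in windows)
```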
The most important conclusion from the above is that the conversion between two states
does not have to be conducted along a physically meaningful path. Obviously, atoms cannot
be converted stepwise to one another, since they are made of elementary particles.
However, basic thermodynamics, namely Hess's law of constant heat summation, ensures that the difference in free energy depends only upon the initial and final states. Any continuous path which starts at state 0 and ends at state 1, even a completely fictitious one, can be used to calculate this value. There are methods other than free energy perturbation for calculating the difference in free energies between states. The most accepted one is the thermodynamic integration method, in which the derivative of the free energy, A, with respect to the coupling parameter $\lambda$ is calculated:

$$\frac{\partial A}{\partial \lambda} = \left\langle \frac{\partial V(\lambda)}{\partial \lambda} \right\rangle_{\lambda}$$
in other words, the derivative of the free energy with respect to $\lambda$ is equal to the ensemble average of the derivative of the potential energy with respect to $\lambda$, and hence:

$$\Delta A_{0 \to 1} = \int_0^1 \left\langle \frac{\partial V(\lambda)}{\partial \lambda} \right\rangle_{\lambda} d\lambda$$
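Assuming the ensemble averages $\langle \partial V/\partial\lambda \rangle$ have already been accumulated at a set of $\lambda$ values, the final quadrature might look like this (the trapezoidal rule is just one reasonable choice):

```python
import numpy as np

def thermodynamic_integration(lambdas, dvdl_averages):
    """dA(0 -> 1) = integral of <dV/dlambda> over lambda in [0, 1],
    here by trapezoidal quadrature over the simulated points."""
    lam = np.asarray(lambdas, dtype=float)
    g = np.asarray(dvdl_averages, dtype=float)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * (lam[1:] - lam[:-1])))
```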
In most cases, the calculations are run in both directions, i.e., $0 \to 1$ and $1 \to 0$, to provide the value of the hysteresis, i.e., the precision of the integration process. Large differences between $\Delta A_{0 \to 1}$ and $\Delta A_{1 \to 0}$ indicate that either the simulation was not run long enough for each of the substates, or that the number of substates is too small. While checking the hysteresis
represents a proof of the precision of the calculations, it is not a proof of the accuracy of the
results. Errors in estimating the free energy changes may result from an inadequate energy
function and in most cases from inadequate probing of the energy surface of the system.
Molecular dynamics, similar to molecular mechanics, can also suffer from the ``local
minimum syndrome''. If atom positions corresponding to the minimum of energy in state 0
are substantially different from those for state 1, the practical limitations of the length of
the computer simulation may not allow the system to explore the configuration space
corresponding to the true minimum for state 1. For this reason, a set of independent simulations, each starting from different atom positions, should be performed to assess the adequacy of the sampling of the configuration space for both states.
The methods above allow one, in principle, to calculate the free energies of solvation, binding, etc. In the case of the free energy of solvation, for example, one solvent molecule is converted to a solute molecule in a stepwise fashion, by creating and annihilating atoms or changing their types as illustrated by eq. 6.52, which in effect creates the solute molecule
de novo. Binding energies could in principle be calculated in a similar way by ``creating''
the ligand molecule in the enzyme cavity. In practice, these calculations are very difficult,
since they involve a massive displacement of solvent, i.e., the initial and final state differ
dramatically. However, in drug design we are in most cases interested not in the actual free energies of solvation or binding, but in the differences between these parameters for different molecules, i.e., in values of $\Delta\Delta A$ (or $\Delta\Delta G$ if an isothermal-isobaric system is being considered). The quantitative method of finding these parameters is often called the thermodynamic cycle approach and is becoming a routine procedure for finding the difference in the free energy of binding for two different ligands, or the influence of a mutation in the macromolecular receptor or enzyme on binding. As an illustration, let us assume that we want to find the difference in the free energy of binding, $\Delta\Delta A = \Delta A_2 - \Delta A_1$, between two molecules $L_1$ and $L_2$ binding to a protein P. Note that $\Delta\Delta A$ is simply related to the ratio of the equilibrium binding constants:

$$\Delta\Delta A = \Delta A_2 - \Delta A_1 = -kT\ln\frac{K_2}{K_1}$$
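To get a feel for the magnitudes involved, here is a small sketch converting $\Delta\Delta A$ into a ratio of binding constants (the 300 K temperature and kcal/mol units are assumptions of this example):

```python
import numpy as np

kB = 0.0019872  # kcal/(mol K)

def binding_constant_ratio(ddA, temperature=300.0):
    """K2/K1 implied by a relative binding free energy (kcal/mol)."""
    return np.exp(-ddA / (kB * temperature))

# Example: ddA = -1.4 kcal/mol at 300 K gives K2/K1 of about 10,
# i.e., ligand 2 binds roughly ten times more strongly than ligand 1.
```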
Figure 6.26: The thermodynamic cycle for two molecules $L_1$ and $L_2$ binding to the same protein P, and the corresponding free energy changes $\Delta A_1$ through $\Delta A_4$.
Since eq. 6.50 requires that the potential energy function for state 1 is calculated at the
coordinates of state 0, the number of atoms and bonds in both molecules must be equal.
The processes $P{\cdot}L_1 \to P{\cdot}L_2$ and $L_1 \to L_2$ involve the annihilation/formation of atoms and cannot occur in biological systems; however, they are much easier to calculate. These pseudo-reactions are realized computationally by changing the types of atoms and groups. If the molecules are of similar size, the solvent molecules need not diffuse out of the active site. If new atoms are needed, dummy atoms are converted stepwise to real atoms. On the other hand, if molecule $L_2$ has fewer atoms than molecule $L_1$, atoms are annihilated by converting them to dummy atoms. As an example, consider two molecules which differ by the presence of a methyl group. ``Mutating'' the hydrogen atom into a methyl group is illustrated in Fig. 6.27.
Figure 6.27: Conversion of a hydrogen atom into a methyl group. Note that the number of
atoms and bonds is conserved during the conversion.
A molecular dynamics simulation is first performed to calculate $\Delta A_3$, i.e., the free energy change corresponding to ``mutating'' molecule $L_1$ into $L_2$ in solution. In this simulation the protein molecule is not present in the system, since it is inert in this reaction (i.e., it ``cancels'' on both sides of the chemical equation). The second simulation involves the calculation of $\Delta A_4$, i.e., molecule $L_1$ is converted to molecule $L_2$ inside the active site of the protein P. The $\Delta\Delta A$ is then calculated as $\Delta\Delta A = \Delta A_4 - \Delta A_3$.
It cannot be stressed strongly enough that both of these free energy changes, $\Delta A_3$ and $\Delta A_4$, are non-physical. $\Delta A_3$ is not the difference between the free energies of solvation of molecules $L_1$ and $L_2$ unless the ``molecules'' are mononuclear. Calculating the difference in solvation energies would require a dynamics simulation of the molecules in a vacuum to account for the differences in the interaction energy and the entropy of the free and solvated molecules. Similarly, $\Delta A_4$ alone is not the difference between the free energies of binding, since it contains no information about the interaction of the solvent with the part of the molecule which is submerged in the active site. The differences in the solvent-solute
interaction between ligands may comprise a major portion of their relative affinity for the
protein.
In those instances where we know the structure of the receptor protein, the thermodynamic cycle method is a very promising tool for drug design, since it allows us to actually calculate the differences between the free energies of binding for molecules which have not even been synthesized yet! Likewise, it allows us to assess the impact of variations in protein structure (e.g., mutant versus wild-type protein) on binding, i.e., to perform genetic engineering on the computer. This is why X-ray crystallography or NMR studies of proteins combined with molecular simulation techniques form a framework for the drug design of the future, where prospective drug molecules will first be inferred from theory, and only later synthesized. Moreover, intensive research on deducing protein tertiary structure from knowledge of its primary sequence is bound to result in new possibilities for folding proteins in the computer. With the astonishing advances in the power of computers and progress in methodology, there is no doubt that the importance of these techniques to rational drug design will increase steadily.
Computational Chemistry
Wed Dec 4 17:47:07 EST 1996