
Topics in Statistical Mechanics:

The Foundations of Molecular Simulation


J. Schofield/R. van Zon
CHM1464H Fall 2008, Lecture Notes

About the course


The Foundations of Molecular Simulation
This course covers the basic principles involved in simulating chemical and physical systems
in the condensed phase. Simulations are a means to evaluate equilibrium properties such as free energies, as well as dynamical properties such as transport coefficients and reaction rates. In
addition, simulations allow one to gain insight into molecular mechanisms. After presenting
the theoretical basis of Monte Carlo and molecular dynamics simulations, particular attention
is given to recent developments in this field. These include the hybrid Monte Carlo method,
parallel tempering, and symplectic and other integration schemes for rigid, constrained, and
unconstrained systems. Time permitting, techniques for simulating quantum systems and
mixed quantum-classical systems are also discussed.

Organizational details
Location: WE 76 (Wetmore Hall, New College)
Dates and Time: Wednesdays, 3:00 - 5:00 pm

Instructors
Prof. Jeremy Schofield
Office: Lash Miller 420E
Telephone: 978-4376
Email: jmschofi@chem.utoronto.ca
Office hours: Tuesdays, 2:00 pm - 3:00 pm
Dr. Ramses van Zon
Office: Lash Miller 423
Telephone: 946-7044
Email: rzon@chem.utoronto.ca
Office hours: Fridays, 3:00 pm - 4:00 pm

Grading
1. Four problem sets: 70%

2. Short literature report (3500 word limit) or simulation project: 30%

Suggested reference books


D. A. McQuarrie, Statistical Mechanics.
D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms to Applications (Academic Press, 2002) 2nd ed. Available through the U of T library: www.sciencedirect.com/science/book/9780122673511.
D. C. Rapaport, The Art of Molecular Dynamics Simulation (Cambridge U. P., 2004).
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing (Cambridge University Press, 1992) 2nd ed. www.nrbook.com/a/bookcpdf.php (C), www.nrbook.com/a/bookfpdf.php (Fortran).

Contents

1 Review
  1.1 Classical Mechanics
  1.2 Ensembles and Observables
  1.3 Liouville Equation for Hamiltonian Systems
    1.3.1 Equilibrium (stationary) solutions of the Liouville equation
    1.3.2 Time-dependent Correlation Functions

2 Numerical integration and importance sampling
  2.1 Quadrature
  2.2 Importance Sampling and Monte Carlo
  2.3 Markov Chain Monte Carlo
    2.3.1 Ensemble averages
    2.3.2 Markov Chains
    2.3.3 Construction of the transition matrix K(y → x)
  2.4 Statistical Uncertainties

3 Applications of the Monte Carlo method
  3.1 Quasi-ergodic sampling
  3.2 Umbrella sampling
  3.3 Simulated annealing and parallel tempering
    3.3.1 High temperature sampling
    3.3.2 Extended state space approach: Simulated Tempering (Marinari and Parisi, 1992)
    3.3.3 Parallel Tempering or Replica Exchange (C.J. Geyer, 1991)

4 Molecular dynamics
  4.1 Basic integration schemes
    4.1.1 General concepts
    4.1.2 Ingredients of a molecular dynamics simulation
    4.1.3 Desirable qualities for a molecular dynamics integrator
    4.1.4 Verlet scheme
    4.1.5 Leap Frog scheme
    4.1.6 Momentum/Velocity Verlet scheme
  4.2 Symplectic integrators from Hamiltonian splitting methods
  4.3 The shadow or pseudo-Hamiltonian
  4.4 Stability limit of the Verlet scheme for harmonic oscillators
  4.5 More accurate splitting schemes
    4.5.1 Optimized schemes
    4.5.2 Higher order schemes from more elaborate splittings
    4.5.3 Higher order schemes using gradients
    4.5.4 Multiple time-step algorithms

5 Advanced topics
  5.1 Hybrid Monte Carlo
    5.1.1 The Method
    5.1.2 Application of Hybrid Monte-Carlo
  5.2 Time-dependent correlations
  5.3 Event-driven simulations
    5.3.1 Implementation of event-driven dynamics
    5.3.2 Generalization: Energy discretization
  5.4 Constraints and Constrained Dynamics
    5.4.1 Constrained Averages
    5.4.2 Constrained Dynamics
  5.5 Statistical Mechanics of Non-Hamiltonian Systems
    5.5.1 Non-Hamiltonian Dynamics and the Canonical Ensemble
    5.5.2 Volume-preserving integrators for non-Hamiltonian systems

A Math Appendices
  A.1 Taylor expansion
  A.2 Series expansions
  A.3 Probability theory
    A.3.1 Discrete systems
    A.3.2 Continuous Systems
    A.3.3 Gaussian distributions
  A.4 Fourier and Laplace Transforms
  A.5 Calculus
    A.5.1 Integration by parts
    A.5.2 Change of Variable and Jacobians

B Problem sets
  B.1 Problem Set 1
  B.2 Problem Set 2
  B.3 Problem Set 3
  B.4 Problem Set 4

1 Review

1.1 Classical Mechanics

1-Dimensional system with 1 particle of mass m

Newton's equations of motion for position x(t) and momentum p(t):

    p = m\frac{dx}{dt} = m\dot{x}, \qquad F(t) = m a(t), \qquad a(t) = \ddot{x}(t), \qquad F(t) = -\left.\frac{dV}{dx}\right|_{x(t)},

so that

    \dot{p}(t) = m\ddot{x}(t) = F(t) = -\frac{dV}{dx}.

Define an energy function called the Hamiltonian:

    H(x, p) = \frac{p^2}{2m} + V(x).

Introduce the terminology:

    \frac{p^2}{2m} = \text{kinetic energy}, \qquad V(x) = \text{potential energy}.

Newton's laws can then be expressed as:

    \dot{x} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad \dot{p} = -\frac{\partial H}{\partial x} = -\frac{dV}{dx}.

These are coupled ordinary differential equations whose solution is uniquely specified by specifying two conditions, such as x_0 = x(0) and p_0 = p(0) at some reference time t_0 = 0.

3-dimensional system of 1 particle

Notation: r = (x, y, z) and p = (p_x, p_y, p_z). Also, p \cdot p = p_x^2 + p_y^2 + p_z^2.

The Hamiltonian is:

    H = \frac{p \cdot p}{2m} + V(r).

The equations of motion are:

    \dot{r} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad \dot{p} = -\frac{\partial H}{\partial r} = -\frac{\partial V}{\partial r},

where the first equation is shorthand for

    \begin{pmatrix} \dot{r}_x \\ \dot{r}_y \\ \dot{r}_z \end{pmatrix} = \frac{1}{m} \begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}.

2 particles in 3 dimensions

Hamiltonian:

    H = \frac{p_1 \cdot p_1}{2m_1} + \frac{p_2 \cdot p_2}{2m_2} + V(r_1, r_2).

The equations of motion are:

    \dot{r}_1 = \frac{\partial H}{\partial p_1} = \frac{p_1}{m_1}, \qquad \dot{p}_1 = -\frac{\partial H}{\partial r_1}, \qquad \dot{r}_2 = \frac{\partial H}{\partial p_2} = \frac{p_2}{m_2}, \qquad \dot{p}_2 = -\frac{\partial H}{\partial r_2}.

Introduce generalized notation: r^{(2)} = (r_1, r_2) and p^{(2)} = (p_1, p_2), with p^{(2)} \cdot p^{(2)} = p_1 \cdot p_1 + p_2 \cdot p_2.

Equations of motion in this notation:

    \dot{r}^{(2)} = \frac{\partial H}{\partial p^{(2)}}, \qquad \dot{p}^{(2)} = -\frac{\partial H}{\partial r^{(2)}}.

N-particle system in 3-D

Equations of motion in generalized notation:

    \dot{r}^{(N)} = \frac{\partial H}{\partial p^{(N)}}, \qquad \dot{p}^{(N)} = -\frac{\partial H}{\partial r^{(N)}}.

A total of 6N equations!

At each point in time, the system is specified by the 6N coordinates (r^{(N)}(t), p^{(N)}(t)) \equiv x^{(N)}(t), called the phase point. The set of all phase points is called phase space. Classical dynamics describes a path through the 6N-dimensional phase space.


Special properties of the path through phase space:

1. Certain quantities remain unchanged during the evolution of the system. Examples: energy, momentum and angular momentum may be conserved (constant) along the path or trajectory of the system. The path remains on a hyper-surface of constant energy in phase space.

2. Paths never cross in phase space. Each disjoint path, labelled by its initial conditions, passes arbitrarily close to any point on the constant-energy hypersurface. The amount of time it takes for the trajectory of the system, starting from a given initial point in phase space, to pass arbitrarily close to that initial point again is called the recurrence time: it is absolutely enormous for large, interacting systems.
Consider an arbitrary function G of the phase space coordinate x^{(N)}, G(r^{(N)}, p^{(N)}, t) = G(x^{(N)}, t). Taking the time derivative,

    \frac{dG(x^{(N)}, t)}{dt} = \frac{\partial G(x^{(N)}, t)}{\partial t} + \frac{\partial G(x^{(N)}, t)}{\partial r^{(N)}} \cdot \dot{r}^{(N)} + \frac{\partial G(x^{(N)}, t)}{\partial p^{(N)}} \cdot \dot{p}^{(N)}
                              = \frac{\partial G(x^{(N)}, t)}{\partial t} + \frac{\partial G(x^{(N)}, t)}{\partial r^{(N)}} \cdot \frac{\partial H}{\partial p^{(N)}} - \frac{\partial G(x^{(N)}, t)}{\partial p^{(N)}} \cdot \frac{\partial H}{\partial r^{(N)}}.

We can define the Liouville operator L to be:

    L = \frac{\partial H}{\partial p^{(N)}} \cdot \frac{\partial}{\partial r^{(N)}} - \frac{\partial H}{\partial r^{(N)}} \cdot \frac{\partial}{\partial p^{(N)}},

so that in terms of a general function B,

    LB = \frac{\partial B}{\partial r^{(N)}} \cdot \frac{\partial H}{\partial p^{(N)}} - \frac{\partial B}{\partial p^{(N)}} \cdot \frac{\partial H}{\partial r^{(N)}}.

In terms of the Liouville operator,

    \frac{dG(x^{(N)}, t)}{dt} = \frac{\partial G(x^{(N)}, t)}{\partial t} + L G(x^{(N)}, t).

Functions of the phase space coordinate G that are not explicit functions of time t are conserved by the dynamics if LG = 0. The formal solution of the evolution is then

    G(x^{(N)}, t) = e^{Lt} G(x^{(N)}, 0).

In particular,

    x^{(N)}(t) = e^{Lt} x^{(N)}(0).

Note that LH = 0.

We can also define the Poisson bracket operator via

    \{A, B\} \equiv \frac{\partial A}{\partial r^{(N)}} \cdot \frac{\partial B}{\partial p^{(N)}} - \frac{\partial A}{\partial p^{(N)}} \cdot \frac{\partial B}{\partial r^{(N)}}.

The relationship between the Poisson bracket and Liouville operators is LB = \{B, H\}, so

    \frac{dG(x^{(N)}, t)}{dt} = \frac{\partial G(x^{(N)}, t)}{\partial t} + \{G(x^{(N)}, t), H(x^{(N)})\}.

Important property:

    e^{Lt}\left[A(x^{(N)}) B(x^{(N)})\right] = \left[e^{Lt} A(x^{(N)})\right]\left[e^{Lt} B(x^{(N)})\right] = A(x^{(N)}(t)) B(x^{(N)}(t)).

1.2 Ensembles and Observables

Consider some arbitrary dynamical variable G(r^{(N)}, p^{(N)}) = G(x^{(N)}) (a function of the phase space coordinates and hence possibly evolving in time).

An experimental measurement of the quantity corresponds to a time average over some (possibly short) sampling interval \tau:

    G_{obs}(t) = \bar{G}(t) \equiv \frac{1}{\tau} \int_0^{\tau} d\tau' \, G\left(r^{(N)}(t + \tau'), p^{(N)}(t + \tau')\right),

with \tau \gg \tau_m, where \tau_m is a microscopic time scale. Hence fluctuations on the microscopic time scale are smoothed out.

For most systems, the evolution of G(t) cannot be solved analytically, and so one must resort to:

1. Numerically solving the evolution (computer simulation), or
2. Developing a new theoretical framework relating time averages to something that can be calculated.

Ensemble average: The infinite/long time average of a dynamical variable corresponds to an average over a properly weighted set of points of phase space (called an ensemble). The statistical average is called an ensemble average. Each point in phase space corresponds to a different configuration of the system, so the ensemble average corresponds to a weighted average over different configurations of the system.

Define a probability density for phase space (often loosely called the distribution function), f(r^{(N)}, p^{(N)}, t), such that

    f(r^{(N)}, p^{(N)}, t) \, dr^{(N)} dp^{(N)} = probability of finding a system in the ensemble with coordinates between (r^{(N)}, r^{(N)} + dr^{(N)}) and (p^{(N)}, p^{(N)} + dp^{(N)}) at time t.

Note that the distribution function is normalized:

    \int dr^{(N)} dp^{(N)} \, f(r^{(N)}, p^{(N)}, t) = 1.

The ensemble average is defined as:

    \langle G(t) \rangle \equiv \int dr^{(N)} dp^{(N)} \, G(r^{(N)}, p^{(N)}) \, f(r^{(N)}, p^{(N)}, t).

Microcanonical ensemble: All systems in the ensemble have the same total energy. All dynamical trajectories with the same energy compose the set of states in the microcanonical ensemble. Technically, all conserved quantities should also be the same.

What is the connection between the ensemble average and the experimental observation (time average)?

Quasi-ergodic hypothesis: As t \to \infty, a dynamical trajectory will pass arbitrarily close to each point in the constant-energy (if energy is the only conserved quantity) hypersurface of phase space (metrical transitivity). Another statement: for all initial states except for a set of zero measure, the phase space is connected through the dynamics. Hypersurfaces of phase space are covered by the trajectory.

So in some sense, as \tau \to \infty, we expect

    G_{obs}(t) = \frac{1}{\tau} \int_0^{\tau} d\tau' \, G\left(r^{(N)}(t + \tau'), p^{(N)}(t + \tau')\right) = \frac{1}{\Omega} \int dr^{(N)} dp^{(N)} \, G(r^{(N)}, p^{(N)}),

where

    \Omega = \int_{E < H(x^{(N)}) < E + \Delta E} dr^{(N)} dp^{(N)},

hence

    G_{obs}(t) = \bar{G}(t) = \int G(r^{(N)}, p^{(N)}) f(r^{(N)}, p^{(N)}, t) \, dr^{(N)} dp^{(N)}

if f(r^{(N)}, p^{(N)}, t) = 1/\Omega. All points on the hypersurface have the same weight (are equally probable).

Ensemble analogy: each point in the restricted phase space corresponds to a configuration of the system with the same macroscopic properties.

One can utilize an axiomatic approach to find equilibrium distributions: maximize the statistical entropy subject to constraints.

Alternate method: asymptotic solution of the Boltzmann equation for distribution functions, which describes collisions of pairs from Newton's equations and adds an assumption of statistical behavior (molecular chaos). The system naturally evolves from an initial state to states with static macroscopic properties corresponding to equilibrium properties; one can model this with simple spin systems like the Kac ring model. A measure of disorder, the statistical entropy, increases as the system evolves and is maximized in equilibrium (H theorem).
Canonical Ensemble

Remove the restriction of defining probability only on the constant energy hypersurface. Allow the total energy of systems in the ensemble to vary (hopefully) narrowly around a fixed average value:

    f(x^{(N)}) = \frac{1}{N! h^{3N}} \exp\{\beta(A - H(x^{(N)}))\},

where A is the Helmholtz free energy.


We define the partition function Q_N(T, V) by

    Q_N(T, V) = \frac{1}{N! h^{3N}} \int dx^{(N)} \exp\{-\beta H(x^{(N)})\} = \exp\{-\beta A\},

so by normalization

    f(x^{(N)}) = \frac{1}{N! h^{3N}} \exp\{\beta(A - H(x^{(N)}))\} = \frac{1}{N! h^{3N}} \frac{\exp\{-\beta H(x^{(N)})\}}{Q_N(T, V)}.

The relation A = -kT \ln Q_N(T, V) gives the thermodynamic connection. For example:

1. The pressure is:

    P = -\left(\frac{\partial A}{\partial V}\right)_T = kT \left(\frac{\partial \ln Q_N}{\partial V}\right)_T.

2. The chemical potential is:

    \mu = \left(\frac{\partial A}{\partial N}\right)_{T,V}.

3. The energy is:

    E = \langle H \rangle = \frac{\exp\{\beta A\}}{N! h^{3N}} \int dx^{(N)} H(x^{(N)}) \exp\{-\beta H(x^{(N)})\} = -\frac{1}{Q_N} \frac{\partial Q_N}{\partial \beta} = -\frac{\partial \ln Q_N}{\partial \beta}.

We can write the canonical partition function as:

    Q_N(T, V) = \frac{1}{N! h^{3N}} \int dx^{(N)} \exp\{-\beta H(x^{(N)})\}
              = \frac{1}{N! h^{3N}} \int_0^{\infty} dE \int dx^{(N)} \exp\{-\beta H(x^{(N)})\} \, \delta(E - H(x^{(N)}))
              = \int_0^{\infty} dE \, \exp\{-\beta E\} \left[\frac{1}{N! h^{3N}} \int dx^{(N)} \delta(E - H(x^{(N)}))\right]
              = \int_0^{\infty} dE \, \exp\{-\beta E\} \, \Omega_N(E),

where

    \Omega_N(E) \equiv \frac{1}{N! h^{3N}} \int dx^{(N)} \delta(E - H(x^{(N)}))

is the density of unique states at energy E (the microcanonical partition function).


Relationship between ensemble averages

How likely are we to observe a system in the canonical ensemble with an energy very different from the average energy E = \langle H(x^{(N)}) \rangle? From the Tchebycheff inequality, we find that

    Pr\left(\left|H(x^{(N)}) - E\right| \geq \delta E\right) \leq \frac{\sigma_E^2}{\delta^2 E^2}.

Now the variance in the energy is:

    \sigma_E^2 = \langle H(x^{(N)})^2 \rangle - \langle H(x^{(N)}) \rangle^2 = \frac{\partial^2 \ln Q_N}{\partial \beta^2} = -\frac{\partial E}{\partial \beta} = kT^2 C_v,

and hence

    Pr\left(\left|H(x^{(N)}) - E\right| \geq \delta E\right) \leq \frac{kT^2 C_v}{\delta^2 E^2}.

For an ideal gas system, E = \frac{3}{2} N kT and hence C_v = \frac{3}{2} N k. Typically, E \sim N and C_v \sim N, so

    Pr\left(\left|H(x^{(N)}) - E\right| \geq \delta E\right) \leq \frac{kT^2 C_v}{\delta^2 E^2} \sim \frac{1}{N \delta^2}.

As N increases, it becomes less and less likely to observe a system with energy very different from E:

    \langle B(x^{(N)}) \rangle_{canon} = \int dE \, P(E) \, \langle B(x^{(N)}) \rangle_{micro\,at\,E} \approx \langle B(x^{(N)}) \rangle_{micro\,at\,\bar{E}} \left(1 + O(1/N)\right).

P(E) is sharply peaked around E = \bar{E}: one can show

    P(E) \approx \left(\frac{1}{2\pi \sigma_E^2}\right)^{1/2} \exp\left\{-\frac{(E - \bar{E})^2}{2 kT^2 C_v}\right\}.

The relative spread of the energy is \sigma_E / E \sim N^{-1/2}.

1.3 Liouville Equation for Hamiltonian Systems

Define a small volume element V_0 in phase space. How does the probability of finding the system in this region change in time?

    P(V_0) = \int_{V_0} dX_0^N \, f(X_0^N, 0).

Allow the system to evolve according to the dynamics: X_0^N \to X_t^N, and correspondingly V_0 \to V_t.

The volume changes shape in the mapping,

    X_0^N \to X_t^N \simeq X_0^N + \dot{X}_0^N \, \delta t \equiv X_0^N + \delta X^N,

and maybe changes volume as well. The number of states in V_0 and V_t is the same, since we follow all points in the original volume; it could only change if some points in V_0 weren't in V_t (flow out of the volume). So P(V_0, 0) = P(V_t, t): conservation of probability (like a fluid in which particles aren't created or destroyed).

Changing variables from X_0^N to X_t^N,

    P(V_0) = \int_{V_0} dX_0^N \, f(X_0^N, 0) = \int_{V_t} dX_t^N \, J(X_0^N; X_t^N) \, f(X_t^N - \delta X^N, t - \delta t) = P_t(V_t),

since P(V_0, 0) = P(V_t, t). Recall that X_t^N - X_0^N \equiv \delta X^N.

Evaluation of the Jacobian is a bit complicated, but gives

    J(X_0^N; X_t^N) = \text{Jacobian for the transform } X_0^N = X_t^N - \delta X^N = \left|\frac{\partial X_0^N}{\partial X_t^N}\right| = 1 - \nabla_{X^N} \cdot \delta X^N.

So

    P_t(V_t) = P(V_0) = \int_{V_t} dX_t^N \left(1 - \nabla_{X^N} \cdot \delta X^N\right) f\left(X_t^N - \delta X^N, t - \delta t\right)

for small \delta X^N.

What is \delta X^N? For Hamiltonian systems X_t^N \simeq X_0^N + \dot{X}_0^N \delta t, or \delta X^N = \dot{X}_0^N \delta t.

Expanding for small displacements \delta X^N and small time intervals \delta t:

    f\left(X_t^N - \delta X^N, t - \delta t\right) \simeq f\left(X_t^N, t\right) - \frac{\partial f}{\partial t} \delta t - \nabla_{X^N} f \cdot \delta X^N + \frac{1}{2} \nabla_{X^N}^2 f \, (\delta X^N)^2 + \dots

Inserting this in the previous equation for P_t(V_t) = P(V_0), we get

    P_t(V_t) = P_t(V_t) + \int_{V_t} dX_t^N \left[-\frac{\partial f}{\partial t} \delta t - \nabla_{X^N} \cdot (\delta X^N f) + \frac{1}{2} \nabla_{X^N}^2 f \, (\delta X^N)^2\right],

or

    \int_{V_t} dX_t^N \left[\frac{\partial f}{\partial t} \delta t + \nabla_{X^N} \cdot (\delta X^N f) - \frac{1}{2} \nabla_{X^N}^2 f \, (\delta X^N)^2\right] = 0.

Since this holds for an arbitrary volume V_t, the integrand must vanish:

    \frac{\partial f}{\partial t} \delta t = -\nabla_{X^N} \cdot (\delta X^N f) + \frac{1}{2} \nabla_{X^N}^2 f \, (\delta X^N)^2 + \dots

Now, let us evaluate this for \delta X^N = \dot{X}_0^N \delta t. To linear order in \delta t,

    \nabla_{X^N} \cdot \left(\dot{X}_0^N f\right) \delta t = \left[\dot{X}^N \cdot \nabla_{X^N} f + f \, \nabla_{X^N} \cdot \dot{X}^N\right] \delta t,

but

    \nabla_{X^N} \cdot \dot{X}^N = \frac{\partial}{\partial R^N} \cdot \dot{R}^N + \frac{\partial}{\partial P^N} \cdot \dot{P}^N = \frac{\partial}{\partial R^N} \cdot \frac{\partial H}{\partial P^N} - \frac{\partial}{\partial P^N} \cdot \frac{\partial H}{\partial R^N} = 0!

Note that this implies the volume element does not change with normal Hamiltonian propagation:

    dX_0^N = dX_t^N \, J(X_0^N; X_t^N) = dX_t^N \left(1 - \nabla_{X^N} \cdot \dot{X}^N \delta t\right) = dX_t^N.

Also, (\delta X^N)^2 \sim O(\delta t)^2 since \delta X^N \propto \delta t, so

    \frac{\partial f}{\partial t} \delta t = -\dot{X}^N \cdot \nabla_{X^N} f \, \delta t + O(\delta t)^2.

In the short-time limit,

    \frac{\partial f}{\partial t} = -\dot{X}^N \cdot \nabla_{X^N} f.

Recall

    \dot{X}^N \cdot \nabla_{X^N} G = \dot{R}^N \cdot \frac{\partial G}{\partial R^N} + \dot{P}^N \cdot \frac{\partial G}{\partial P^N} = \frac{\partial H}{\partial P^N} \cdot \frac{\partial G}{\partial R^N} - \frac{\partial H}{\partial R^N} \cdot \frac{\partial G}{\partial P^N} = LG = \{G, H\}.

So we obtain the Liouville equation:

    \frac{\partial f}{\partial t} = -L f = -\{f, H\}.

The formal solution is:

    f(x^{(N)}, t) = e^{-Lt} f(x^{(N)}, 0).

Also note:

    \frac{\partial f}{\partial t} + \dot{X}^N \cdot \nabla_{X^N} f = \frac{df(X^N, t)}{dt} = 0.

Interpretation:

    f(r^{(N)}(0), p^{(N)}(0), 0) = f(r^{(N)}(t), p^{(N)}(t), t)
    f(r^{(N)}(0), p^{(N)}(0), t) = f(r^{(N)}(-t), p^{(N)}(-t), 0).

If one follows an initial phase point from time 0 to time t, the probability density doesn't change (i.e. you go with the flow). The probability density near phase point x^{(N)}(0) at time t is the same as the initial probability density at the backward-evolved point x^{(N)}(-t).

1.3.1 Equilibrium (stationary) solutions of the Liouville equation

A stationary solution is not a function of time, meaning f(R^N, P^N, t) = f(R^N, P^N), or

    \frac{\partial f}{\partial t} = -L f = -\{f, H\} = \{H, f\} = 0.

Recall that we showed that energy is conserved by the dynamics, so dH/dt = 0.

Suppose f(R^N, P^N, t) is an arbitrary function of H(R^N, P^N). Then

    \frac{\partial f}{\partial t} = \{H, f(H)\} = \frac{\partial H}{\partial R^N} \cdot \frac{\partial f}{\partial P^N} - \frac{\partial H}{\partial P^N} \cdot \frac{\partial f}{\partial R^N},

but

    \frac{\partial f}{\partial P^N} = \frac{\partial f}{\partial H} \frac{\partial H}{\partial P^N}, \qquad \frac{\partial f}{\partial R^N} = \frac{\partial f}{\partial H} \frac{\partial H}{\partial R^N},

so

    \frac{\partial f}{\partial t} = \left[\frac{\partial H}{\partial R^N} \cdot \frac{\partial H}{\partial P^N} - \frac{\partial H}{\partial P^N} \cdot \frac{\partial H}{\partial R^N}\right] \frac{\partial f}{\partial H} = 0.

Thus any function of H is a stationary solution of the Liouville equation! In particular, both the microcanonical and canonical distribution functions are solutions of the Liouville equation.

1.3.2 Time-dependent Correlation Functions

Consider the time-dependent correlation function C_{AB}(t) in the canonical ensemble:

    \langle A(x^{(N)}, t) B(x^{(N)}, 0) \rangle = \int dx^{(N)} \, A(x^{(N)}, t) B(x^{(N)}, 0) \, f(x^{(N)}).

From the form of the Liouville operator, for arbitrary functions A and B of the phase space coordinates,

    A(x^{(N)}, t) B(x^{(N)}, t) = \left[e^{Lt} A(x^{(N)}, 0)\right]\left[e^{Lt} B(x^{(N)}, 0)\right] = e^{Lt}\left[A(x^{(N)}, 0) B(x^{(N)}, 0)\right].

It can be shown by integrating by parts that:

    \left\langle \left[L A(x^{(N)})\right] B(x^{(N)}) \right\rangle = -\left\langle A(x^{(N)}) \left[L B(x^{(N)})\right] \right\rangle.

Consequence:

    \left\langle A(x^{(N)}, t) B(x^{(N)}, 0) \right\rangle = \left\langle A(x^{(N)}) B(x^{(N)}, -t) \right\rangle.

The autocorrelation function C_{AA}(t) is therefore an even function of time.

Also,

    \int dx^{(N)} \left[e^{Lt} A(x^{(N)}, 0)\right] f(x^{(N)}, 0) = \int dx^{(N)} A(x^{(N)}, 0) \left[e^{-Lt} f(x^{(N)}, 0)\right] = \int dx^{(N)} A(x^{(N)}, 0) f(x^{(N)}, t).

For an equilibrium system where f(x^{(N)}, t) = f(x^{(N)}),

    \langle A(t) \rangle = \langle A(0) \rangle, \qquad \langle A(t + \tau) B(\tau) \rangle = \langle A(t) B(0) \rangle.

2 Numerical integration and importance sampling

2.1 Quadrature

Consider the numerical evaluation of the integral

    I(a, b) = \int_a^b dx \, f(x).

Rectangle rule: on each small interval, construct an interpolating function and integrate over the interval. Using a polynomial of degree 0 through the mid-point of the interval:

    \int_{kh}^{(k+1)h} dx \, f(x) \approx h \, f\left(\frac{kh + (k+1)h}{2}\right).

Polynomial of degree 1 passing through the points (a_1, f(a_1)) and (a_2, f(a_2)): the trapezoidal rule,

    f(x) \approx f(a_1) + \frac{x - a_1}{a_2 - a_1}\left(f(a_2) - f(a_1)\right) \implies \int_{a_1}^{a_2} dx \, f(x) \approx \frac{a_2 - a_1}{2}\left(f(a_1) + f(a_2)\right).

Composing the trapezoidal rule n times on the interval (a, b) with even sub-intervals [a + kh, a + (k+1)h], where k = 0, \dots, n-1 and h = (b - a)/n, gives the estimate

    \int_a^b dx \, f(x) \approx \frac{b - a}{n}\left(\frac{f(a) + f(b)}{2} + \sum_{k=1}^{n-1} f(a + kh)\right).


Simpson's rule: an interpolating function of degree 2, composed n times on the interval (a, b) (n even):

    \int_a^b dx \, f(x) \approx \frac{b - a}{3n}\left[f(a) + 4f(a + h) + 2f(a + 2h) + 4f(a + 3h) + 2f(a + 4h) + \dots + f(b)\right].

The error is bounded by

    \frac{h^4}{180}(b - a)\left|f^{(4)}(\xi)\right|,

where \xi \in (a, b) is the point in the domain where the magnitude of the 4th derivative is largest.

Rules based on un-even divisions of the domain of integration: let w(x) be a weight function. Then I(a, b) \approx \sum_{i=1}^n w_i f_i, where w_i = w(x_i) and f_i = f(x_i), with the choice of x_i and w_i based on definite integrals of polynomials of higher order. An example is Gaussian quadrature with n points, based on polynomials of degree 2n - 1: there is a well-defined procedure to find \{x_i\} and \{w_i\} (see Numerical Recipes). The error bound for n-point Gaussian quadrature is

    \frac{(b - a)^{2n+1} (n!)^4}{(2n + 1)!\left[(2n)!\right]^3} f^{(2n)}(\xi) \quad \text{for } \xi \in (a, b).

For multi-dimensional integrals, one must place n_i grid points along each dimension i. The number of points in the hyper-grid grows exponentially with dimension: quadrature is unsuitable for high-dimensional integrals.
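As an illustration of the composite rules above, here is a minimal Python sketch of the trapezoidal and Simpson estimates; the test integrand and interval are arbitrary choices for illustration:

```python
import numpy as np

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal sub-intervals."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (0.5 * (y[0] + y[-1]) + y[1:-1].sum())

def simpson(f, a, b, n):
    """Composite Simpson's rule; n must be even."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return (h / 3) * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum())

# Example: the integral of sin on (0, pi) is exactly 2.
for n in (4, 16, 64):
    print(n, trapezoid(np.sin, 0, np.pi, n), simpson(np.sin, 0, np.pi, n))
```

The errors should shrink roughly as n^{-2} and n^{-4} respectively, consistent with the bounds quoted above.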

2.2 Importance Sampling and Monte Carlo

Suppose the integrand f(x) depends on a multi-dimensional point x, and that the integral over the hypervolume

    I = \int_V dx \, f(x)

is non-zero only in specific regions of the domain. We should place a higher density of points in the regions where the integrand is large. Define a weight function w(x) that tells us which regions are significant. We require the property w(x) > 0 for any point x in the volume. Sometimes a normalized weight function is used, so that \int_V dx \, w(x) = 1, though this is not strictly necessary.

Re-express the integral as:

    I = \int_V dx \, \frac{f(x)}{w(x)} \, w(x).

Idea: Draw a set of N points \{x_1, \dots, x_N\} from the weight function w(x); then

    \bar{I} = \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{w(x_i)}.

As N \to \infty, \bar{I} \to I.

How does this improve the rate of convergence of the calculation of I? We will see that the statistical uncertainty is related to the variance \sigma_I^2 of the estimate of I, namely

    \sigma_I^2 = \frac{1}{N^2} \sum_i \langle \delta I_i \, \delta I_i \rangle, \qquad \text{where} \quad \delta I_i = \frac{f(x_i)}{w(x_i)} - I,

and we have assumed that the random variables I_i are statistically independent. Here \langle \cdot \rangle represents the average over the true distribution of f/w that is obtained in the limit N \to \infty.

Vastly different values of the ratio f(x_i)/w(x_i) lead to large uncertainty. The error is minimized by minimizing \sigma_I^2. If w(x_i) = f(x_i)/\lambda, then f(x_i)/w(x_i) = \lambda and

    \left\langle \frac{f(x_i)}{w(x_i)} \right\rangle = I = \lambda, \qquad \left\langle \left(\frac{f(x_i)}{w(x_i)}\right)^2 \right\rangle = \lambda^2,

and \sigma_I^2 = 0. Note, however, that writing w(x_i) = f(x_i)/\lambda requires knowing \lambda = I, the very problem we are trying to solve. Generally one desires all the f(x_i)/w(x_i) to be roughly the same for all sampled points x_i, to minimize \sigma_I^2.

Example in a 1-dimensional integral: I = \int_a^b dx \, f(x) = \int_a^b dx \, \frac{f(x)}{w(x)} w(x).


Monte-Carlo sampling: use random sampling of points x with weight w(x) to
estimate ratio f (x)/w(x).
How do we draw sample points with a given weight w(x)? Consider simplest case
of a uniform distribution on interval (a, b):
 1
if x (a, b).
ba
w(x) =
0
otherwise.
Use a random number generator that gives a pseudo-random number rand in
interval (0, 1).
xi = a + (b a) rand,
then {xi } are distributed uniformly in the interval (a, b).
We find that the estimator of f is then:
I=

N
N
N
1 X f (xi )
baX
1 X
=
Ik ,
f (xi ) =
N i=1 w(xi )
N i=1
N k=1

where each Ik = (b a) f (xk ) is an independent estimate of the integral.


Like trapezoidal integration but with randomly sampled points {xi } from (a, b)
each with uniform weight.
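A minimal Python sketch of this uniform-sampling estimator, reusing the sin test integrand from the quadrature example:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_uniform(f, a, b, n_samples):
    """Plain Monte Carlo: average of (b - a) * f(x) over uniform x in (a, b)."""
    x = a + (b - a) * rng.random(n_samples)
    samples = (b - a) * f(x)          # the independent estimates I_k
    return samples.mean(), samples.std(ddof=1) / np.sqrt(n_samples)

estimate, error = mc_uniform(np.sin, 0, np.pi, 10_000)
print(estimate, "+/-", error)          # exact value is 2
```

The quoted error bar anticipates the standard-error analysis of Section 2.4: it decreases only as N^{-1/2}, but independently of dimension.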
How about other choices of the weight function w(x)? It is easy to draw uniform y_i \in (0, 1). Now suppose we map the y_i to x_i via

    y(x) = \int_a^x dz \, w(z).

Note that y(a) = 0 and y(b) = 1 if w(x) is normalized over (a, b).

How are the x distributed? Since the y are distributed uniformly over (0, 1), the probability of finding a value of y in an interval dy is

    dy(x) = w(x) \, dx,

so the x are distributed with weight w(x). It then follows that the one-dimensional integral can be written in the transformed variables y as:

    I = \int_a^b dx \, w(x) \frac{f(x)}{w(x)} = \int_{y(a)}^{y(b)} dy \, \frac{f(x(y))}{w(x(y))} = \int_0^1 dy \, \frac{f(x(y))}{w(x(y))}.

The integral is easily evaluated by selecting uniform points y and then solving for x(y). One must be able to solve for x(y) for this to be useful.

Procedure:

1. Select N points y_i uniformly on (0, 1) using y_i = rand.
2. Compute x_i = x(y_i) by inverting y(x) = \int_a^x dz \, w(z).
3. Compute the estimator for the integral, \bar{I} = \frac{1}{N} \sum_{i=1}^N f(x_i)/w(x_i).
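A small Python sketch of this inverse-transform procedure, using as an illustration a weight for which step 2 is analytic, w(x) = λ e^{-λx} on (0, ∞) with x(y) = -ln(1 - y)/λ; this weight is my choice for illustration, not one used in the notes:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 1.0

def w(x):
    return lam * np.exp(-lam * x)      # normalized weight on (0, infinity)

def x_of_y(y):
    return -np.log1p(-y) / lam         # inverse of y(x) = 1 - exp(-lam x)

def f(x):
    return np.exp(-lam * x) * x**2     # integrand; exact integral is 2/lam^3

y = rng.random(100_000)                # step 1: uniform y_i
x = x_of_y(y)                          # step 2: invert the cumulative y(x)
estimate = np.mean(f(x) / w(x))        # step 3: average of f/w
print(estimate)                        # should be close to 2.0
```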
This procedure is easy to carry out for simple forms of w(x). Suppose the integrand is strongly peaked around x = x_0. One good choice of w(x) might be a Gaussian weight

    w(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - x_0)^2}{2\sigma^2}},

where the variance (width) \sigma^2 of the Gaussian is treated as a parameter.

If I = \int dx \, f(x) = \int dx \, w(x) f(x)/w(x), we can draw randomly from the Gaussian weight via:

    y(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^x d\tilde{x} \, e^{-\frac{(\tilde{x} - x_0)^2}{2\sigma^2}}
         = \frac{1}{2} + \frac{1}{\sqrt{2\pi\sigma^2}} \int_{x_0}^x d\tilde{x} \, e^{-\frac{(\tilde{x} - x_0)^2}{2\sigma^2}}
         = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^{\frac{x - x_0}{\sqrt{2}\sigma}} dw \, e^{-w^2}
         = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x - x_0}{\sqrt{2}\sigma}\right)\right].

Inverting gives x_i = x_0 + \sqrt{2}\,\sigma \, \text{ierf}(2y_i - 1), where ierf is the inverse error function, with series representation

    \text{ierf}(z) = \frac{\sqrt{\pi}}{2}\left(z + \frac{\pi}{12} z^3 + \frac{7\pi^2}{480} z^5 + \dots\right).

The estimator of the integral is therefore

    \bar{I} = \frac{1}{N} \sum_{i=1}^N \sqrt{2\pi\sigma^2} \, f(x_i) \, e^{\frac{(x_i - x_0)^2}{2\sigma^2}}.

This estimator has reduced variance \sigma_I^2 if w(x_i) and f(x_i) resemble one another.
This estimator reduces the variance I2 if w(xi ) and f (xi ) resemble one another.
Another way to draw from a Gaussian: draw 2 numbers, y_1 and y_2, uniformly on (0, 1). Define R = \sqrt{-2 \ln y_1} and \theta = 2\pi y_2. Then x_1 = R\cos\theta and x_2 = R\sin\theta are distributed with density

    w(x_1, x_2) = \frac{1}{2\pi} e^{-x_1^2/2} e^{-x_2^2/2},

since

    dy_1 \, dy_2 = \left|\det \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \end{pmatrix}\right| dx_1 \, dx_2 = w(x_1, x_2) \, dx_1 \, dx_2.
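A minimal Python sketch of this Box-Muller transformation:

```python
import numpy as np

rng = np.random.default_rng(2)

def box_muller(n_pairs):
    """Turn pairs of uniform (0,1) variates into standard normal variates."""
    y1 = rng.random(n_pairs)
    y2 = rng.random(n_pairs)
    R = np.sqrt(-2.0 * np.log(y1))
    theta = 2.0 * np.pi * y2
    return R * np.cos(theta), R * np.sin(theta)

x1, x2 = box_muller(100_000)
print(x1.mean(), x1.std())   # should be close to 0 and 1
```

A shifted, scaled Gaussian with mean x_0 and width σ is then obtained as x_0 + σ x_1.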

2.3 Markov Chain Monte Carlo

2.3.1 Ensemble averages

Generally, we cannot find a simple way of generating the \{x_i\} according to a known w(x_i) for high-dimensional systems by transforming from a uniform distribution. The analytical integrals for the coordinate transformation may be invertible only for separable coordinates, and many degrees of freedom are coupled, so that the joint probability is complicated and not the product of single-variable probabilities.

Typical integrals are ensemble averages of the form:

    \langle A \rangle = \frac{1}{Z} \int_V dr^{(N)} e^{-\beta u(r^{(N)})} A(r^{(N)}) = \frac{\int_V dr^{(N)} \, e^{-\beta u(r^{(N)})} A(r^{(N)})}{\int_V dr^{(N)} \, e^{-\beta u(r^{(N)})}}.

Typical potentials are complicated functions of the configuration r^{(N)}. A good importance sampling weight would set w(r^{(N)}) = e^{-\beta u(r^{(N)})}.

How do we sample configurations from a general, multi-dimensional weight?

Goal: Devise a method to generate a sequence of configurations \{r_1^{(N)}, \dots, r_n^{(N)}\} in which the probability of finding a configuration r_i^{(N)} in the sequence is given by w(r_i^{(N)}) \, dr_i^{(N)}. We will do so using a stochastic procedure based on a random walk.

2.3.2 Markov Chains

We will represent a general, multi-dimensional configurational coordinate that identifies the state by the vector X. We wish to generate a sequence of configurations X_1, \dots, X_n, where each X_t is chosen with probability density P_t(X_t). To generate the sequence, we use a discrete-time, time-homogeneous, Markovian random walk.

Consider the configuration X_1 at an initial time, labelled as 1, that is treated as a random variable drawn from some known initial density P_1(X).


We assume the stochastic dynamics of the random variable is determined by a time-independent transition function K(Y → X), where the transition function defines the probability density of going from state Y to state X in one time step. Since K is a probability density, it must satisfy

    \int dX \, K(Y \to X) = 1, \qquad K(Y \to X) \geq 0.

The state X_t at time t is assumed to be obtained from the state X_{t-1} by a realization of the dynamics, where the probability of all transitions is determined by K. At the second step of the random walk, the new state X_2 is chosen from P_2(X|X_1) = K(X_1 → X). At the third step, the state X_3 is chosen from P_3(X|X_2) = K(X_2 → X), and so on, generating the random walk sequence \{X_1, \dots, X_n\}.

If infinitely many realizations of the dynamics are carried out, we find that the distributions of the X_i at each step i are

    P_1(X) \quad \text{for } X_1
    P_2(X) = \int dY \, K(Y \to X) P_1(Y) \quad \text{for } X_2
    P_3(X) = \int dY \, K(Y \to X) P_2(Y) \quad \text{for } X_3
    \vdots
    P_t(X) = \int dY \, K(Y \to X) P_{t-1}(Y) \quad \text{for } X_t.

The transition function K is ergodic if any state X can be reached from any state Y in a finite number of steps in a non-periodic fashion. If K is ergodic, then the Perron-Frobenius theorem guarantees the existence of a unique stationary distribution P(X) that satisfies

    P(X) = \int dY \, K(Y \to X) P(Y)

and

    \lim_{t \to \infty} P_t(X) = P(X).

Implications:

1. From any initial distribution of states P_1(X), an ergodic K guarantees that states X_t will be distributed according to the unique stationary distribution P(X) for large t (many steps of the random walk).
2. P(X) is like an eigenvector of the matrix K with eigenvalue \lambda = 1.
3. The goal is then to design an ergodic transition function K so that the stationary or limit distribution is the Boltzmann weight w(X).


Finite State Space

To make the Markov chain more concrete, consider a finite state space in which only m different configurations of the system exist. The phase space then can be enumerated as \{X_1, \dots, X_m\}. The transition function is then an m \times m matrix K, where the element K_{ij} \geq 0 is the transition probability of going from state X_j to state X_i. Since the matrix elements represent transition probabilities, for any fixed value of j,

    \sum_{i=1}^m K_{ij} = 1.

The distribution P_t(X) corresponds to a column vector P_t = col[a_t^{(1)}, \dots, a_t^{(m)}], where a_t^{(i)} is the probability of state X_i at time t. The distribution evolves under the random walk as P_t = K \cdot P_{t-1}.

The matrix K is regular if there is an integer t_0 such that K^{t_0} has all positive (non-zero) entries. Then K^t for t \geq t_0 has all positive entries.
Suppose the matrix K is a regular transition matrix. The following properties hold (statements following from the Frobenius-Perron theorem):

1. The multiplicity of the eigenvalue \lambda = 1 is one (i.e. the eigenvalue is simple).

Proof. Since K is regular, there exists a transition matrix \tilde{K} = K^{t_0} with all positive elements. Note that if Ke = \lambda e, then \tilde{K} e = \lambda^{t_0} e, so that all eigenvectors of K with eigenvalue \lambda are also eigenvectors of \tilde{K} with eigenvalue \lambda^{t_0}. Let e_1 = col[1, 1, \dots, 1]/m. Since \sum_i K_{ij} = 1 for any j, we have \sum_i (K \cdot K)_{ij} = 1, and hence \sum_i \tilde{K}_{ij} = 1 for any j. The transpose of \tilde{K} therefore satisfies \sum_i \tilde{K}^T_{ji} = 1 for all j. The vector e_1 is a right eigenvector of \tilde{K}^T with eigenvalue \lambda = 1, since (\tilde{K}^T e_1)_j = \frac{1}{m}\sum_i \tilde{K}^T_{ji} = \frac{1}{m}, and hence \tilde{K}^T e_1 = e_1. Now both \tilde{K} and \tilde{K}^T are m \times m square matrices and have the same eigenvalues (since the characteristic equation is invariant under the transpose operation), so \lambda = 1 is an eigenvalue of both \tilde{K} and \tilde{K}^T. Now suppose \tilde{K}^T v = v and all components v_j of v are not equal. Let k be the index of the largest component, v_k, which we can take to be positive without loss of generality. Thus, v_k \geq |v_j| for all j and v_k > |v_l| for some l. It then follows that v_k = \sum_j \tilde{K}^T_{kj} v_j < \sum_j \tilde{K}^T_{kj} v_k = v_k, since all components of \tilde{K}^T are non-zero and positive. Thus we conclude that v_k < v_k, a contradiction. Hence all v_k must be the same, corresponding to the eigenvector e_1. Hence e_1 is the unique eigenvector of \tilde{K}^T with eigenvalue 1, \lambda = 1 is a simple root of the characteristic equation, and K has a single eigenvector with eigenvalue \lambda = 1.

2. If the eigenvalue \lambda \neq 1 is real, then |\lambda| < 1.

Proof. Suppose \tilde{K}^T v = \lambda^{t_0} v, with \lambda^{t_0} \neq 1. It then follows that there is an index k of the vector v such that v_k \geq |v_j| for all j and v_k > |v_l| for some l. Now \lambda^{t_0} v_k = \sum_j \tilde{K}^T_{kj} v_j < \sum_j \tilde{K}^T_{kj} v_k = v_k, or \lambda^{t_0} v_k < v_k. Thus \lambda^{t_0} < 1, and hence \lambda < 1, since \lambda is real. Similarly, following the same lines of argument and using the fact that v_k > v_l for some l, we can establish that \lambda > -1. Hence |\lambda| < 1. A similar sort of argument can be used if \lambda is complex.

3. Under the dynamics of the random walk P_t = K P_{t-1}, \lim_{t \to \infty} P_t = P_1, where P_1 is the right eigenvector of K with the simple eigenvalue \lambda = 1.

Proof. If the matrix K is diagonalizable, the eigenvectors of K form a complete basis. Thus, any initial distribution P_s can be expanded in terms of the complete, linearly independent basis \{P_1, P_2, \dots, P_m\} of eigenvectors as P_s = b_1 P_1 + \dots + b_m P_m, where K P_i = \lambda_i P_i with \lambda_1 = 1 and |\lambda_i| < 1 for i \geq 2. Now P_t = b_1 P_1 + b_2 \lambda_2^t P_2 + \dots + b_m \lambda_m^t P_m, but \lim_{t \to \infty} \lambda_i^t = 0 exponentially fast for i \geq 2. Thus, \lim_{t \to \infty} K^t P_s = b_1 P_1. Note that if \sum_i (P_s)_i = 1, then \sum_i (P_t)_i = 1, as K maintains the norm. This implies that b_1 = \left(\sum_i (P_1)_i\right)^{-1}. It turns out that the condition of detailed balance, which we will define to mean K_{ij}(P_1)_j = K_{ji}(P_1)_i, allows one to define a symmetric transition matrix K', which is necessarily diagonalizable.

More generally, any square matrix is similar to a matrix of Jordan form, with isolated blocks of dimension given by the multiplicity of the eigenvalue. Thus the matrix K can be written as K = P \tilde{K} P^{-1}, where the columns of P are the (generalized) eigenvectors of K and \tilde{K} is of the form

    \tilde{K} = \begin{pmatrix} 1 & & & 0 \\ & J_1 & & \\ & & \ddots & \\ 0 & & & J_I \end{pmatrix},

where I + 1 is the number of independent eigenvectors. For each eigenvector with corresponding eigenvalue \lambda_i, J_i is of the form

    J_i = \begin{pmatrix} \lambda_i & 1 & & 0 \\ 0 & \lambda_i & 1 & \\ & & \ddots & \ddots \\ 0 & & & \lambda_i \end{pmatrix}.

The dimension of the square matrix J_i depends on the original matrix K. For any given eigenvalue \lambda, the sum of the dimensions of the J_i for which \lambda = \lambda_i is equal to the algebraic multiplicity of the eigenvalue. It is easily shown that the nth power of a block J_i of size k has elements bounded by n^k \lambda_i^{n-k}, which goes to zero exponentially as n goes to infinity. Hence the matrix \tilde{K}^n goes to

    \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & & \\ \vdots & & \ddots & \\ 0 & & & 0 \end{pmatrix}.

Thus it follows that as n goes to infinity,

    (K^n)_{\alpha\beta} = \left(P \tilde{K}^n P^{-1}\right)_{\alpha\beta} \to (P)_{\alpha 1}\left(P^{-1}\right)_{1\beta} = (P_1)_{\alpha},

where P_1 is the stationary distribution, since the first row of P^{-1} is the component of the left eigenvector of K with eigenvalue 1, which is the constant vector.
Consider the explicit case m = 3, with K_{ij} given by

    K = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{3} \\ 0 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{3} \end{pmatrix}.

(K \cdot K)_{ij} > 0 for all i and j, so K is regular. Note also that \sum_i K_{ij} = 1 for all j.

The eigenvalues and eigenvectors of K are

    \lambda = 1: \quad P_1 = (2/7, 2/7, 3/7)
    \lambda = 1/2: \quad P_2 = (3/7, -3/7, 0)
    \lambda = -1/6: \quad P_3 = (-3/14, -3/14, 6/14).

If the initial distribution is P_s = (1, 0, 0), then after one step the distribution is K P_s = (1/2, 0, 1/2), and after ten steps it is

    P_{10} = (0.28669\dots, 0.28474\dots, 0.42857\dots),

which differs from P_1 by about 0.1%.
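A quick numerical check of this convergence, as a small numpy sketch:

```python
import numpy as np

# Column-stochastic transition matrix of the three-state example.
K = np.array([[1/2, 0,   1/3],
              [0,   1/2, 1/3],
              [1/2, 1/2, 1/3]])

P = np.array([1.0, 0.0, 0.0])       # initial distribution P_s
for step in range(10):
    P = K @ P                        # P_t = K P_{t-1}

print(P)                             # ~ (0.28669, 0.28474, 0.42857)
print(np.array([2/7, 2/7, 3/7]))     # stationary distribution P_1

# The eigenvector of K with eigenvalue 1, normalized to unit sum:
vals, vecs = np.linalg.eig(K)
v = np.real(vecs[:, np.argmax(np.real(vals))])
print(v / v.sum())
```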

2.3.3 Construction of the transition matrix K(y → x)

We wish to devise a procedure so that the limiting (stationary) distribution of the random walk is the Boltzmann distribution P_{eq}(x). Break up the transition matrix into two parts, a generation probability of trial states T(y → x) and an acceptance probability of the trial state A(y → x):

    K(y \to x) = T(y \to x) \, A(y \to x).

We will consider the definition of the acceptance probability given a specified procedure for generating trial states with conditional probability T(y → x).
Write the dynamics in terms of the probabilities at time t:

    \text{Probability of starting in } y \text{ and ending in } x = \int dy \, P_t(y) \, K(y \to x)
    \text{Probability of starting in } x \text{ and remaining in } x = P_t(x)\left[1 - \int dy \, K(x \to y)\right],

so

    P_{t+1}(x) = \int dy \, P_t(y) K(y \to x) + P_t(x)\left[1 - \int dy \, K(x \to y)\right]
               = P_t(x) + \int dy \left[P_t(y) K(y \to x) - P_t(x) K(x \to y)\right].

Thus, to get P_{t+1}(x) = P_t(x), we want

    \int dy \left[P_t(y) K(y \to x) - P_t(x) K(x \to y)\right] = 0.

This can be accomplished by requiring microscopic reversibility, or detailed balance:

    P_t(y) K(y \to x) = P_t(x) K(x \to y)
    P_t(y) T(y \to x) A(y \to x) = P_t(x) T(x \to y) A(x \to y).

We desire P_t(x) = P_{t+1}(x) = P_{eq}(x). This places a restriction on the acceptance probabilities:

    \frac{A(y \to x)}{A(x \to y)} = \frac{P_{eq}(x) \, T(x \to y)}{P_{eq}(y) \, T(y \to x)}.

Note that P_{eq}(x) K(x → y) is the probability of observing the sequence \{x, y\} in the random walk. This must equal the probability of the sequence \{y, x\} in the walk.

Metropolis solution: Note that if P_{eq}(x) T(x → y) > 0 and P_{eq}(y) T(y → x) > 0, then we can define

    A(y \to x) = \min\left(1, \frac{P_{eq}(x) T(x \to y)}{P_{eq}(y) T(y \to x)}\right), \qquad A(x \to y) = \min\left(1, \frac{P_{eq}(y) T(y \to x)}{P_{eq}(x) T(x \to y)}\right).

This definition satisfies detailed balance.

Proof. Verifying explicitly, we see that

    P_{eq}(y) T(y \to x) A(y \to x) = \min\left(P_{eq}(y) T(y \to x), \, P_{eq}(x) T(x \to y)\right)
    P_{eq}(x) T(x \to y) A(x \to y) = \min\left(P_{eq}(x) T(x \to y), \, P_{eq}(y) T(y \to x)\right)
                                    = P_{eq}(y) T(y \to x) A(y \to x),

as required.

The procedure is guaranteed to generate states with probability proportional to the Boltzmann distribution if the proposal probability T(y → x) generates an ergodic transition probability K.

Simple example

Suppose we want to evaluate the equilibrium average \langle A \rangle at inverse temperature \beta of some dynamical variable A(x) for a 1-dimensional harmonic oscillator system with potential energy U(x) = kx^2/2:

    \langle A(x) \rangle = \int dx \, P_{eq}(x) A(x), \qquad P_{eq}(x) = \frac{1}{Z} e^{-\beta k x^2 / 2}.

Monte-Carlo procedure:

1. Suppose the current state of the system is x_0. We define the proposal probability of a configuration y to be

    T(x_0 \to y) = \begin{cases} \frac{1}{2\Delta x} & \text{if } y \in [x_0 - \Delta x, x_0 + \Delta x] \\ 0 & \text{otherwise,} \end{cases}

where \Delta x is fixed to some value representing the maximum displacement of the trial coordinate. y is chosen uniformly around the current value x_0. Note that if y is selected, then T(x_0 → y) = 1/(2\Delta x) = T(y → x_0), since x_0 and y are within a distance \Delta x of each other.

2. Accept the trial y with probability

    A(x_0 \to y) = \min\left(1, \frac{T(y \to x_0) P_{eq}(y)}{T(x_0 \to y) P_{eq}(x_0)}\right) = \min\left(1, e^{-\beta \Delta U}\right),

where \Delta U = U(y) - U(x_0). Note that if \Delta U \leq 0, so the proposal has lower potential energy than the current state, A(x_0 → y) = 1 and the trial y is always accepted as the next state, x_1 = y. If \Delta U > 0, then A(x_0 → y) = e^{-\beta \Delta U} = q, and we must accept the trial y with probability 0 < q < 1. This can be accomplished by picking a random number r uniformly on (0, 1) and then:
(a) If r \leq q, accept the configuration: x_1 = y.
(b) If r > q, reject the configuration and set x_1 = x_0 (keep the state as is).

3. Repeat steps 1 and 2, and record the states \{x_i\}.

A Markov chain of states (the sequence) \{x_i\} is generated, with each x_i appearing in the sequence with probability P_{eq}(x_i) (after some number of equilibration steps). After collecting N total configurations, the equilibrium average \langle A \rangle can be estimated from

    \langle A \rangle \approx \frac{1}{N} \sum_{i=1}^N A(x_i),

since the importance function is w(x) = P_{eq}(x).


Rejected states are important to generate states with correct weight.
Typically, should neglect the first Ns points of Markov chain since it takes some iterations of procedure to generate states with stationary distribution of K(y x).

36

2. NUMERICAL INTEGRATION AND IMPORTANCE SAMPLING


Called equilibration or burn in time.
Often have correlation among adjacent states. Can record states every Ncorr
Monte-Carlo steps (iterations).
Why must rejected states be counted? Recall that K(x y) must be normalized to
be a probability (or probability density).
Z
Z
dy K(x y) = 1 = dy [(x y) + (1 (x y))] K(x y)
hence, the probability to remain in the current state is
Z
K(x x) = 1 dy (1 (x y)) K(x y)
K(x x) is non-zero to insure normalization, so we must see the sequence
{. . . , x, x, . . . } in the Markov chain.
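A minimal Python sketch of this Metropolis procedure for the harmonic oscillator, with β = k = 1 and a Δx chosen arbitrarily, estimating ⟨x²⟩ (whose exact value is 1/(βk)):

```python
import numpy as np

rng = np.random.default_rng(3)
beta, k = 1.0, 1.0
dx = 1.0                           # maximum trial displacement
n_steps, n_burn = 200_000, 10_000  # total steps and burn-in

def U(x):
    return 0.5 * k * x * x

x = 0.0
samples = []
for step in range(n_steps):
    y = x + dx * (2.0 * rng.random() - 1.0)        # uniform proposal in [x-dx, x+dx]
    if rng.random() < np.exp(-beta * (U(y) - U(x))):
        x = y                                      # accept
    # on rejection, x is kept (and still counted, as discussed above)
    if step >= n_burn:
        samples.append(x * x)

print(np.mean(samples))            # ~ 1/(beta*k) = 1.0
```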

2.4 Statistical Uncertainties

We would like some measure of the reliability of averages computed from a finite set of sampled data. Consider the average \bar{x} of a quantity x constructed from a finite set of N measurements \{x_1, \dots, x_N\}, where the x_i are random variables drawn from a density \rho(x):

    \bar{x} = \frac{1}{N} \sum_{i=1}^N x_i.

Suppose the random variables x_i are drawn independently, so that the joint probability satisfies

    \rho(x_1, \dots, x_N) = \rho(x_1)\rho(x_2)\cdots\rho(x_N).

Suppose that the intrinsic mean of the random variable is \langle x \rangle = \int dx \, x \rho(x), and that the intrinsic variance is \sigma^2, where

    \sigma^2 = \int dx \left(x - \langle x \rangle\right)^2 \rho(x).

A measure of reliability is the standard deviation of \bar{x}, also known as the standard error \sigma_E = \sqrt{\sigma_E^2}, where \sigma_E^2 is the variance of the finite average \bar{x} around \langle x \rangle over the finite set \{x_1, \dots, x_N\}:

    \sigma_E^2 = \left\langle \left(\bar{x} - \langle x \rangle\right)^2 \right\rangle.


We expect:

1. The variance (error) decreases with increasing N.
2. The error depends on the intrinsic variance \sigma^2 of the density \rho. A larger variance should mean slower convergence of \bar{x} to \langle x \rangle.

Suppose all the intrinsic moments \langle x^n \rangle of \rho(x) exist. If we define the dimensionless variable

    z = \frac{\bar{x} - \langle x \rangle}{\sigma_E},

the probability density of z can be written as

    P(\bar{z}) = \langle \delta(\bar{z} - z) \rangle = \frac{1}{2\pi} \int dt \, \langle e^{it(\bar{z} - z)} \rangle = \frac{1}{2\pi} \int dt \, e^{it\bar{z}} \langle e^{-itz} \rangle = \frac{1}{2\pi} \int dt \, e^{it\bar{z}} \phi_N(t),    (2.1)

where \phi_N(t) is the characteristic function of P(z). Using the fact that \sigma_E = \sigma/\sqrt{N}, as will be shown shortly, z can be expressed as

    z = \frac{\sqrt{N}}{N} \sum_{i=1}^N \frac{x_i - \langle x \rangle}{\sigma} = \frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{x_i - \langle x \rangle}{\sigma},

and hence

    \phi_N(t) = \left\langle e^{-\frac{it}{\sqrt{N}} \sum_{i=1}^N \frac{x_i - \langle x \rangle}{\sigma}} \right\rangle = \left\langle e^{-\frac{it}{\sqrt{N}} \frac{x - \langle x \rangle}{\sigma}} \right\rangle^N = \phi(t/\sqrt{N})^N,

where

    \phi(u) = \left\langle e^{-iu(x - \langle x \rangle)/\sigma} \right\rangle,

which for small arguments u can be expanded as \phi(u) = 1 - u^2/2 + iu^3\kappa_3/(6\sigma^3) + \dots, where \kappa_3 is the third cumulant of the density \rho(x). Using this expansion, we find \ln \phi_N(t) = -t^2/2 + O(N^{-1/2}), and hence \phi_N(t) = e^{-t^2/2} for large N. Inserting this form in Eq. (2.1) gives

    P(\bar{z}) = \frac{1}{\sqrt{2\pi}} e^{-\bar{z}^2/2},

and hence we find the central limit theorem result that the averages \bar{x} are distributed around the intrinsic mean according to the normal distribution

    N(\bar{x}; \langle x \rangle, \sigma_E^2) = \frac{1}{\sqrt{2\pi\sigma_E^2}} e^{-\frac{(\bar{x} - \langle x \rangle)^2}{2\sigma_E^2}},

provided the number of samples N is large and all intrinsic moments (and hence cumulants) of \rho(x) are finite.

If the central limit theorem holds, so that the finite averages are normally distributed as above, then the standard error can be used to define confidence intervals, since the probability p that the deviations in the average, \bar{x} - \langle x \rangle, lie within a factor of c times the standard error is given by

    \int_{-c\sigma_E}^{c\sigma_E} d(\bar{x} - \langle x \rangle) \, N(\bar{x}; \langle x \rangle, \sigma_E^2) = p.

If p = 0.95, defining the 95% confidence interval, then we find that c = 1.96, and hence the probability that the intrinsic mean \langle x \rangle lies in the interval (\bar{x} - 1.96\sigma_E, \bar{x} + 1.96\sigma_E) is P(\bar{x} - 1.96\sigma_E \leq \langle x \rangle \leq \bar{x} + 1.96\sigma_E) = 0.95.

If the data are not normally distributed, then higher cumulants of the probability density may be needed (skewness, kurtosis, ...). The confidence intervals are then defined in terms of integrals of the probability density.
How do we calculate \sigma_E or \sigma_E^2? We defined the variance as

    \sigma_E^2 = \left\langle \left(\bar{x} - \langle x \rangle\right)^2 \right\rangle, \qquad \text{where} \quad \bar{x} = \frac{1}{N} \sum_i x_i.

Expanding the square, we get \sigma_E^2 = \langle \bar{x}^2 \rangle - \langle x \rangle^2, but

    \langle \bar{x}^2 \rangle = \frac{1}{N^2} \sum_{i,j} \langle x_i x_j \rangle = \frac{1}{N^2} \left(\sum_i \langle x_i^2 \rangle + \sum_i \sum_{j \neq i} \langle x_i \rangle \langle x_j \rangle\right)
                             = \frac{1}{N^2} \left[N\left(\sigma^2 + \langle x \rangle^2\right) + N(N - 1)\langle x \rangle^2\right] = \frac{\sigma^2}{N} + \langle x \rangle^2,

hence \sigma_E^2 = \sigma^2/N.

However, we don't really know what \sigma^2 is, so we must estimate this quantity. One way to do this is to define the estimator \hat{\sigma}^2 of the variance:

    \hat{\sigma}^2 = \frac{1}{N} \sum_i (x_i - \bar{x})^2.


The average of this estimator is

    \langle \hat{\sigma}^2 \rangle = \frac{1}{N} \sum_i \left\langle (x_i - \bar{x})^2 \right\rangle = \frac{1}{N} \sum_i \left\langle x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right\rangle,

but using the facts that

    \langle \bar{x} x_i \rangle = \frac{\langle x^2 \rangle}{N} + \frac{N - 1}{N} \langle x \rangle^2, \qquad \langle \bar{x}^2 \rangle = \frac{\sigma^2}{N} + \langle x \rangle^2,

we get \langle \hat{\sigma}^2 \rangle = (N - 1)\sigma^2/N, and hence \sigma^2 = N \langle \hat{\sigma}^2 \rangle/(N - 1). The standard error can therefore be estimated by

    \sigma_E = \sqrt{\frac{\hat{\sigma}^2}{N - 1}}.

Note that \sigma_E decreases as 1/\sqrt{N - 1}: narrowing the confidence intervals by a factor of 2 requires roughly 4 times more data.

If the x_i are not independent, then N \to N_{eff}, where N_{eff} is the effective number of independent configurations. If \tau_c is the correlation length (the number of Monte-Carlo steps over which the correlation \langle x_i x_j \rangle differs from \langle x \rangle^2), then N_{eff} = N/\tau_c.
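A small Python sketch of this error estimate; the correlation length tau_c is just a placeholder value here, used to illustrate the N_eff correction (a sketch under that assumption, not a prescription for computing tau_c):

```python
import numpy as np

def standard_error(samples, tau_c=1.0):
    """Standard error of the mean, with an effective-sample-size correction."""
    x = np.asarray(samples, dtype=float)
    n_eff = len(x) / tau_c                     # N -> N_eff for correlated data
    sigma_hat2 = np.mean((x - x.mean())**2)    # biased variance estimator
    return np.sqrt(sigma_hat2 / (n_eff - 1))

rng = np.random.default_rng(4)
data = rng.normal(size=10_000)
print(data.mean(), "+/-", standard_error(data))   # error ~ 0.01
```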
Another way to estimate the intrinsic variance \sigma^2 is to use the jackknife method: see B. Efron, The Annals of Statistics 7, 1 (1979).

The jackknife procedure is to form N averages from the set \{x_1, \dots, x_N\} by omitting one data point. For example, the jth average is defined to be

    \bar{x}^{(j)} = \frac{1}{N - 1} \sum_{i \neq j} x_i.

The result is a set of averages \{\bar{x}^{(1)}, \bar{x}^{(2)}, \dots, \bar{x}^{(N)}\}. The average over this jackknifed set is the same as \bar{x}, since

    \frac{1}{N} \sum_{j=1}^N \bar{x}^{(j)} = \frac{1}{N(N - 1)} \sum_{j=1}^N \sum_{i \neq j} x_i = \frac{1}{N} \sum_{i=1}^N x_i = \bar{x}.



We define an estimator of the variance of the jackknifed set to be

    \hat{\sigma}_J^2 = \frac{1}{N} \sum_{j=1}^N \left(\bar{x}^{(j)} - \bar{x}\right)^2,

which converges to \langle \hat{\sigma}_J^2 \rangle as N gets large. Inserting the definitions of \bar{x}^{(j)} and \bar{x}, and using the independence of the x_i, we see that the estimator is related to the intrinsic variance by

    \langle \hat{\sigma}_J^2 \rangle = \left(\frac{1}{N - 1} - \frac{1}{N}\right)\sigma^2 = \frac{\sigma^2}{N(N - 1)}, \qquad \text{so} \quad \sigma^2 = N(N - 1)\langle \hat{\sigma}_J^2 \rangle.

The standard error can therefore be estimated using the jackknifed set as

    \sigma_E = \sqrt{\frac{N - 1}{N} \sum_{j=1}^N \left(\bar{x}^{(j)} - \bar{x}\right)^2}.
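A minimal Python sketch of the jackknife error estimate, computing all leave-one-out averages at once:

```python
import numpy as np

def jackknife_error(samples):
    """Jackknife estimate of the standard error of the mean."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    # Leave-one-out averages: (sum - x_j) / (n - 1) for each j.
    loo = (x.sum() - x) / (n - 1)
    return np.sqrt((n - 1) / n * np.sum((loo - x.mean())**2))

rng = np.random.default_rng(5)
data = rng.normal(size=10_000)
print(jackknife_error(data))     # ~ 1/sqrt(10_000) = 0.01
```

For independent data this agrees with σ_E = (σ̂²/(N-1))^{1/2}; the jackknife becomes especially useful for error estimates of nonlinear functions of averages.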

3 Applications of the Monte Carlo method

3.1 Quasi-ergodic sampling

The detailed balance condition and an ergodic transition matrix imply that the random walk Monte-Carlo method correctly generates the distribution of configurations. This says nothing about the rate of convergence, which depends on the implementation (the eigenvalue spectrum of the transition matrix).

Consider a 1-dimensional system with a quartic potential U(x) = x^2(x^2 - 2). For \beta = 10, the probability density P(x) \propto e^{-\beta U(x)} is bimodal, with small relative probability to be found near x = 0.

[Figure: the double-well quartic potential U(x) and the bimodal density p(x) on the interval -1.5 < x < 1.5.]


Note that P (1)/P (0) = e10 .
Consider the simple Metropolis Monte-Carlo scheme discussed previously with:

T(x y) =

1
2x

if y [x x, x + x]
otherwise.

If maximum displacement x is small (x  1), then if x is in the region near


x = 1, the probability of proposing y near 1 is zero, and the proposal is very
unlikely to propose a configuration y which is in a different mode from x.
Random walk dynamics consists of long periods of x being localized in one of the
modes, with only rare transitions between the modes (roughly every e10 steps).
Overall, an equal amount of time must be spent in both of the modes.
Rare transitions between modes leads to very slow convergence of distribution of
states to P (x).
Easy to fix here since we know where the modes lie: increasing x will allow
proposals between modes.

3.2 Umbrella sampling

The origin of the quasi-ergodic sampling problem is poor movement in the random walk dynamics between different modes of high probability density. Sampling can be improved if one improves the transition probability between modes, by either improving the proposal of trial moves or by modifying the acceptance criterion (i.e. sampling with a different importance function).

Umbrella sampling is based on using a modified version of the Boltzmann distribution, typically specified by an additional potential energy term that encourages movement between modes.

Consider the quartic system discussed in the last section. We define the umbrella potential U_b(x) = kx^2 and the importance function \pi(x) = e^{-\beta U(x)} e^{-\beta U_b(x)}, and define the transition matrix so that \pi is the limit distribution of the random walk. Any canonical average can be written as an average over \pi:

    \langle A(x) \rangle = \int dx \, P(x) A(x) = \int dx \, \pi(x) \frac{P(x)}{\pi(x)} A(x),

provided that \pi(x) \neq 0 at any point where P(x) \neq 0.


Generate a Markov chain of states \{x_1, \dots, x_N\} according to \pi(x), so that an estimator of the average is

    \langle A(x) \rangle \approx \frac{1}{N} \sum_{i=1}^N A(x_i) \frac{P(x_i)}{\pi(x_i)} = \frac{1}{N} \sum_{i=1}^N A(x_i) w(x_i) = \frac{1}{N} \sum_{i=1}^N A(x_i) e^{\beta U_b(x_i)}.

The weight factor w(x_i) = e^{\beta U_b(x_i)} accounts for the bias introduced by the umbrella potential. In this case, it assigns greater weight to the regions around the modes at x = \pm 1, since the biased random walk attaches less significance to these regions. The parameter k in the umbrella potential can be adjusted to minimize the statistical uncertainties.

Disadvantage of the umbrella approach: one must know a way to enhance movement between all modes of the system in order to define an effective umbrella potential.
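A minimal Python sketch of umbrella sampling for this quartic double well; β, k, and the step size are illustrative choices, and since the normalizations of P and π are not computed, the sketch uses the ratio-of-weighted-sums form of the reweighted average rather than the idealized estimator above:

```python
import numpy as np

rng = np.random.default_rng(6)
beta, k, dx = 10.0, 1.0, 0.2

def U(x):
    return x**2 * (x**2 - 2.0)     # double well, barrier at x = 0

def U_b(x):
    return k * x**2                 # umbrella potential flattening the wells

x = 1.0
num = den = 0.0
for step in range(500_000):
    y = x + dx * (2.0 * rng.random() - 1.0)
    # Metropolis step targeting pi(x) ~ exp(-beta (U + U_b))
    if rng.random() < np.exp(-beta * (U(y) + U_b(y) - U(x) - U_b(x))):
        x = y
    w = np.exp(beta * U_b(x))       # reweighting factor P/pi, up to a constant
    num += w * x**2                 # accumulating A(x) = x^2 as an example
    den += w

print(num / den)                    # canonical average <x^2> at beta = 10
```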

3.3 Simulated annealing and parallel tempering

3.3.1 High temperature sampling

At high temperatures (\beta_h \ll 1), the equilibrium distribution P_h(x) is only weakly bimodal. The transition rate between modes depends exponentially on \beta \Delta U, where \Delta U is the barrier height of the potential separating the different modes. If \beta is small, so that \beta \Delta U \ll 1, then the system moves easily between modes.

One can use the high-temperature distribution P_h(x) as the importance function \pi(x), resulting in the weight function

    w(x_i) = \frac{P(x_i)}{\pi(x_i)} = e^{-(\beta - \beta_h) U(x_i)} = e^{-\Delta\beta \, U(x_i)}.

If \Delta\beta is large, points near the barrier x_i = 0 receive little weight, since w(x_i) \ll 1.

The advantage of this approach is that we don't need to know the barrier locations, since the high average potential energy overcomes the barriers. Disadvantage: for high-dimensional systems, the number of accessible states is large (high entropy) at high temperatures. Many configurations sampled at high temperature therefore receive little weight, leading to sampling inefficiency and large statistical uncertainties.


3.3.2 Extended state space approach: Simulated Tempering (Marinari and Parisi, 1992)

Large temperature gaps in the high temperature sampling approach lead to inefficient sampling due to a difference in the density of states (entropy), while small temperature gaps are typically insufficient to enhance the passage between modes. The idea of extended state space approaches is to use a ladder of different temperatures (or another parameter) to allow the system to gradually move out of modes in an efficient manner.

We augment the phase space point r^{(N)} with a parameter \beta_i from a set of m values \{\beta_i\}, and define a target limit distribution

    \pi(r^{(N)}, \beta_i) = W_i \, e^{-\beta_i U(r^{(N)})}

on the extended state space (r^{(N)}, \beta_i), where W_i is an adjustable parameter.

The Monte-Carlo procedure is standard, but on the extended state space:

1. Carry out a sequence of a specified number of updates at fixed \beta_i using the normal Metropolis scheme.
2. Randomly and uniformly select a temperature index j, with corresponding parameter \beta_j, for a Monte-Carlo update. Accept the change of parameter with probability A(i \to j), where

    A(i \to j) = \min\left(1, \frac{\pi(r^{(N)}, \beta_j)}{\pi(r^{(N)}, \beta_i)}\right) = \min\left(1, \frac{W_j}{W_i} \, e^{-(\beta_j - \beta_i) U(r^{(N)})}\right).

This generates a chain of states of the extended phase space, \{(r_1^{(N)}, \beta_{i_1}), \dots, (r_n^{(N)}, \beta_{i_n})\}. If the target average is at temperature \beta = \beta_1, averages are given by the estimator

    \langle A(r^{(N)}) \rangle \approx \frac{1}{n} \sum_{k=1}^n A(r_k^{(N)}) \, \frac{W_1}{W_{i_k}} \, e^{-(\beta_1 - \beta_{i_k}) U(r_k^{(N)})}.

Drawbacks:

1. One must specify the parameter set \{\beta_i\} properly to ensure proper movement between modes.
2. One must know how to choose the weights W_k for a given set of \{\beta_k\}. This can be done iteratively, but requires a fair amount of computational effort.

3.3.3 Parallel Tempering or Replica Exchange (C.J. Geyer, 1991)

Use an extended state space composed of replicas of the system to define a Markov chain on X = (r_1, \dots, r_m), where each r_i is a complete configuration of the system. Design a transition matrix so that the limiting distribution is

    P(X) = \pi_1(r_1) \cdots \pi_m(r_m).

The (statistically independent) individual components r_i of the extended state space vector can be assigned any weight \pi_i. One choice is to use a Boltzmann distribution \pi_i(r_i) = e^{-\beta_i U(r_i)} with inverse temperature \beta_i.

The Monte-Carlo process on the extended state space can be carried out as follows:

1. Carry out a fixed number of updates on all replicas, each with a transition matrix K_i that has limit distribution \pi_i.
2. Attempt a swap move, in which different components of the extended state space vector (replicas) are swapped. For example, any pair of components, possibly adjacent to one another, can be selected from the set of all possible pairs with uniform probability. Suppose one picks components 2 and 3, so that the original configuration X_i and proposed configuration Y_i are

    X_i = (r_1, r_2, r_3, \dots, r_m), \qquad Y_i = (r_1, r_3, r_2, \dots, r_m).

The proposed configuration Y_i should be accepted with probability

    A(X_i \to Y_i) = \min\left(1, \frac{P(Y_i)}{P(X_i)}\right) = \min\left(1, \frac{\pi_2(r_3)\pi_3(r_2)}{\pi_2(r_2)\pi_3(r_3)}\right).

Note that no adjustable weight factors W_i are needed. If \pi_i = e^{-\beta_i U(r_i)}, then

    \frac{\pi_2(r_3)\pi_3(r_2)}{\pi_2(r_2)\pi_3(r_3)} = \frac{e^{-\beta_2 U(r_3)} e^{-\beta_3 U(r_2)}}{e^{-\beta_2 U(r_2)} e^{-\beta_3 U(r_3)}} = e^{(\beta_3 - \beta_2) U(r_3)} \, e^{-(\beta_3 - \beta_2) U(r_2)} = e^{\Delta\beta \, \Delta U},

where \Delta\beta = \beta_3 - \beta_2 and \Delta U = U(r_3) - U(r_2).

Each component of X in the Markov chain of extended states \{X^{(1)}, \dots, X^{(N)}\} is distributed with weight \pi_i.



Averages at any of the temperatures are therefore readily computed using
hAii =

N
1 X
(k)
A(Xi ),
N k=1

(k)

where Xi is the ith component of the extended configuration X(k) (the kth configuration in the Markov chain).
Advantages: Quasi-ergodic sampling is mitigated by using a range of parameters such
as i . The parameters should be defined such that their extreme values (such as highest
temperature) should demonstrate no trapping in single modes.
Disadvantages: There must be some overlap in adjacent densities i and i+1 if a swap
move is to be accepted with significant probability. Ideally, the parameters i should
be chosen so that any given labelled configuration spends an equal amount of time at
all parameter values.
Sometimes requires many replicas, which means that it takes many Monte-Carlo
exchange moves for a given configuration to cycle through the parameters.
One of the major advantages of the replica exchange method is the ease with which
one can parallelize the algorithm.
Normal single-chain algorithm cannot be parallelized efficiently because of sequential nature of random walk procedure.
Can parallelize the generation of many trial configurations and use an asymmetric
proposal procedure to select one preferentially.
Can parallelize computation of energy, if this is a rate-determining step.
The exchange frequency between replicas can be optimized to suit the computer architecture.
Each processor can deal with a single replica, or sub-sets of replicas.
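To make the swap stage concrete, here is a minimal sketch in Python. It is not a reference implementation from these notes: the per-replica updates and the potential energy function U are assumed to be supplied from elsewhere, and only adjacent pairs are attempted.

# Sketch of one replica-exchange swap move between adjacent replicas.
# Assumes U(config) returns the potential energy; beta is the list of
# inverse temperatures, and configs[i] is the replica at beta[i].
import math, random

def swap_move(configs, beta, U):
    i = random.randrange(len(configs) - 1)        # pick adjacent pair (i, i+1)
    dbeta = beta[i+1] - beta[i]
    dU = U(configs[i+1]) - U(configs[i])
    # accept with probability min(1, exp(dbeta*dU)), as derived above
    if random.random() < math.exp(min(0.0, dbeta * dU)):
        configs[i], configs[i+1] = configs[i+1], configs[i]

Because each replica evolves independently between swap attempts, the replica updates can be farmed out to separate processors, with only this swap stage requiring communication.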

4 Molecular dynamics

4.1 Basic integration schemes

4.1.1 General concepts

Aim of Molecular Dynamics (MD) simulations: compute equilibrium and transport properties of classical many-body systems.
Basic strategy: numerically solve the equations of motion.
For many classical systems, the equations of motion are of Newtonian form

Ṙ^N = P^N/m
Ṗ^N = F^N = −∂U/∂R^N,

or

Ẋ^N = L X^N,   with   L A = {A, H},

where X^N = (R^N, P^N).
The energy H = P^N·P^N/(2m) + U(R^N) is conserved under this dynamics.

The potential energy is typically of the form of a sum of pair potentials:

U(R^N) = Σ_{(i,j)} φ(r_ij) = Σ_{i=1}^N Σ_{j=1}^{i−1} φ(r_ij),

which entails the following expression for the forces F^N:

F_i = −Σ_{j≠i} ∂φ(r_ij)/∂r_i = −Σ_{j≠i} φ'(r_ij) ∂r_ij/∂r_i = Σ_{j≠i} φ'(r_ij) (r_j − r_i)/r_ij,

where each term in the last sum is the force F_ij exerted on particle i by particle j.
Examples of quantities of interest:
1. Radial distribution function (structural equilibrium property)

g(r) = [2V / (N(N−1))] ⟨Σ_{(i,j)} δ(r_i − r_j − r)⟩,

where

⟨A⟩ = ∫dX^N A(X^N) e^{−βH(X^N)} / ∫dX^N e^{−βH(X^N)}   (canonical ensemble)
or = ∫dX^N A(X^N) δ(E − H(X^N)) / ∫dX^N δ(E − H(X^N))   (microcanonical ensemble).

2. Pressure (thermodynamic equilibrium property):

pV = NkT + (1/3) Σ_{(i,j)} ⟨F_ij · r_ij⟩,

which can be written in terms of g(r) as well.

3. Mean square displacement (transport property):

⟨|r(t) − r(0)|²⟩ ≈ 6Dt   for long times t,

where D is the self-diffusion coefficient.
4. Time correlation function (relaxation properties)

C(t) = ⟨v(t) · v(0)⟩,

which is related to D as well:

D = lim_{t→∞} lim_{N,V→∞} (1/3) ∫_0^t C(τ) dτ.

If the system is ergodic then the time average equals the microcanonical average:

lim_{tfinal→∞} (1/tfinal) ∫_0^{tfinal} dt A(X^N(t)) = ∫dX^N A(X^N) δ(E − H(X^N)) / ∫dX^N δ(E − H(X^N)).

For large N, microcanonical and canonical averages are equal for many quantities A.
Need long times tfinal!


The equations of motion to be solved are ordinary differential equations.
There exist general algorithms to solve ordinary differential equations numerically (see e.g. Numerical Recipes Ch. 16), such as Runge-Kutta and predictor/corrector algorithms. Many of these are too costly or not stable enough for long simulations of many-particle systems. In MD simulations, it is therefore better to use algorithms specifically suited for systems obeying Newton's equations of motion, such as the Verlet algorithm.
However, we first want to explore some general properties of integration algorithms. For this purpose, consider a function x of t which satisfies

ẋ = f(x, t).   (4.1)

We want to solve for the trajectory x(t) numerically, given the initial point x(0) at time t = 0.
As in the case of numerical integration, we restrict ourselves to a discrete set of points, separated by a small time step Δt:

t_n = n Δt
x_n = x(t_n),

where n = 0, 1, 2, 3, ....
To transform equation (4.1) into a closed set of equations for the x_n, we need to express the time derivative ẋ in terms of the x_n. This can only be done approximately.
Using that Δt is small:

ẋ(t_n) ≈ [x(t_n + Δt) − x(t_n)] / Δt = (x_{n+1} − x_n) / Δt.

Since this should be equal to f(x(t_n), t_n) = f(x_n, t_n):

(x_{n+1} − x_n)/Δt ≈ f(x_n, t_n)
x_{n+1} = x_n + f(x_n, t_n) Δt   Euler Scheme.   (4.2)

This formula allows one to generate a time series of points which are an approximation to the real trajectory. A simple MD algorithm in pseudo-code could look like this:¹

¹ Pseudo-code is an informal description of an algorithm using common control elements found in most programming languages and natural language; it has no exact definition but is intended to make implementation in a high-level programming language straightforward.

EULER ALGORITHM
SET x to the initial value x(0)
SET t to the initial time
WHILE t < tfinal
COMPUTE f(x,t)
UPDATE x to x+f(x,t)*dt
UPDATE t to t+dt
END WHILE
DO NOT USE THIS ALGORITHM!
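For illustration only, the same algorithm as runnable Python; the right-hand side f is passed in as a function, and nothing here is specific to MD:

# Euler scheme for dx/dt = f(x, t); shown only to illustrate its problems.
def euler(f, x0, dt, tfinal):
    x, t = x0, 0.0
    while t < tfinal:
        x = x + f(x, t) * dt    # local error O(dt^2)
        t = t + dt
    return x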
It is easy to show that the error in the Euler scheme is of order Δt², since

x(t + Δt) = x(t) + f(x(t), t) Δt + (1/2) ẍ(t) Δt² + ...,

so that

x_{n+1} = x_n + f(x_n, t_n) Δt + O(Δt²),   (4.3)

where the O(Δt²) term is the local error.
The strict meaning of the big O notation is that if A = O(Δt^k) then lim_{Δt→0} A/Δt^k is finite and nonzero. For small enough Δt, a term O(Δt^{k+1}) becomes smaller than a term O(Δt^k), but the big O notation cannot tell us what magnitude of Δt is small enough.
A numerical prescription such as (4.3) is called an integration algorithm, integration scheme, or integrator.
Equation (4.3) expresses the error after one time step; this is called the local truncation error.
What is more relevant is the global error that results after a given physical time t_f of order one. This requires M = t_f/Δt MD steps to be taken.
Denoting f_k = f(x_k, t_k), we can track the errors of subsequent time steps as follows:

x_1 = x_0 + f_0 Δt + O(Δt²)
x_2 = [x_0 + f_0 Δt + O(Δt²)] + f_1 Δt + O(Δt²)
    = x_0 + (f_0 + f_1) Δt + O(Δt²) + O(Δt²)
...
x_M = x_0 + Σ_{k=1}^M f_{k−1} Δt + Σ_{k=1}^M O(Δt²);

but as M = t_f/Δt:

x(t) = x_M = x_0 + Σ_{k=1}^{t_f/Δt} f_{k−1} Δt + Σ_{k=1}^{t_f/Δt} O(Δt²),

where the last sum is the global error.
Since t_f = O(1), the accumulated error is

Σ_{k=1}^{t_f/Δt} O(Δt²) = O(Δt²) O(t_f/Δt) = O(t_f Δt),   (4.4)

which is of first order in the time step Δt.


Since the global error goes as the first power of the time step Δt, we call equation (4.3) a first order integrator.
In the absence of further information on the error terms, this constitutes a general principle: if in a single time step of an integration scheme the local truncation error is O(Δt^{k+1}), then the globally accumulated error over a time t_f is O(t_f Δt^k) = O(Δt^k), i.e., the scheme is kth order.
Equation (4.4) also shows the possibility that the error grows with physical time t_f: drift.
Illustration of local and global errors:
Let f(x, t) = −γx, so that equation (4.1) reads

ẋ = −γx,

whose solution is exponentially decreasing with a rate γ:

x(t) = e^{−γt} x(0).   (4.5)

The numerical scheme (4.3) gives for this system

x_{n+1} = x_n − γ x_n Δt.   (4.6)

Note that the true relation is x(t + Δt) = e^{−γΔt} x(t) = x(t) − γΔt x(t) + O(Δt²), i.e., the local error is of order Δt².
Equation (4.6) is solved by

x_n = (1 − γΔt)^n x_0 = (1 − γΔt)^{t/Δt} x_0 = e^{[ln(1−γΔt)/Δt] t} x_0.   (4.7)

By comparing equations (4.5) and (4.7), we see that the behaviour of the numerical solution is similar to that of the real solution but with the rate γ replaced by γ' = −ln(1 − γΔt)/Δt. For small Δt, one gets for the numerical rate γ' = γ + γ²Δt/2 + ... = γ + O(Δt); thus the global error is seen to be O(Δt), which demonstrates that the Euler scheme is a first order integrator. Note that the numerical rate diverges at Δt = 1/γ, which is an example of a numerical instability.
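This renormalization is easy to verify numerically. The following sketch (γ = 1 and t_f = 1 assumed) shows that the Euler result coincides with e^{−γ' t}:

# Check that the Euler scheme decays with rate gamma' = -ln(1-gamma*dt)/dt.
import math

gamma, tf = 1.0, 1.0
for dt in (0.1, 0.05, 0.025):
    x = 1.0
    for _ in range(round(tf / dt)):
        x -= gamma * x * dt                   # Euler step for xdot = -gamma*x
    rate = -math.log(1.0 - gamma * dt) / dt   # predicted renormalized rate
    print(dt, x, math.exp(-rate * tf))        # the last two columns coincide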

4.1.2 Ingredients of a molecular dynamics simulation

1. Boundary conditions
We can only simulate finite systems.
A wall potential would give finite size effects and destroy translation invariance.
More benign boundary conditions: periodic boundary conditions.
Let all particles lie in a simulation box with coordinates between −L/2 and L/2.
A particle which exits the simulation box is put back at the other end.
Infinite checkerboard picture (easiest to visualize in two dimensions):

[Figure: infinite checkerboard of periodic images; the box with thick boundaries is our simulation box.]

All other boxes are copies of the simulation box, called periodic images.
The other squares contain particles with shifted positions

r' = r + (iL, jL, kL),

for any negative or positive integers i, j, and k. Thus, if a particle moves out of the simulation box, another particle will fly in from the other side.

Conversely, for any particle at position r' not in the simulation box, there is a particle in the simulation box at

r = ( (x' + L/2) mod L − L/2,
      (y' + L/2) mod L − L/2,
      (z' + L/2) mod L − L/2 ).   (4.8)

Yet another way to view this is to say that the system lives on a torus.
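In code, equation (4.8) is a componentwise modulo operation; a minimal Python version for a cubic box of side L centred at the origin might read:

# Put a position back into the simulation box [-L/2, L/2), cf. equation (4.8).
def wrap(r, L):
    return [(x + 0.5 * L) % L - 0.5 * L for x in r]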
2. Forces
Usually based on pair potentials.
A common pair potential is the Lennard-Jones potential

φ(r) = 4ε [ (σ/r)^12 − (σ/r)^6 ].

σ is a measure of the range of the potential.
ε is its strength.
The potential is positive for small r: repulsion.
The potential is negative for large r: attraction.
The potential goes to zero for large r: short-range.
The potential has a minimum of −ε at r = 2^{1/6} σ.
Computing all forces in an N-body system requires the computation of N(N−1)/2 (the number of pairs in the system) forces F_ij.
Computing forces is often the most demanding part of MD simulations.
A particle i near the edge of the simulation box will feel a force from the periodic images, which can be closer to i than their original counterparts.
A consistent way to write the potential is

U = Σ_{i,j,k} Σ_{n=1}^N Σ_{m=1}^{n−1} φ(|r_n − r_m + iL x̂ + jL ŷ + kL ẑ|).   (4.9)

While this converges for most potentials φ, it is very impractical to have to compute an infinite sum to get the potential and forces.
To fix this, one can modify the potential such that it becomes zero beyond a certain cut-off distance r_c:

φ_c(r) = { φ(r) − φ(r_c)   if r < r_c
         { 0                if r ≥ r_c,

where the subtraction of φ(r_c) is there to avoid discontinuities in the potential which would cause violations of energy conservation.

To also avoid discontinuities in derivatives, one can use a scheme such as

φ_s(r) = φ(r) θ(r)   (4.10)

where

θ(r) = { 1                                              r < r_c'
       { (r_c − r)² (r_c + 2r − 3r_c') / (r_c − r_c')³   r_c' ≤ r ≤ r_c
       { 0                                              r > r_c.   (4.11)

Here is an example of these procedures applied to the Lennard-Jones potential:

[Figure: cutoff Lennard-Jones potentials φ, φ_c and φ_s as a function of r, for ε = σ = 1, r_c = 2.5, r_c' = 2.]
Once the potential is zero beyond a point, the sums over i, j, and k in equation (4.9) become finite.
In fact, if r_c < L/2, the sum contains at most one non-zero contribution for each pair of particles (i, j). This pair is either in the same box, or one of them is in one of the adjacent boxes.
For any pair, the correct distance vector can be found from the original distance vector r = r_i − r_j using equation (4.8).
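In practice this minimum image convention is just equation (4.8) applied to the difference vector; a Python sketch:

# Minimum image difference vector r_i - r_j in a cubic box of side L.
def min_image(ri, rj, L):
    return [(x - y + 0.5 * L) % L - 0.5 * L for x, y in zip(ri, rj)]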
3. Initial conditions
The initial conditions are to some extent not important if the system naturally tends to equilibrium (ergodicity).
Nonetheless, one would not want too extreme an initial configuration.
Starting the system with the particles on a lattice and drawing initial momenta from a uniform or Gaussian distribution is typically a valid starting point.
One often makes sure that the kinetic energy has the target value (3/2)NkT, while the total momentum is set to zero to avoid the system moving as a whole.
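A sketch of such an initialization in Python (N particles in three dimensions, with k_B = 1 and unit masses assumed):

# Gaussian momenta with zero total momentum and kinetic energy (3/2) N T.
import random

def init_momenta(N, T):
    p = [[random.gauss(0.0, T**0.5) for _ in range(3)] for _ in range(N)]
    for d in range(3):                     # remove centre-of-mass motion
        avg = sum(pi[d] for pi in p) / N
        for pi in p:
            pi[d] -= avg
    ekin = 0.5 * sum(c * c for pi in p for c in pi)
    s = (1.5 * N * T / ekin) ** 0.5        # rescale to the target value
    return [[c * s for c in pi] for pi in p]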
4. Integration scheme
Needed to solve the dynamics as given by the equations of motion.
Below, we will discuss in detail how to construct or choose an appropriate integration scheme.
5. Equilibration/Burn-in
Since we do not start the system from an equilibrium state, a certain number of time steps are to be taken until the system has reached equilibrium.
One can check for equilibrium by seeing if quantities like the potential energy are no longer changing in any systematic fashion and are just fluctuating around a mean value.
The equilibrium state is microcanonical at a given total energy Etot = Epot + Ekin.
Since Ekin > 0, the lattice initialization procedure outlined above cannot reach all possible values of the energy, i.e. Etot > Epot(lattice).
To reach lower energies, one can periodically rescale the momenta (a rudimentary form of a so-called thermostat).
Another way to reach equilibrium is to generate initial conditions using the Monte Carlo method.
6. Measurements
Construct estimators for physical quantities of interest.
Since there are correlations along the simulated trajectory, one needs to take sample points that are far apart in time.
When one is interested in dynamical quantities, however, all points should be used; in the statistical analysis these correlations should be taken into account.


Given these ingredients, the outline of an MD simulation could look like this:
OUTLINE MD PROGRAM
SETUP INITIAL CONDITIONS
PERFORM EQUILIBRATION by INTEGRATING over a burn-in time B
SET time t to 0
PERFORM first MEASUREMENT
WHILE t < tfinal
INTEGRATE over the measurement interval
PERFORM MEASUREMENT
END WHILE
in which the integration step from t1 to t1+T looks as follows (assuming t=t1 at the start):
OUTLINE INTEGRATION
WHILE t < t1+T
COMPUTE FORCES on all particles
COMPUTE new positions and momenta according to INTEGRATION SCHEME
APPLY PERIODIC BOUNDARY CONDITIONS
UPDATE t to t+dt
END WHILE
Note: we will encounter integration schemes in which the forces need to be computed in
intermediate steps, in which case the separation between force computation and integration
is not as strict as this outline suggests.

4.1.3 Desirable qualities for a molecular dynamics integrator

Accuracy:
Accuracy means that the trajectory obeys the equations of motion to good approximation. This is a general demand that one would also impose on integrators for general differential equations. The accuracy in principle improves by decreasing the time step Δt. But because of the exponential separation of nearby trajectories in phase space (Lyapunov instability), this is of limited help.
Furthermore, one cannot decrease Δt too far in many-particle systems for reasons of
Efficiency:
It is typically quite expensive to compute the inter-particle forces F^N, and taking smaller time steps Δt requires more force evaluations per unit of physical time.


Respect physical laws:

Time reversal symmetry
Conservation of energy
Conservation of linear momentum
Conservation of angular momentum
Conservation of phase space volume

provided the simulated system also has these properties, of course.
Violating these laws casts serious doubt on the ensemble that is sampled and on whether the trajectories are realistic.
Unfortunately, there is no general algorithm that obeys all of these conservation laws exactly for an interacting many-particle system. At best, one can find time-reversible, volume preserving algorithms that conserve linear momentum and angular momentum, but that conserve the energy only approximately.
Note furthermore that with periodic boundary conditions:
Translational invariance and thus conservation of momentum is preserved.
There is no wall potential, so energy conservation is not affected either.
But rotational invariance is broken: no conservation of angular momentum.
Stability:
Given that the energy is only conserved approximately, when studying dynamics on
large time scales, or when sampling phase space using MD, it is important that the simulation is stable, i.e., that the algorithm does not exhibit energy drift, since otherwise,
it would not even sample the correct microcanonical energy shell.
Remarks:
Since the computational cost thus limits the time step, the accuracy of the algorithm has to be assessed at a fixed time step.
Since Δt is not necessarily small, higher order algorithms need not be more accurate.
The most efficient algorithm is then the one that allows the largest possible time step
for a given level of accuracy, while maintaining stability and preserving conservation
laws.

All integration schemes become numerically unstable for large time steps Δt, even if they are stable at smaller time steps. A large step may carry the system to a region of large potential energy. With infinite precision, this would just cause a large restoring force that pushes the system back into the low energy region. But with finite precision, the low energies cannot be resolved anymore, and the system remains in the high energy state.
The dominance of the force evaluations means that the efficiency of a simulation can
greatly be improved by streamlining the evaluation of the forces, using
1. Cell divisions:
Divide the simulation box into cells larger than the cutoff r_c.
Make a list of all particles in each cell.
In the sum over pairs in the force computation, only sum pairs of particles in the same cell or in adjacent cells.
When "adjacent" is properly defined, this procedure automatically picks out the right periodic image.
Drawbacks:
1. Needs at least three cells in any direction to be of use.
2. Still sums many pairs that do not interact (corners).
2. Neighbour lists (also called Verlet lists or Verlet neighbour lists):
Make a list of pairs of particles that are closer than r_c + Δr: these are neighbours.
Sum over the list of pairs to compute the forces.
The neighbour list is to be reused in subsequent force calculations as long as the list is still valid.
Invalidation criterion: a particle has moved more than Δr/2.
Therefore, before a new force computation, check if any particle has moved more than Δr/2 since the last list-building. If so, rebuild the Verlet list, otherwise use the old one.
Notes:
1. Δr needs to be chosen to balance the cost of rebuilding the list against that of considering non-interacting particles.
2. The building of the list may be sped up by using cell divisions.
For large systems, these methods of computing the interaction forces scale as N instead of as N², as the naive implementation of summing over all pairs would give.
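A minimal Python sketch of the neighbour-list bookkeeping described above (min_image is the helper sketched earlier; the O(N²) list build itself could be sped up with cell divisions):

# Build a Verlet list of pairs within rc+dr, and test whether it is still valid.
def build_list(r, L, rc, dr):
    pairs = []
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            d = min_image(r[i], r[j], L)
            if sum(c * c for c in d) < (rc + dr)**2:
                pairs.append((i, j))
    return pairs, [ri[:] for ri in r]      # list plus reference positions

def list_valid(r, rref, L, dr):
    for ri, r0 in zip(r, rref):            # any particle moved more than dr/2?
        d = min_image(ri, r0, L)
        if sum(c * c for c in d) > (0.5 * dr)**2:
            return False
    return True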


Assessing the Euler scheme for the harmonic oscillator


Consider the Euler scheme applied to x = (r, p) and f(x, t) = (p, −r), i.e. (cf. equation (4.1))

ṙ = p
ṗ = −r.   (4.12)

This is the simple one-dimensional harmonic oscillator with mass 1 and frequency 1, whose solutions are oscillatory:

r(t) = r(0) cos t + p(0) sin t
p(t) = p(0) cos t − r(0) sin t.   (4.13)

The Euler scheme (4.3) gives for this system

(r_{n+1}, p_{n+1})ᵀ = [[1, Δt], [−Δt, 1]] (r_n, p_n)ᵀ.   (4.14)

The eigenvalues of the matrix on the right-hand side of equation (4.14) are given by λ± = 1 ± iΔt, and the solution of equation (4.14) can be expressed as

(r_n, p_n)ᵀ = [[1, Δt], [−Δt, 1]]ⁿ (r_0, p_0)ᵀ   (4.15)

 = ( (λ₊ⁿ + λ₋ⁿ)/2 · r_0 + (λ₊ⁿ − λ₋ⁿ)/(2i) · p_0 ,
     −(λ₊ⁿ − λ₋ⁿ)/(2i) · r_0 + (λ₊ⁿ + λ₋ⁿ)/2 · p_0 )ᵀ   (4.16)

 = (λ₊λ₋)^{n/2} [[ (e^{iω't_n} + e^{−iω't_n})/2 , (e^{iω't_n} − e^{−iω't_n})/(2i) ],
                 [ −(e^{iω't_n} − e^{−iω't_n})/(2i) , (e^{iω't_n} + e^{−iω't_n})/2 ]] (r_0, p_0)ᵀ   (4.17)

 = (1 + Δt²)^{n/2} [[cos(ω' t_n), sin(ω' t_n)], [−sin(ω' t_n), cos(ω' t_n)]] (r_0, p_0)ᵀ,   (4.18)

where t_n = nΔt and e^{iω'Δt} = (λ₊/λ₋)^{1/2}. By comparing equations (4.13) and (4.18), we see that the behaviour of the numerical solution is similar to that of the real solution but with a different frequency, and with a prefactor which is larger than one and grows with time. Rather than performing a periodic, circular motion in phase space, the Euler scheme produces an outward spiral.
Accuracy: the amplitude prefactor (1 + Δt²)^{n/2} grows to 1 + O(Δt) over a fixed physical time, so this is only a first order integration scheme.
Time reversal invariant? No.

Conserves energy? No.
Conserves angular momentum? No.
Conserves phase space? No.
Stable? No.

4.1.4 Verlet scheme

If the system is governed by Newton's equations:

Ṙ^N = P^N/m
Ṗ^N = F^N,

then one can exploit the form of these equations to construct (better) integrators.
The Verlet scheme is one of these schemes. It can be derived by Taylor expansion:

R^N_{n−1} = R^N(t − Δt) = R^N_n − P^N_n Δt/m + F^N_n Δt²/(2m) − (d³R^N/dt³)(t) Δt³/6 + O(Δt⁴)
R^N_{n+1} = R^N(t + Δt) = R^N_n + P^N_n Δt/m + F^N_n Δt²/(2m) + (d³R^N/dt³)(t) Δt³/6 + O(Δt⁴).
Adding these two equations leads to

R^N_{n−1} + R^N_{n+1} = 2R^N_n + F^N_n Δt²/m + O(Δt⁴)
R^N_{n+1} = 2R^N_n − R^N_{n−1} + F^N_n Δt²/m   Position-only Verlet Integrator   (4.19)

No momenta!
Requires positions at the previous step!
Simple algorithm (r(i) and rprev(i) are particle i's current and previous position):
VERLET ALGORITHM
SET time t to 0
WHILE t < tfinal
COMPUTE the forces F(i) on all particles
FOR each particle i
COMPUTE new position rnew = 2*r(i)-rprev(i)+F(i)*dt*dt/m


UPDATE previous position rprev(i) to r(i)


UPDATE position r(i) to rnew
END FOR
UPDATE t to t+dt
END WHILE
Accuracy: This scheme is third order in the positions, so reasonably accurate.
Respect physical laws?
Time reversal symmetry? Yes, since

R^N_{n−1} = 2R^N_n − R^N_{n+1} + F^N_n Δt²/m.

Total energy conservation? No momenta, so energy conservation cannot be checked.


Linear momentum? Also not defined.
Angular momentum? Not defined.
Volume preserving? No phase space volume can be defined without momenta.
Stability: very stable, no energy drift up to relatively large time steps. Why this is so
will become clear later.

4.1.5 Leap Frog scheme

Is a way to introduce momenta into the Verlet scheme.
Define momenta at a half time step:

P^N_{n+1/2} = P^N(t + Δt/2) ≈ m (R^N_{n+1} − R^N_n) / Δt.   (4.20)

These momenta are correct up to O(Δt²).
If we get the positions from the Verlet scheme, then the errors in the momenta do not accumulate, so that the global order of the momenta in the Leap Frog method is also O(Δt²).
Given the half-step momenta, one may also perform the Leap Frog algorithm as follows:

R^N_{n+1} = R^N_n + P^N_{n+1/2} Δt/m   (which follows from (4.20)),

where

P^N_{n+1/2} = m (R^N_{n+1} − R^N_n)/Δt = m (R^N_n − R^N_{n−1})/Δt + m (R^N_{n+1} − 2R^N_n + R^N_{n−1})/Δt
            = P^N_{n−1/2} + F^N_n Δt   (as follows from (4.19)).

The scheme is thus:

P^N_{n+1/2} = P^N_{n−1/2} + F^N_n Δt
R^N_{n+1} = R^N_n + P^N_{n+1/2} Δt/m   Leap Frog integrator.   (4.21)

Since the Leap Frog algorithm is derived from the Verlet scheme, it is equally stable.
The Leap Frog algorithm has the appearance of a first order Taylor expansion, but
because of the half-step momenta, it is third order in positions and second order in
momenta.
Since momenta are defined at different time points than positions, conservation laws
(energy, momentum,. . . ) can still not be checked.
The Leap Frog scheme is easy to implement:
LEAP-FROG ALGORITHM
SET time t to 0
WHILE t < tfinal
COMPUTE the forces F(i) on all particles
FOR each particle i
UPDATE momentum p(i) to p(i)+F(i)*dt
UPDATE position r(i) to r(i)+p(i)*dt/m
END FOR
UPDATE t to t+dt
END WHILE

4.1.6 Momentum/Velocity Verlet scheme

This scheme will integrate positions and momenta (or velocities) at the same time points, while keeping the position equivalence with the original Verlet scheme.
We define the momenta at time t = nΔt as

P^N_n = (1/2) (P^N_{n+1/2} + P^N_{n−1/2}).

Using that the half-step momenta are correct to O(Δt²), we see that this is also correct to that order, since

(1/2) [P^N(t + Δt/2) + P^N(t − Δt/2)]
 = (1/2) [P^N(t) + F^N_n Δt/2 + P^N(t) − F^N_n Δt/2 + O(Δt²)]
 = P^N(t) + O(Δt²).
Using the momentum rule of the Leap Frog algorithm

P^N_{n+1/2} = P^N_{n−1/2} + F^N_n Δt,

and the definition of P^N_n, one gets

P^N_{n+1/2} = 2P^N_n − P^N_{n+1/2} + F^N_n Δt,

or

P^N_{n+1/2} = P^N_n + F^N_n Δt/2.   (4.22)

Substituting Eq. (4.22) into the position rule of the Leap Frog gives the position transformation in the momentum-Verlet scheme:

R^N_{n+1} = R^N_n + P^N_n Δt/m + F^N_n Δt²/(2m).
The corresponding momentum rule is found using the definition of the momentum and the momentum rule of the Leap Frog:

P^N_{n+1} = (1/2) [P^N_{n+3/2} + P^N_{n+1/2}]
          = (1/2) [P^N_{n+1/2} + F^N_{n+1} Δt + P^N_{n+1/2}]
          = P^N_{n+1/2} + F^N_{n+1} Δt/2
          = P^N_n + (F^N_{n+1} + F^N_n) Δt/2,   (4.23)

where (4.22) was used.


This algorithm is usually called velocity Verlet, and is then expressed in terms of the
velocities V N = P N /m.

Summarizing:

R^N_{n+1} = R^N_n + P^N_n Δt/m + F^N_n Δt²/(2m)
P^N_{n+1} = P^N_n + (F^N_{n+1} + F^N_n) Δt/2   Momentum Verlet Scheme (first version).

The momentum rule appears to pose a problem since F^N_{n+1} is required. But to compute F^N_{n+1}, we need only R^N_{n+1}, which is computed in the integration step as well. That is, given that the forces are known at step n, the next step can be taken by

STORE all the forces F(i) as Fprev(i)


FOR each particle i
UPDATE position r(i) to r(i)+p(i)*dt/m+F(i)*dt*dt/(2*m)
END FOR
RECOMPUTE the forces F(i) using the updated positions
FOR each particle i
UPDATE momentum p(i) to p(i)+(F(i)+Fprev(i))*dt/2
END FOR

The extra storage step can be avoided by reintroducing the half step momenta as intermediates. From Eqs. (4.21) (first line), (4.23) and (4.22), one finds

P^N_{n+1/2} = P^N_n + (1/2) F^N_n Δt
R^N_{n+1} = R^N_n + P^N_{n+1/2} Δt/m
P^N_{n+1} = P^N_{n+1/2} + (1/2) F^N_{n+1} Δt   Momentum Verlet Scheme, second version.   (4.24)



In pseudo-code:
MOMENTUM-VERLET ALGORITHM
SET time t to 0
COMPUTE the forces F(i)
WHILE t < tfinal
FOR each particle i
UPDATE momentum p(i) to p(i)+F(i)*dt/2
UPDATE position r(i) to r(i)+p(i)*dt/m
END FOR
UPDATE t to t+dt
RECOMPUTE the forces F(i)
FOR each particle i
UPDATE momentum p(i) to p(i)+F(i)*dt/2
END FOR
END WHILE
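The same algorithm as runnable Python, as a sketch for a one-dimensional system with unit masses (the force routine forces(r) is assumed to be supplied):

# Momentum (velocity) Verlet: half kick, free flight, half kick (eq. 4.24).
def momentum_verlet(forces, r, p, dt, nsteps):
    F = forces(r)
    for _ in range(nsteps):
        p = [pi + 0.5 * dt * Fi for pi, Fi in zip(p, F)]   # half force propagation
        r = [ri + dt * pi for ri, pi in zip(r, p)]         # free propagation (m = 1)
        F = forces(r)                                      # one force call per step
        p = [pi + 0.5 * dt * Fi for pi, Fi in zip(p, F)]   # second half kick
    return r, p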

4.2 Symplectic integrators from Hamiltonian splitting methods

For sampling, one wants a long trajectory (formally t_f → ∞).
It is therefore important that an integration algorithm be stable.
The instability of the Euler scheme is generic, so generally one would not use it.
The momentum Verlet scheme, on the other hand, is much more stable.
To see why, we will re-derive the momentum Verlet scheme from a completely different starting point, using a so-called Hamiltonian splitting method (also known as geometric integration).
We return to the formulation of the equations of motion using the Liouville operator:

Ẋ^N = L X^N,   (4.25)

where the Liouville operator L acting on a phase space function A was defined in terms of the Poisson bracket as

L A = {A, H}
    = (∂A/∂R^N)·(∂H/∂P^N) − (∂H/∂R^N)·(∂A/∂P^N)
    = [(∂H/∂P^N)·(∂/∂R^N) − (∂H/∂R^N)·(∂/∂P^N)] A
    = (J ∂H/∂X^N)ᵀ (∂A/∂X^N),

where J is the symplectic matrix

J = [[0, 1], [−1, 0]].
Remembering that the Hamiltonian H was defined as

H = P^N·P^N/(2m) + U(R^N),   (4.26)

it is easy to show that (4.25) leads to

Ẋ^N = J ∂H/∂X^N,   (4.27)

which are just Newton's equations of motion

Ṙ^N = P^N/m;   Ṗ^N = F^N = −∂U/∂R^N.
Equations of motion of the form of equation (4.27) are called symplectic.
Symplecticity of the equations of motion has a number of important implications:
1. Symplecticity implies Liouville's theorem, i.e., conservation of phase space volume, because the rate by which phase space volume changes is given by the divergence of the flow in phase space, and

(∂/∂X^N)·Ẋ^N = (∂/∂X^N)·(J ∂H/∂X^N) = J : ∂²H/∂X^N∂X^N = 0,

since J is an antisymmetric matrix and ∂²H/∂X^N∂X^N is symmetric.
2. If the Hamiltonian is independent of time, symplecticity implies conservation of the energy H, since

dH/dt = (∂H/∂X^N)ᵀ Ẋ^N = (∂H/∂X^N)ᵀ J (∂H/∂X^N) = 0,

again using the antisymmetry of J.

3. If the Hamiltonian is invariant under time reversal (i.e., even in momenta), then symplecticity implies time-reversibility.
Time reversal means reversing the momenta, which will be denoted by an operator T. Time reversal symmetry means that

T e^{Lt} T = (e^{Lt})^{−1} = e^{−Lt}.

The infinitesimal-t version of this is

T L T = −L.

When acting on a phase space point X^N = (R^N, P^N), one may also write

T X^N = T X^N,

where the symmetric matrix

T = [[1, 0], [0, −1]]

was defined. Similarly, for derivatives one has, by the chain rule,

∂/∂X^N [A(T X^N)] = T (∂A/∂X^N)(T X^N).

One easily shows that the T matrix and the symplectic matrix anti-commute:

T J = −J T.

This property, when combined with a T-invariant Hamiltonian, implies time-reversal, since

(T L T A)(X^N) = (J (∂H/∂X^N)(T X^N))ᵀ T (∂A/∂X^N)(X^N)
 = (J T (∂H/∂X^N)(X^N))ᵀ T (∂A/∂X^N)(X^N)   [T-invariance of H: (∂H/∂X^N)(T X^N) = T (∂H/∂X^N)(X^N)]
 = (∂H/∂X^N)ᵀ T Jᵀ T (∂A/∂X^N)
 = −(∂H/∂X^N)ᵀ T J T (∂A/∂X^N)   [Jᵀ = −J]
 = (∂H/∂X^N)ᵀ J T T (∂A/∂X^N)   [T J = −J T]
 = (∂H/∂X^N)ᵀ J (∂A/∂X^N)   [T² = 1]
 = −(J ∂H/∂X^N)ᵀ (∂A/∂X^N) = −(L A)(X^N),

i.e. T L T = −L, q.e.d.

The idea is now to construct symplectic integrators, such that (by construction) they conserve phase space volume, conserve linear and angular momentum when applicable, and are time-reversible.
Remember that the formal solution of the equations of motion in equation (4.25) is

X^N(t) = e^{Lt} X^N(0),

but this exponent can only be evaluated explicitly for exceptional forms of L, such as for
free particles (e.g. an ideal gas),
systems with harmonic forces (e.g. particles harmonically bound on a lattice), and
free rigid bodies.
Thus, for a Hamiltonian without the potential in (4.26), which is just the kinetic energy

K = P^N·P^N/(2m),

one has

L_K = (J ∂K/∂X^N)ᵀ ∂/∂X^N = (P^N/m)·(∂/∂R^N),

and the exponentiated Liouville operator e^{L_K t} corresponds to free motion:

e^{L_K t} A(R^N, P^N) = A(R^N + (P^N/m) t, P^N).

We call this a free propagation over a time t.
For a Liouville operator composed of just the potential energy,

L_U = (J ∂U/∂X^N)ᵀ ∂/∂X^N = F^N·(∂/∂P^N),

one can also evaluate the exponential:

e^{L_U t} A(R^N, P^N) = A(R^N, P^N + F^N t).

We will call this a force propagation over a time t.
Although we can solve the exponent of the operators L_K and L_U separately, we cannot exponentiate their sum, because

e^{X+Y} ≠ e^X e^Y

when X and Y are non-commuting operators. The operators L_K and L_U do not commute since

[L_K, L_U] = L_K L_U − L_U L_K = (P^N/m)·(∂F^N/∂R^N)·(∂/∂P^N) − (F^N/m)·(∂/∂R^N) ≠ 0.

We can however use the Baker-Campbell-Hausdorff (BCH) formula

e^X e^Y = e^{X + Y + (1/2)[X,Y] + (1/12)[X,[X,Y]] + (1/12)[Y,[Y,X]] + further repeated commutators of X and Y}.   (4.28)

In the current context this is useful if X and Y are small, so that repeated commutators become successively smaller. This smallness can be achieved by taking M = t/Δt small time steps, as follows:

e^{Lt} = (e^{LΔt})^{t/Δt} = (e^{L_K Δt + L_U Δt})^M.   (4.29)

Now let X = L_U Δt and Y = L_K Δt; then [X,Y] = O(Δt²), [X,[X,Y]] = O(Δt³), etc.
Denote any repeated commutators by "...", then the BCH formula gives

e^X e^Y e^{−(1/2)[X,Y]} = e^{X+Y+(1/2)[X,Y]+...} e^{−(1/2)[X,Y]}
 = e^{X+Y+(1/2)[X,Y]−(1/2)[X,Y]+(1/2)[X+Y+(1/2)[X,Y], −(1/2)[X,Y]]+...}
 = e^{X+Y+(1/2)[X+Y+(1/2)[X,Y], −(1/2)[X,Y]]+...}
 = e^{X+Y+...}.

Since ... = O(Δt³) and [X,Y] = O(Δt²), we see that

e^{LΔt} = e^{L_U Δt} e^{L_K Δt} + O(Δt²).   (4.30)

Using this in Eq. (4.29) would give a scheme in which alternating force and free propagations are performed over small time steps Δt.
The accumulated error of this scheme is O(Δt), but the Verlet scheme was O(Δt²).
The missing ingredient is time reversal. The true propagation satisfies time reversal invariance:

T e^{L_U Δt + L_K Δt} T = [e^{L_U Δt + L_K Δt}]^{−1}.

Due to the symplecticity of the L_U Δt and L_K Δt operators separately, the approximate evolution has

T e^{L_U Δt} e^{L_K Δt} T = T e^{L_U Δt} T T e^{L_K Δt} T = e^{−L_U Δt} e^{−L_K Δt},

which is not equal to the inverse

[e^{L_U Δt} e^{L_K Δt}]^{−1} = e^{−L_K Δt} e^{−L_U Δt}.

Time reversal can be restored by taking

e^{L_U Δt + L_K Δt} = e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2} + ...,   (4.31)

since

T e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2} T = T e^{L_U Δt/2} T T e^{L_K Δt} T T e^{L_U Δt/2} T
 = e^{−L_U Δt/2} e^{−L_K Δt} e^{−L_U Δt/2}
 = [e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2}]^{−1}.

Equation (4.31) is actually of higher order than equation (4.30), as one sees from applying the BCH formula:

e^{L_U Δt + L_K Δt} = e^{L_U Δt/2 + L_K Δt + L_U Δt/2}
 = e^{L_U Δt/2 + L_K Δt} e^{(1/2)[L_U Δt/2 + L_K Δt, L_U Δt/2]+...} e^{L_U Δt/2}
 = e^{L_U Δt/2 + L_K Δt} e^{(1/4)[L_K Δt, L_U Δt]+...} e^{L_U Δt/2}
 = e^{L_U Δt/2} e^{L_K Δt} e^{−(1/4)[L_K Δt, L_U Δt]+...} e^{(1/4)[L_K Δt, L_U Δt]+...} e^{L_U Δt/2}
 = e^{L_U Δt/2} e^{L_K Δt} e^{...} e^{L_U Δt/2}.

Remember that ... was O(Δt³), so this scheme has a third order local error and consequently a second order global error.
The scheme in equation (4.31) is our momentum Verlet scheme that we had derived before, see equation (4.24). It first performs a half force propagation, a whole free propagation and then another half force propagation.
The splitting method is more general than what we have just derived. One does not have to split up the Hamiltonian into the kinetic and potential part. All that is required is that the separate sub-Hamiltonians can be explicitly exponentiated.
The general construction of a splitting scheme is as follows:
1. Split the Hamiltonian H up into n partial Hamiltonians H_1, H_2, ..., H_n:

H = Σ_{j=1}^n H_j.

In more advanced splitting schemes, one may in addition define auxiliary Hamiltonians H_{j>n} which do not enter in H.
2. Associate with each partial Hamiltonian H_j a partial Liouville operator

L_j = L_{H_j} = (J ∂H_j/∂X^N)ᵀ ∂/∂X^N,

such that the full Liouvillean is given by

L = Σ_{j=1}^n L_j.

3. One splits up the exponentiated Liouvillean in S factors:

e^{LΔt} = e^{Σ_{j=1}^n L_j Δt} ≈ Π_{s=1}^S e^{L_{j_s} Δt_s},   (4.32)

where the product is taken left-to-right, such that the first factor on the left is s = 1 while the last on the right is s = S.
4. Since each Liouvillean is multiplied by a total time interval Δt in the original exponent, the separate time intervals for each Liouvillean L_{j'} have to add up to Δt as well, at least up to the order of the scheme:

Σ_{s=1, j_s=j'}^S Δt_s = Δt + O(Δt^{k+1})   for each j' = 1 ... n.

For auxiliary Hamiltonians H_{j>n}, their combined time steps should be zero up to the order of the scheme:

Σ_{s=1, j_s=j'}^S Δt_s = 0 + O(Δt^{k+1})   for j' > n.

5. One should take Δt_{S+1−s} = Δt_s and j_{S+1−s} = j_s in order to have time reversal invariance (we will assume that the H_j are invariant under time reversal).
6. One uses the BCH formula to adjust the Δt_s further, such that

e^{LΔt} = Π_{s=1}^S e^{L_{j_s} Δt_s} + O(Δt^{k+1}),   (4.33)

which would make this scheme of order k.


Properties:
Symplectic, hence phase space volume preserving.
Given the palindromic ordering of step 5, also time reversible. Proof:

T [Π_{s=1}^S e^{L_{j_s} Δt_s}] T = Π_{s=1}^S T e^{L_{j_s} Δt_s} T = Π_{s=1}^S e^{−L_{j_s} Δt_s}
 = [Π_{s=S}^1 e^{L_{j_s} Δt_s}]^{−1} = [Π_{s=1}^S e^{L_{j_s} Δt_s}]^{−1},

where the last equality uses Δt_{S+1−s} = Δt_s and j_{S+1−s} = j_s.
The scheme is of even order. Reason: the scheme is time reversible, so each error term found by applying the BCH formula must also be time-reversible, i.e., each term X in the exponent satisfies T X T = −X. Thus, the error terms are odd in the partial Liouvilleans. Since each Liouvillean comes with a factor of Δt, the local error terms are odd in Δt.
The resulting global error is then even in Δt.
If the full Hamiltonian conserves a quantity Q, i.e. {H, Q} = 0, and if each partial Hamiltonian H_j also satisfies {H_j, Q} = 0, then the quantity Q is conserved in each step in equation (4.32), and thus exactly conserved in the integration scheme.

4.3 The shadow or pseudo-Hamiltonian

The energy H is rarely conserved in integration schemes, even symplectic ones.
Nonetheless, the energy is almost conserved. We will now see in what sense.
In the derivation of the splitting schemes, we used that repeated commutators of small-time-step Liouvilleans are higher order corrections, so that they may be omitted.
In fact, one can show that the commutator of two Liouvilleans L_{H1} and L_{H2} associated with two Hamiltonians H_1 and H_2 is again a Liouvillean of another Hamiltonian:

[L_{H1}, L_{H2}] A = L_{H1} L_{H2} A − L_{H2} L_{H1} A
 = L_{H1} {A, H_2} − L_{H2} {A, H_1}
 = {{A, H_2}, H_1} − {{A, H_1}, H_2}
 = {{A, H_2}, H_1} + {{H_1, A}, H_2}.

Using the Jacobi identity for Poisson brackets

{{A, B}, C} + {{B, C}, A} + {{C, A}, B} = 0,

we find

[L_{H1}, L_{H2}] A = {A, {H_2, H_1}},

so that

[L_{H1}, L_{H2}] = L_{{H2,H1}}.

Consider now the Verlet splitting scheme, writing for brevity X = L_U Δt/2 and Y = L_K Δt, and using the BCH formula to one order further than before:

e^X e^Y e^X = e^{X+Y+(1/2)[X,Y]+(1/12)[X,[X,Y]]+(1/12)[Y,[Y,X]]+...} e^X
 = e^{2X+Y+(1/2)[X,Y]+(1/2)[X+Y+(1/2)[X,Y], X]+(1/12)[X,[X,Y]]+(1/12)[Y,[Y,X]]+(1/12)[X+Y,[X+Y,X]]+(1/12)[X,[X,X+Y]]+...}
 = e^{2X+Y−(1/6)[X,[X,Y]]+(1/6)[Y,[Y,X]]+...},   (4.34)

since [X+Y+(1/2)[X,Y], X] = [Y,X] + (1/2)[[X,Y],X] cancels the (1/2)[X,Y] term and the remaining double commutators combine as shown.
Re-substituting X and Y, we get

e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2} = e^{L_H Δt − (1/24)[L_U,[L_U,L_K]]Δt³ + (1/12)[L_K,[L_K,L_U]]Δt³ + O(Δt⁵)}
 = e^{{L_H − (1/24)[L_U,[L_U,L_K]]Δt² + (1/12)[L_K,[L_K,L_U]]Δt² + O(Δt⁴)} Δt}.

This is the evolution belonging to the following operator:

L_shadow = L_H − (1/24)[L_U,[L_U,L_K]]Δt² + (1/12)[L_K,[L_K,L_U]]Δt² + O(Δt⁴)
 = L_H − (1/24)[L_U, L_{{K,U}}]Δt² + (1/12)[L_K, L_{{U,K}}]Δt² + O(Δt⁴)
 = L_H − (1/24)L_{{{K,U},U}}Δt² + (1/12)L_{{{U,K},K}}Δt² + O(Δt⁴)
 = L_{H_pseudo},

where the pseudo-Hamiltonian or shadow Hamiltonian is

H_pseudo = H − (1/24){{K,U},U}Δt² + (1/12){{U,K},K}Δt² + O(Δt⁴).

If H is of the form |P^N|²/(2m) + U(R^N), one has

{{K,U},U} = (1/m)(∂U/∂R^N)·(∂U/∂R^N) = |F^N|²/m   (4.35)
{{U,K},K} = (1/m²) P^N·(∂²U/∂R^N∂R^N)·P^N,   (4.36)

so that

H_pseudo = H − (Δt²/(24m))(∂U/∂R^N)·(∂U/∂R^N) + (Δt²/(12m²)) P^N·(∂²U/∂R^N∂R^N)·P^N + O(Δt⁴).   (4.37)

The leading correction to the scheme is of Hamiltonian form.

This could be worked out to any order in Δt, and because commutators of Liouvilleans are Liouvilleans themselves, one would always find that the correction terms to the integration scheme can in principle be taken together into a pseudo-Hamiltonian:

e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2} = e^{L_{H_pseudo} Δt},

where

H_pseudo = Σ_{h=0}^∞ H_h Δt^h,   (4.38)

with all odd H_h zero and

H_0 = K + U = H
H_2 = −(1/24){{K,U},U} + (1/12){{U,K},K}.

For a general splitting scheme (4.33), there is also a pseudo-Hamiltonian, although the expressions for the H_h differ (except for H_0, which is always H).
If the scheme is of order k, all H_h with 0 < h < k vanish.
We conclude that for schemes derived from Hamiltonian splitting schemes, the dynamics in the simulation is that corresponding to the pseudo-Hamiltonian rather than the real Hamiltonian.
Since the dynamics generated by the splitting scheme is that of the pseudo-Hamiltonian, the value of the pseudo-Hamiltonian is conserved in the dynamics.
Because the real Hamiltonian and the pseudo-Hamiltonian differ by an amount O(Δt^k), the value of the real Hamiltonian varies in the simulation only up to that order: no energy drift.

4.4 Stability limit of the Verlet scheme for harmonic oscillators

Despite the theoretical prediction that splitting schemes should be stable, one sees in practice that large time steps still lead to instabilities.
To understand this, we will apply the splitting method to a harmonic oscillator.
Hamiltonian of the harmonic oscillator:

H = (1/2) p² + (1/2) r².

Note: mass and frequency have been set to 1.
Split the Hamiltonian in a kinetic part K = (1/2)p² and a potential part U = (1/2)r².
Use the Verlet scheme

e^{L_H Δt} = e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2} + O(Δt³).
Since L_K = (J ∂K/∂x)ᵀ ∂/∂x, one has

e^{L_K Δt} x = (r + pΔt, p)ᵀ = [[1, Δt], [0, 1]] x,

where

x = (r, p)ᵀ,

so the operator e^{L_K Δt} acts as a linear operator.
Similarly, L_U = (J ∂U/∂x)ᵀ ∂/∂x, and

e^{L_U Δt/2} x = (r, p − (1/2) r Δt)ᵀ = [[1, 0], [−(1/2)Δt, 1]] x.

Combining these linear operators in a single Verlet step gives

e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2} x = [[1 − (1/2)Δt², Δt], [−Δt(1 − (1/4)Δt²), 1 − (1/2)Δt²]] x.   (4.39)

We saw above that a splitting method conserves a pseudo-Hamiltonian, which is composed of the original Hamiltonian plus repeated Poisson brackets of the partial Hamiltonians.
To leading order, we had

H_pseudo = H − (1/24){{K,U},U}Δt² + (1/12){{U,K},K}Δt² + ...

Since {K,U} = −pr, one finds

{{K,U},U} = r²
{{U,K},K} = p².

Thus, the first additional terms in the pseudo-Hamiltonian are of the same form as those in the Hamiltonian itself.
Since higher order terms are repeated Poisson brackets, these too are of the same forms, so the full pseudo-Hamiltonian can be written as a renormalized harmonic oscillator Hamiltonian:

H_pseudo = p²/(2m̃) + (1/2) m̃ Ω² r²,

where m̃ and Ω are a renormalized mass and a renormalized frequency.
From the leading order terms in the pseudo-Hamiltonian, we find

Ω = 1 + Δt²/24 + O(Δt⁴)
m̃ = 1 − Δt²/6 + O(Δt⁴).

As long as m̃ and Ω² are positive quantities, the scheme is stable, since a harmonic oscillator is stable.
To test whether this is the case, we need to know all the correction terms.
In this harmonic case, this can be done as follows:
We know the general form p²/(2m̃) + (1/2)m̃Ω²r² of the Hamiltonian, which we write in matrix form:

H_pseudo = xᵀ H x,   where   H = [[(1/2) m̃ Ω², 0], [0, 1/(2m̃)]].

The resulting Liouvillean is then

L_{H_pseudo} x = (J ∂H_pseudo/∂x)ᵀ (∂/∂x) x = J ∂H_pseudo/∂x = 2 J H x ≡ L̃ x,

so the linear matrix corresponding to the Liouvillean is given by

L̃ = [[0, 1/m̃], [−m̃ Ω², 0]].

The solutions of the equations of motion are determined by e^{L̃t}, which is found to be

e^{L̃t} = [[cos(Ωt), sin(Ωt)/(m̃Ω)], [−m̃Ω sin(Ωt), cos(Ωt)]].

This is not a surprising result, since it is just the solution of the classical harmonic oscillator.
This result should coincide with the form on the right-hand side of equation (4.39). We can therefore identify the renormalized frequency and mass in the splitting scheme:

Ω = arccos(1 − (1/2)Δt²)/Δt   (4.40)
m̃ = sin(ΩΔt)/(ΩΔt).   (4.41)

Note that the arccos gives a real result for Δt ≤ 2.
For larger Δt, the arccos becomes imaginary, indicating that the scheme has become unstable.
The way the instability limit is approached is illustrated in figure 4.1, where Ω and m̃ are plotted as a function of Δt.
We see that while the renormalized frequency Ω remains finite up to the limit Δt = 2, the renormalized mass m̃ vanishes there, so that the kinetic coefficient 1/(2m̃) in the pseudo-Hamiltonian diverges.
This divergence shows that the pseudo-Hamiltonian is not bounded.
Only a bounded pseudo-Hamiltonian guarantees a stable algorithm.
So an instability for large time steps arises even for the simple case of a harmonic oscillator.
The instability arises from an unbounded, diverging pseudo-Hamiltonian.
Why could the pseudo-Hamiltonian diverge in the first place?
Consider equation (4.38) again, which gives the pseudo-Hamiltonian as a power series in Δt.
Generally, power series converge only if Δt < Δt*, where Δt* is the radius of convergence.

[Figure 4.1: The way the instability limit is approached when using the Verlet splitting scheme on the harmonic oscillator. Plotted are the renormalized frequency Ω and mass m̃ as a function of Δt. The unrenormalized values were 1.]
For the harmonic oscillator, the radius of convergence was clearly Δt* = 2.
Also for general systems, one expects a radius of convergence, i.e., a time step beyond which the pseudo-Hamiltonian becomes unbounded and the simulation becomes unstable.
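The stability limit is easy to observe numerically. The sketch below iterates the Verlet map (4.39) for the harmonic oscillator, showing bounded motion for Δt < 2 and blow-up beyond it:

# Iterate the Verlet map for the harmonic oscillator (m = omega = 1).
def verlet_step(r, p, dt):
    p -= 0.5 * dt * r        # half force propagation
    r += dt * p              # free propagation
    p -= 0.5 * dt * r        # half force propagation
    return r, p

for dt in (0.5, 1.0, 1.9, 2.1):
    r, p = 1.0, 0.0
    for _ in range(1000):
        r, p = verlet_step(r, p, dt)
    print(dt, r, p)          # stays of order one for dt < 2; explodes for dt > 2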

4.5 More accurate splitting schemes

Higher order schemes give better accuracy, but at the price of performing more force computations.
Higher order schemes are very important in contexts such as astrophysics, where accurate trajectories matter.
For MD, depending on the level of accuracy required, higher order schemes can also be advantageous.
Higher order schemes may be devised by
taking time points which are not evenly spaced,
incorporating leading order correction terms in the pseudo-Hamiltonian,
incorporating more time points,
or a combination of the above.
All but the first option lead to more force computations per unit of physical time, which decreases the efficiency.
We will restrict ourselves to symmetric splitting schemes, to ensure time-reversibility.
To facilitate many of the derivations, we restate the symmetrized BCH formula of equation (4.34):

e^X e^Y e^X = e^{2X+Y−(1/6)[X,[X,Y]]+(1/6)[Y,[Y,X]]+fifth order repeated X,Y commutators}.   (4.42)

4.5.1 Optimized schemes

Given the same Hamiltonian split-up

H = K + U,   L = L_K + L_U,

let us now explore a five-fold split-up:

e^{LΔt} ≈ e^{λL_UΔt} e^{L_KΔt/2} e^{(1−2λ)L_UΔt} e^{L_KΔt/2} e^{λL_UΔt}.   (4.43)

Note that this is a case of uneven time intervals.
Work out the inner three exponentials using equation (4.42):

e^{λL_UΔt} e^{L_KΔt/2} e^{(1−2λ)L_UΔt} e^{L_KΔt/2} e^{λL_UΔt}
 = e^{λL_UΔt} e^{L_KΔt+(1−2λ)L_UΔt−((1−2λ)Δt³/24)[L_K,[L_K,L_U]]+((1−2λ)²Δt³/12)[L_U,[L_U,L_K]]+O(Δt⁵)} e^{λL_UΔt}
 = exp{ L_KΔt + L_UΔt − ((1−2λ)Δt³/24)[L_K,[L_K,L_U]] + ((1−2λ)²Δt³/12)[L_U,[L_U,L_K]]
        − (λ²Δt³/6)[L_U,[L_U,L_K]] + (λΔt³/6)[L_K + (1−2λ)L_U, [L_K,L_U]] + O(Δt⁵) }
 = e^{L_HΔt + Δt³(ε₁[L_K,[L_K,L_U]] + ε₂[L_U,[L_U,L_K]]) + O(Δt⁵)},

where

ε₁ = (6λ − 1)/24
ε₂ = (1 − 6λ + 6λ²)/12.
If both ε₁ and ε₂ were zero, this would give a fourth order scheme.
However, this is not possible here: we have one parameter λ and two prefactors to set to zero.
Alternatively, one could make the error as small as possible, e.g. by minimizing ε₁² + ε₂². This gives

λ = 0.1931833275037836...   (4.44)

The scheme in equation (4.43) that uses this value of λ is called the higher-order optimized advanced scheme of second order, or HOA2 for short.
In general, optimized schemes are based on minimizing the formal error, but this cannot guarantee that the actual error is small: numerical tests are necessary.
The HOA2 scheme requires two force computations per step:

e^{λL_UΔt}   e^{L_KΔt/2}   e^{(1−2λ)L_UΔt}   e^{L_KΔt/2}   e^{λL_UΔt},   (4.45)

where the first factor reuses the force from the previous step, the middle factor requires the first force computation, and the last factor the second; the Verlet scheme requires just one.
Thus, to compare accuracies at fixed numerical cost, the time step should be taken twice as large in the HOA2 scheme as in the Verlet scheme.
The HOA2 scheme was tested on a model of water (TIP4P) [Van Zon, Omelyan and
Schofield, J. Chem. Phys 128, 136102 (2008)], with the following results:
As long as it is stable, the HOA2 scheme leads to smaller fluctuations in the total
energy than the Verlet scheme at the same computational cost.
In the water simulation, the HOA2 scheme becomes unstable for smaller time
steps than the Verlet scheme.
The superior stability of the Verlet scheme means that it is the method of choice
for quick and dirty simulations of water with an accuracy less than approximately
1.5% (as measured by the energy fluctuations).

82

4. MOLECULAR DYNAMICS
For more accurate simulations, the HOA2 is more efficient, by about 50%, until
the HOA2 scheme becomes unstable.
The higher instability of HOA2 at large time steps is due to the uneven time steps that are taken.
The average time step determines the computational cost, but the largest of the time steps determines the stability.
Using uneven time steps instead of even time steps (at fixed computational cost) therefore increases some of the time intervals and decreases others; as a result, uneven time step schemes become unstable at smaller (average) time steps than even time step variants.
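For reference, one HOA2 step as a Python sketch for a single degree of freedom (force function F(r) and m = 1 assumed; λ from equation (4.44)):

# One HOA2 step: kick(lam), drift(1/2), kick(1-2*lam), drift(1/2), kick(lam).
LAM = 0.1931833275037836

def hoa2_step(F, r, p, dt):
    p += LAM * dt * F(r)                  # outer force propagation
    r += 0.5 * dt * p                     # half free propagation
    p += (1.0 - 2.0 * LAM) * dt * F(r)    # middle force propagation
    r += 0.5 * dt * p                     # half free propagation
    p += LAM * dt * F(r)                  # outer force propagation
    return r, p

In production code the last kick of one step and the first kick of the next are merged, so that only two force evaluations per step remain.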

4.5.2 Higher order schemes from more elaborate splittings

While using the HOA2 scheme is beneficial, this could only be known after numerical tests.
True higher-order schemes are a bit better in that respect: one knows that at least for small enough Δt, they are more efficient.
The simplest way to get a higher order scheme is to concatenate lower order schemes.
To understand this, note that if we have a kth order scheme S_k(Δt) approximating S(Δt) = e^{LΔt} up to O(Δt^{k+1}), i.e., if

S_k(Δt) = S(Δt) + Δt^{k+1} S̃ + O(Δt^{k+3}),   (4.46)

then

S_k(s) S_k(Δt − 2s) S_k(s) = S(Δt) + [2s^{k+1} + (Δt − 2s)^{k+1}] S̃ + O(Δt^{k+3}).   (4.47)

The leading order term can be eliminated by choosing

2s^{k+1} = −(Δt − 2s)^{k+1},   (4.48)

which, if k is even, can be solved and gives

s* = Δt / (2 − 2^{1/(k+1)}).   (4.49)

When S_k = S_2 is given by the second order Verlet scheme, the corresponding fourth order scheme becomes

e^{LΔt} ≈ e^{L_U s/2} e^{L_K s} e^{L_U s/2} e^{L_U(Δt/2−s)} e^{L_K(Δt−2s)} e^{L_U(Δt/2−s)} e^{L_U s/2} e^{L_K s} e^{L_U s/2}
 = e^{L_U s/2} e^{L_K s} e^{L_U(Δt−s)/2} e^{L_K(Δt−2s)} e^{L_U(Δt−s)/2} e^{L_K s} e^{L_U s/2},   (4.50)

with s = s* from equation (4.49) for k = 2.
This is called the fourth order Forest-Ruth integration scheme (FR4). Note that it has seven parts and requires three force evaluations per time step.
Note that s > Δt/2, so that Δt − 2s < 0.
It therefore requires taking a negative time step, which tends to lead to instabilities.
One can prove that order-k splitting schemes using a two-operator split-up (such as L_U and L_K) must have at least one negative time step if k > 2.
The negative steps thus seem unavoidable.
One can minimize them, however, by allowing more general splitting schemes than those constructed from equation (4.46), i.e., using the general form in equation (4.32).
This gives more parameters, allowing one to combine the higher-order nature with optimization, i.e. minimizing the leading error terms (cf. the HOA2 scheme).
A good fourth order scheme of this type is called EFRL4 (extended Forest-Ruth-like fourth order scheme) and looks like this:

e^{LΔt} = e^{ξL_UΔt} e^{(1/2−λ)L_KΔt} e^{χL_UΔt} e^{λL_KΔt} e^{(1−2(χ+ξ))L_UΔt} e^{λL_KΔt} e^{χL_UΔt} e^{(1/2−λ)L_KΔt} e^{ξL_UΔt} + O(Δt⁵),   (4.51)

where

ξ = 0.3281827559886160
λ = 0.6563655119772320
χ = 0.09372690852966102.

Even though this requires four force evaluations for each time step, it is more efficient than the FR4 scheme due to a much smaller leading order error.

4.5.3 Higher order schemes using gradients

Gradients are, in this context, derivatives and higher-order derivatives of the potential.
Using gradients as auxiliary Hamiltonians, one can raise the order of a scheme.
The simplest can be derived by considering once more the five-fold splitting scheme, before optimization:

e^{λL_UΔt} e^{L_KΔt/2} e^{(1−2λ)L_UΔt} e^{L_KΔt/2} e^{λL_UΔt} = e^{L_HΔt + Δt³(ε₁[L_K,[L_K,L_U]] + ε₂[L_U,[L_U,L_K]]) + O(Δt⁵)}
 = e^{L_HΔt + Δt³(ε₁ L_{{{U,K},K}} + ε₂ L_{{{K,U},U}}) + O(Δt⁵)},

where

ε₁ = (6λ − 1)/24
ε₂ = (1 − 6λ + 6λ²)/12,

and let K and U be of the usual form such that (cf. equations (4.35) and (4.36))

{{K,U},U} = (1/m)(∂U/∂R^N)·(∂U/∂R^N) = |F^N|²/m
{{U,K},K} = (1/m²) P^N·(∂²U/∂R^N∂R^N)·P^N.

The former depends only on R^N, but the latter is more complicated and depends on both R^N and P^N.
We can eliminate the more complicated term by setting ε₁ = 0, i.e. λ = 1/6, leaving us with

e^{(1/6)L_UΔt} e^{(1/2)L_KΔt} e^{(2/3)L_UΔt} e^{(1/2)L_KΔt} e^{(1/6)L_UΔt} = e^{L_HΔt + (Δt³/(72m)) L_{|F^N|²} + O(Δt⁵)}
 = e^{[L_K + L_{U + (Δt²/(72m))|F^N|²}]Δt + O(Δt⁵)}.

This equation holds for any form of U!
Thus we may substitute for U the expression

Ũ = U − (Δt²/(72m)) |F^N|²,   (4.52)


giving

e^{(1/6)L_ŨΔt} e^{(1/2)L_KΔt} e^{(2/3)L_ŨΔt} e^{(1/2)L_KΔt} e^{(1/6)L_ŨΔt} = e^{[L_K + L_{Ũ + (Δt²/(72m))|F̃^N|²}]Δt + O(Δt⁵)}
 = e^{[L_K + L_U]Δt + O(Δt⁵)} = e^{L_HΔt + O(Δt⁵)},

since the force F̃^N derived from Ũ differs from F^N only by O(Δt²).
A fourth order integration scheme!
Note that no negative time steps were needed.

The scheme in the current form uses an effective potential Ũ, which differs from the real potential U by an additional term δU = −(Δt²/(72m))|F^N|² of order O(Δt²). To leading order, this term commutes with all factors, so one may also write

e^{L_HΔt+O(Δt⁵)} = e^{(1/6)L_UΔt} e^{(1/2)L_KΔt} e^{(2/3)L_UΔt + L_{δU}Δt} e^{(1/2)L_KΔt} e^{(1/6)L_UΔt}
 = e^{(1/6)L_UΔt} e^{(1/2)L_KΔt} e^{(2/3)[L_U + (3/2)L_{δU}]Δt} e^{(1/2)L_KΔt} e^{(1/6)L_UΔt}
 = e^{(1/6)L_UΔt} e^{(1/2)L_KΔt} e^{(2/3)L_ŪΔt} e^{(1/2)L_KΔt} e^{(1/6)L_UΔt},   (4.53)

where the modified potential in the middle step is

Ū = U + (3/2)δU = U − (Δt²/(48m)) |F^N|².   (4.54)

Scheme (4.53) is due to Suzuki.


To summarize: by taking uneven intervals and incorporating the correction terms
(using gradients of the potentials), we get a fourth order scheme which does not require
negative partial steps and only needs two force evaluations per step.
Note that to be able to use the modified potentials, the forces have to be well defined,
i.e., any cut-off has to be smooth enough.

4.5.4 Multiple time-step algorithms

Decompose the potential and the corresponding Liouville operator into two parts: one for the fast varying forces and the other for the slowly varying forces:

U = U_f + U_s.   (4.55)

The fast motion could represent e.g. intramolecular vibrations, while the slowly varying forces might be intermolecular forces.
The simplest multiple time-step algorithm is then

e^{(1/2)L_{U_s}Δt} [e^{(1/2)L_{U_f}Δt/M} e^{L_KΔt/M} e^{(1/2)L_{U_f}Δt/M}]^M e^{(1/2)L_{U_s}Δt} = e^{LΔt+O(Δt³)}.   (4.56)

While this is of second order, like the Verlet scheme, the time step for the fast part of the motion is M times smaller than that of the slow motion.
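One outer step of this scheme in Python, as a sketch for a single degree of freedom (slow and fast force functions Fs and Ff assumed, m = 1):

# Multiple-time-step integrator, eq. (4.56): slow kicks around M fast
# velocity-Verlet sub-steps.
def mts_step(Fs, Ff, r, p, dt, M):
    p += 0.5 * dt * Fs(r)                # half kick with the slow forces
    h = dt / M
    for _ in range(M):                   # fast motion with time step dt/M
        p += 0.5 * h * Ff(r)
        r += h * p
        p += 0.5 * h * Ff(r)
    p += 0.5 * dt * Fs(r)                # second half kick with the slow forces
    return r, p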

5 Advanced topics

5.1 Hybrid Monte Carlo

5.1.1 The method

One drawback of traditional Monte-Carlo simulation methods is that typically the method of
generating trial configurations based on a probability T(x y) results in trial configurations
y that are highly correlated with the initial configuration x, with only small differences
between the configurations.
As an example, consider a dense liquid system where one could generate trial configurations by randomly selecting one of the particles in the fluid and giving the particle a
random displacement.
Very inefficient since particles are close to one another on average in a dense fluid,
so the trial configuration has a high probability of either overlapping with another
particle (hard spheres) or being in the strongly repulsive region of the interaction
potential of another particle, resulting in a configuration of high potential energy.
Molecular dynamics, in contrast, moves all particles in a cooperative fashion, so
that the potential energy undergoes only small fluctuations.
It is possible to try to move a group or cluster of particles cooperatively using
tricks, but this requires some cleverness.
Can we devise a dynamical procedure of generating a trial configuration in which a large number of degrees of freedom are moved cooperatively in an energetically reasonable fashion?
Suppose we use a dynamical procedure to change a set of coordinates for use as a trial configuration in a Monte-Carlo procedure. See S. Duane, A.D. Kennedy, B.J. Pendleton and D. Roweth, Phys. Lett. B 195, 216 (1987).
We require a momentum coordinate p conjugate to each spatial degree of freedom x we wish to evolve dynamically.
Suppose the evolution is a one-to-one mapping (i.e. deterministic evolution) of the initial phase point (x_0, p_0) to another phase point (x_t, p_t):

g_t(x_0, p_0) = (x_0(t), p_0(t)) = (x_t, p_t).   (5.1)

The inverse of this mapping is well-defined, so that

g_{−t}(x_t, p_t) = (x_0, p_0).

Consider using this mapping to define a transition matrix in a Markov process somehow, so that the transition matrix K(x_0 → x_t) is defined using the dynamics.
We want our transition matrix to be stationary with respect to the target distribution P(x), which requires:

∫dx_0 P(x_0) K(x_0 → x_t) − ∫dx_t P(x_t) K(x_t → x_0) = 0.   (5.2)
How can we define K? Suppose we draw the conjugate momenta p_0 randomly from a density ρ_m(p_0) and then generate the phase point (x_t, p_t) according to our mapping Eq. (5.1). Starting from the initial phase point (x_0, p_0), the probability of generating the phase point (x_t, p_t) by evolving the system using the deterministic dynamics over a time interval t is therefore

P_g((x_0, p_0) → (x_t, p_t)) = δ((x_0(t), p_0(t)) − (x_t, p_t)) = δ(g_t(x_0, p_0) − (x_t, p_t)).

The transition probability K(x_0 → x_t) is therefore

K(x_0 → x_t) = ∫dp_0 dp_t ρ_m(p_0) P_g((x_0, p_0) → (x_t, p_t)) A((x_0, p_0) → (x_t, p_t)),   (5.3)

where A((x_0, p_0) → (x_t, p_t)) is the acceptance probability for the phase point (x_t, p_t) if the initial configuration was (x_0, p_0).
To define this acceptance probability, let ρ(x, p) = P(x) ρ_m(p) be the probability density for the augmented coordinate (x, p).
We then define the acceptance probability to be

A((x_0, p_0) → (x_t, p_t)) = min{1, ρ(x_t, p_t)/ρ(x_0, p_0)}.

Claim: The transition probability Eq. (5.3) satisfies the stationarity condition Eq. (5.2) provided
1. The dynamics is symplectic, so that the mapping is volume-preserving and time-reversible.
2. The probability density for the conjugate momenta satisfies ρ_m(p) = ρ_m(T p) = ρ_m(−p), where T is the momentum inversion operator, T f(p) = f(−p).

Proof. To show Eq. (5.2) is satisfied, we must compute K(x_t → x_0) under the dynamical procedure. Suppose the dynamical mapping is symplectic and hence reversible. The mapping g_t therefore satisfies g_{−t} = T g_t T, and hence if g_t(x_0, p_0) = (x_0(t), p_0(t)) = (x_t, p_t), we have T g_t T(x_t, p_t) = (x_0, p_0) and hence g_t(x_t, T p_t) = (x_0, T p_0). From the definition of the transition probability, we have

K(x_t → x_0) = ∫d(T p_0) d(T p_t) ρ_m(T p_t) P_g((x_t, T p_t) → (x_0, T p_0)) A((x_t, T p_t) → (x_0, T p_0))   (5.4)
 = ∫dp_0 dp_t ρ_m(p_t) δ(g_t(x_t, T p_t) − (x_0, T p_0)) A((x_t, p_t) → (x_0, p_0)),

since ρ_m(T p_t) = ρ_m(p_t), ρ(x_t, T p_t) = ρ(x_t, p_t) and ρ(x_0, T p_0) = ρ(x_0, p_0) by assumption. Now

∫dx_0 P(x_0) K(x_0 → x_t) = ∫dx_0 dp_0 dp_t P(x_0) ρ_m(p_0) δ(g_t(x_0, p_0) − (x_t, p_t)) min{1, ρ(x_t, p_t)/ρ(x_0, p_0)}
 = ∫dx_0 dp_0 min{ρ(x_0, p_0), ρ(g_t(x_0, p_0))},   (5.5)

whereas

∫dx_t P(x_t) K(x_t → x_0) = ∫dx_t dp_0 dp_t ρ(x_t, p_t) δ(g_t(x_t, T p_t) − (x_0, T p_0)) min{1, ρ(x_0, T p_0)/ρ(x_t, p_t)}
 = ∫dx_t dp_t min{ρ(x_t, T p_t), ρ(g_t(x_t, T p_t))}.   (5.6)

Changing the variables of integration from (x_t, p_t) to (x_0, p_0) = T g_t(x_t, T p_t) gives

∫dx_t P(x_t) K(x_t → x_0) = ∫dx_0 dp_0 J((x_t, T p_t); (x_0, p_0)) min{ρ(g_{−t}(x_0, T p_0)), ρ(x_0, T p_0)}
 = ∫dx_0 dp_0 min{ρ(T g_t(x_0, p_0)), ρ(x_0, p_0)}
 = ∫dx_0 dp_0 min{ρ(g_t(x_0, p_0)), ρ(x_0, p_0)},   (5.7)

since the Jacobian J((x_t, T p_t); (x_0, p_0)) of the transformation is unity due to the volume-preserving property of the dynamical evolution. Since Eq. (5.5) and Eq. (5.7) are equal, the equilibrium distribution is stationary under the transition matrix K.
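Collecting the ingredients of the proof, a minimal hybrid Monte Carlo step as a Python sketch (the potential U and force F are assumed supplied; the Gaussian momentum density satisfies ρ_m(p) = ρ_m(−p), and the momentum-Verlet integrator of Chapter 4 provides the volume-preserving, reversible map g_t):

# Minimal hybrid Monte Carlo step: draw momenta, integrate, accept/reject.
import math, random

def hmc_step(U, F, x, dt, nsteps, beta=1.0):
    p = [random.gauss(0.0, (1.0 / beta) ** 0.5) for _ in x]   # rho_m even in p
    xt, pt = x[:], p[:]
    f = F(xt)
    for _ in range(nsteps):                # symplectic momentum Verlet
        pt = [pi + 0.5 * dt * fi for pi, fi in zip(pt, f)]
        xt = [xi + dt * pi for xi, pi in zip(xt, pt)]
        f = F(xt)
        pt = [pi + 0.5 * dt * fi for pi, fi in zip(pt, f)]
    dH = U(xt) - U(x) + 0.5 * (sum(q * q for q in pt) - sum(q * q for q in p))
    if random.random() < math.exp(min(0.0, -beta * dH)):   # A = min(1, e^{-beta dH})
        return xt                          # accept the whole trajectory
    return x                               # reject: keep the old configuration

Because the integrator only approximately conserves H, the acceptance test on e^{−βΔH} corrects the sampling exactly, while the cooperative dynamical move keeps the acceptance rate high.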

5.1.2 Application of Hybrid Monte-Carlo

As an example of a useful application of hybrid Monte-Carlo, consider a bead polymer system where the interaction between monomers is determined by a potential of the form

U(r^(N)) = Σ_{i=1}^N Σ_{j=i+4}^N V_nb(|r_i − r_j|) + Σ_{i=4}^N U_tor(φ_i) + Σ_{i=3}^N U_bend(θ_i) + Σ_{i=2}^N U_bond(|r_i − r_{i−1}|).

In this potential, we have the following contributions:

1. A bond-stretching potential U_bond that typically is harmonic.
2. A bond-angle potential U_bend, taken to be harmonic in cos θ_i, where θ_i is the bond angle defined by

l_i l_{i−1} cos θ_i = (r_i − r_{i−1}) · (r_{i−2} − r_{i−1}),

where l_i = |r_i − r_{i−1}| is the bond distance between monomers i − 1 and i.
3. A torsional potential U_tor that depends on the torsional angle φ_i, defined to be the angle between the normal vectors to the planes formed by monomers (i−3, i−2, i−1) and (i−2, i−1, i). The torsional angle can be computed readily from the Cartesian positions of monomers i − 3 through i.
4. A non-bonded potential V_nb that describes interactions between monomers separated from one another by at least four bonds. This potential typically is of Lennard-Jones form, and can also include electrostatic interactions.
The configuration of the polymer can be specified by either the Cartesian coordinates of all the monomers, $(\mathbf r_1,\dots,\mathbf r_N)$, or by a set of generalized coordinates $(\mathbf r_1, q_2, \dots, q_N)$, where $q_i = (l_i, \theta_i, \phi_i)$.

- The angular coordinates of the first few beads (such as $\theta_2$, $\phi_2$ and $\phi_3$) can be defined with respect to fictitious, fixed Cartesian coordinates $\mathbf r_0$ and $\mathbf r_{-1}$ that do not change.
- Cartesian positions can be calculated from the generalized coordinates via a set of locally-defined spherical polar reference frames in which monomer $i$ is placed along the $x$-axis at a distance $l_i$ from the origin, chosen to be the Cartesian position of bead $i-1$. Then the Cartesian position of bead $i$ can be written in the lab frame (defined to be the frame of bead 1) as
\[ \mathbf r_i = \mathbf r_1 + T_2\,\mathbf l_2 + T_2 T_3\,\mathbf l_3 + T_2 T_3 T_4\,\mathbf l_4 + \cdots + T_2 T_3 \cdots T_i\,\mathbf l_i, \tag{5.8} \]
where $\mathbf l_i = (l_i, 0, 0)$ is the vector position of bead $i$ in the local reference frame for bead $i$, with monomer $i-1$ as origin. In Eq. (5.8), the matrix $T_i(\theta_i,\phi_i)$ is the transformation (rotation) matrix between the reference frames of bead $i$ and bead $i-1$, and is given by
\[ T_i = \begin{pmatrix} \cos\theta_i & \sin\theta_i & 0 \\ -\sin\theta_i\cos\phi_i & \cos\theta_i\cos\phi_i & \sin\phi_i \\ \sin\theta_i\sin\phi_i & -\cos\theta_i\sin\phi_i & \cos\phi_i \end{pmatrix}. \]
Note that $T_k^{lab} = T_2\cdots T_k$ transforms between the reference frame of bead $k$ and the fixed lab frame, and $T_k^{lab}\,\mathbf l_k$ is the lab-frame representation of the relative vector $\mathbf r_k - \mathbf r_{k-1}$ connecting monomers $k-1$ and $k$ (a sketch of this coordinate reconstruction is given below).
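The following is a minimal sketch of Eq. (5.8) in code. The sign conventions in the rotation matrix follow the reconstruction above and should be checked against one's own frame definitions; the function and argument names are illustrative only.

```python
import numpy as np

def rotation_matrix(theta, phi):
    """Rotation matrix T_i(theta_i, phi_i) between the local frames of
    neighbouring beads, in the convention written above."""
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(phi), np.sin(phi)
    return np.array([[ct,      st,     0.0],
                     [-st*cp,  ct*cp,  sp ],
                     [st*sp,  -ct*sp,  cp ]])

def cartesian_from_internal(r1, l, theta, phi):
    """Build r_1..r_N from bond lengths l[i], bend angles theta[i] and
    torsions phi[i] (sequences for beads 2..N) using Eq. (5.8):
    r_i = r_{i-1} + (T_2 T_3 ... T_i) l_i, with l_i = (l_i, 0, 0)."""
    positions = [np.asarray(r1, dtype=float)]
    T_lab = np.eye(3)                    # accumulated product T_2 T_3 ... T_i
    for li, th, ph in zip(l, theta, phi):
        T_lab = T_lab @ rotation_matrix(th, ph)
        bond = T_lab @ np.array([li, 0.0, 0.0])
        positions.append(positions[-1] + bond)
    return np.array(positions)
```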
Ensemble averages $\langle A\rangle$ of a variable $A$ that depends only on the configuration can be written as integrals over Cartesian or generalized coordinates using
\[ \langle A\rangle = \int d\mathbf r_1 \cdots d\mathbf r_N\; P(\mathbf r_1,\dots,\mathbf r_N)\, A(\mathbf r_1,\dots,\mathbf r_N) = \int d\mathbf r_1\, dq_2 \cdots dq_N\; J(q_2,\dots,q_N)\, P(\mathbf r_1,\dots,q_N)\, A(\mathbf r_1,\dots,q_N), \]
where $J(q_2,\dots,q_N)$ is the Jacobian of the transformation between Cartesian and generalized coordinates. It can be shown that
\[ J(q_2,\dots,q_N) = \prod_{i=2}^{N} l_i^2 \cos\theta_i. \]

We could construct a Monte Carlo procedure to generate equilibrium configurations of the polymer by either:

- Randomly displacing Cartesian positions of the monomers and accepting/rejecting trial configurations based on energy differences. This typically is very inefficient, since the bond-stretching potential greatly restricts the range of displacements that are accepted.
- Randomly displacing the generalized coordinates $q_i = (l_i,\theta_i,\phi_i)$. This can result in large conformational changes, as a small rotation around one of the torsional angles rotates all the subsequent monomers. In some situations, the drastic change in conformation caused by changing a single dihedral angle can lead to trial configurations with large energies, and hence poor acceptance.

The general problem is that we'd like to move the generalized coordinates $q$ together in a cooperative fashion to change the configuration of the polymer in a reasonable way. Ideally, we'd like to keep some degrees of freedom, such as bond lengths, fixed during the generation of a trial configuration. This can be accomplished by using a dynamical updating scheme on a restricted set of coordinates ($\{\theta_i,\phi_i\}$).
Implementation: use hybrid Monte Carlo with fictitious momenta $P_{\theta_i}$ and $P_{\phi_i}$ conjugate to each of the selected coordinates $\theta_i$ and $\phi_i$.

- Draw momenta for the degrees of freedom of monomer $i$ from the Boltzmann weights
\[ \rho_m(P_{\theta_i}, P_{\phi_i}) \propto e^{-\beta P_{\theta_i}^2/(2m_\theta)}\; e^{-\beta P_{\phi_i}^2/(2m_\phi)}, \]
where the effective masses $m_\theta$ and $m_\phi$ are arbitrary.
- Evolve the configuration $(\theta_i, \phi_i, P_{\theta_i}, P_{\phi_i})$ for a fixed amount of time using a symplectic integrator, preferably with a large time step to change the configuration rapidly. The equations to integrate are
\[ \dot\theta_i = \frac{P_{\theta_i}}{m_\theta}, \qquad \dot P_{\theta_i} = -\frac{\partial U}{\partial\theta_i}, \qquad \dot\phi_i = \frac{P_{\phi_i}}{m_\phi}, \qquad \dot P_{\phi_i} = -\frac{\partial U}{\partial\phi_i}. \]
  - A symplectic integrator is easy to construct based on the usual Hamiltonian splitting scheme of separating the kinetic and potential energy terms and defining Liouville operators for each of the terms.
  - The effective forces in the evolution of the momenta require evaluation of the derivatives $\partial\mathbf r_j/\partial\theta_i$ and $\partial\mathbf r_j/\partial\phi_i$.
- Track the effective Hamiltonian
\[ H_e = \sum_i \left( \frac{P_{\theta_i}^2}{2m_\theta} + \frac{P_{\phi_i}^2}{2m_\phi} \right) + U(\mathbf r_1, q_2, \dots, q_N). \]
The stationary distribution for the phase point in the Monte Carlo process is proportional to $e^{-\beta H_e}$, due to the form of the densities of the momenta.
- If the current configuration at the start of the dynamical update is $X$, accept the trial configuration $Y$ generated by the trajectory with probability
\[ \min\left\{ 1,\; \frac{J(Y)}{J(X)}\, e^{-\beta\Delta H_e} \right\}, \]
where $\Delta H_e = H_e(Y) - H_e(X)$.
- Note that if the dynamics were integrated perfectly, $\Delta H_e = 0$, since the effective Hamiltonian is conserved by the dynamics.
Comments:
1. The smaller the time step, the smaller the average $\Delta H_e$ and the greater the acceptance probability.
2. The smaller the time step, the smaller the overall change in configuration of the system for a fixed number of molecular dynamics steps.
3. The dynamics is fictitious, as the equations of motion are not the Hamiltonian equations for the generalized coordinates. In particular, the kinetic energy in the effective Hamiltonian has a different form.
4. The Jacobian factor is important in the acceptance probability. However, if the $l_i$ and $\theta_i$ are held fixed and only the torsional angles evolve, the Jacobian factor is constant.
5. Any subset of coordinates can be selected for updating for any trial move. It may be advantageous to select different groups of coordinates for updating at different times. A sketch of one full hybrid Monte Carlo update is given below.
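To make the accept/reject logic concrete, here is a minimal sketch of a single hybrid Monte Carlo update with a leapfrog trajectory. The callables `U`, `grad_U` and `jacobian` are assumed to be supplied by the user (they are not defined in these notes), and a single effective mass is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng()

def hybrid_mc_step(q, U, grad_U, jacobian, beta, mass, dt, nsteps):
    """One hybrid MC update: draw fictitious momenta, run leapfrog
    dynamics, then accept/reject on the change in the effective
    Hamiltonian, including the Jacobian ratio J(Y)/J(X)."""
    p = rng.normal(scale=np.sqrt(mass / beta), size=np.shape(q))
    H_old = np.sum(p**2) / (2*mass) + U(q)

    q_new, p_new = np.array(q, dtype=float), p.copy()
    p_new -= 0.5 * dt * grad_U(q_new)           # half kick
    for _ in range(nsteps - 1):
        q_new += dt * p_new / mass              # drift
        p_new -= dt * grad_U(q_new)             # full kick
    q_new += dt * p_new / mass
    p_new -= 0.5 * dt * grad_U(q_new)           # final half kick

    H_new = np.sum(p_new**2) / (2*mass) + U(q_new)
    accept = (jacobian(q_new) / jacobian(q)) * np.exp(-beta * (H_new - H_old))
    return (q_new, True) if rng.random() < min(1.0, accept) else (q, False)
```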

5.2 Time-dependent correlations

We have considered primarily averages of static properties that are constructed out of ensemble averages involving a single phase point at a time. We have seen that these averages may be computed by either:
1. A Monte Carlo algorithm that generates ensemble averages by sampling phase points from the phase space density for the ensemble. Different ensemble averages can be computed by altering the transition matrix. Within this class of algorithms, we include the hybrid Monte Carlo scheme, which uses a dynamical procedure to generate trial configurational coordinates.
2. A molecular dynamics procedure that uses the real Hamiltonian dynamics of the system to compute the time average of a dynamical variable. By the ergodic hypothesis, the time average is equal to the micro-canonical ensemble average. In the dynamical evolution, the micro-canonical phase space density and phase space volume are conserved, which reflects the conservation of probability.
In many physical situations, we are interested in computing quantities that obey some physical law, and which may involve the real dynamics of the system. For example, suppose one is interested in computing how an initially nonuniform concentration profile (say a drop of dye in water) is smoothed in the absence of flow or stirring. Such a process is described phenomenologically by the law of diffusion (Fick's law), which states that the flux $\mathbf j$ of the diffusing species is proportional to the negative gradient in the concentration of the species:
\[ \mathbf j = -D\,\frac{\partial c(\mathbf r,t)}{\partial\mathbf r}, \]
where $c(\mathbf r,t)$ is the local concentration of the species and $D$ is a constant known as the diffusion coefficient. Under Fick's law, the concentration obeys the equation
\[ \frac{\partial c(\mathbf r,t)}{\partial t} = -\frac{\partial}{\partial\mathbf r}\cdot\mathbf j(\mathbf r,t) = D\,\nabla^2 c(\mathbf r,t). \tag{5.9} \]
If the initial dye is concentrated in a small region, $c(\mathbf r,0) = \delta(\mathbf r)$, then the concentration profile is
\[ c(\mathbf r,t) = \frac{1}{(4\pi D t)^{3/2}}\, e^{-r^2/(4Dt)}. \]
Note that the concentration profile is normalized, $\int d\mathbf r\, c(\mathbf r,t) = 1$. This is of the form of a Gaussian distribution with a time-dependent variance $2Dt$ in each Cartesian direction. Hence the diffusion coefficient is related to the second moment of $c(\mathbf r,t)$:
\[ \langle r^2(t)\rangle = \int d\mathbf r\; r^2\, c(\mathbf r,t). \]
The second moment can be interpreted to mean the average distance squared that a particle moves away from the origin in a time interval $t$. To see how one might compute the coefficient $D$, we multiply Eq. (5.9) by $r^2$ and integrate to get
\[ \frac{\partial\langle r^2(t)\rangle}{\partial t} = D\int d\mathbf r\; r^2\,\nabla^2 c(\mathbf r,t) = 6D\int d\mathbf r\; c(\mathbf r,t) = 6D, \]
after the angular integrals have been carried out using the spherical symmetry of $c(\mathbf r,t)$. This equation suggests that $\langle r^2(t)\rangle = 6Dt$, so that a plot of the average squared distance versus time should be linear with slope $6D$. To compute the diffusion coefficient from the time-dependent correlation function $\langle r^2(t)\rangle$, we must:
1. Draw an initial configuration according to the correct phase space density (i.e. micro-canonical, canonical, etc.).
2. Propagate the system using the correct dynamics and measure the displacement vector $\Delta\mathbf r_i(t) = \mathbf r_i(t) - \mathbf r_i(0)$ of each particle for a set of different times $t$.
3. If we ignore any cross-correlation between the motion of particles, measure the self-diffusion coefficient $D_s$ by calculating
\[ D_s = \frac{\langle r^2(t)\rangle}{6t} = \frac{1}{N}\sum_{i=1}^{N}\frac{\langle \Delta\mathbf r_i(t)\cdot\Delta\mathbf r_i(t)\rangle}{6t}. \]
From the equations of motion, $\Delta\mathbf r_i(t) = \int_0^t d\tau\, \mathbf v_i(\tau)$, and hence the average can be written as
\[ \langle r^2(t)\rangle = \frac{1}{N}\sum_{i=1}^{N}\int_0^t d\tau\int_0^t d\tau'\; \langle \mathbf v_i(\tau)\cdot\mathbf v_i(\tau')\rangle. \]
Note that the procedure consists of two different steps: the drawing of the initial phase points, and then the propagation of the system. General time-dependent correlation functions of the form
\[ \langle A\,B(t)\rangle = \int dx^{(N)}\; f(x^{(N)})\, A(x^{(N)})\, B(x^{(N)}(t)) \]
can be computed analogously. If the ensemble is not micro-canonical, the initial points must be drawn using a Monte Carlo algorithm (or with a special dynamical procedure) that generates phase points according to $f(x^{(N)})$. Then the system must be evolved using the real dynamics of the system, preferably with a symplectic integrator. This evolution therefore evolves the system at constant energy. A sketch of the mean-squared-displacement analysis is given below.
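As an illustration, the following sketch estimates $D_s$ from a stored trajectory by averaging the squared displacement over particles and time origins and fitting the slope of $\langle r^2(t)\rangle$ versus $t$. The array `traj` (unwrapped positions, shape frames × particles × 3) and the frame spacing `dt` are assumed inputs.

```python
import numpy as np

def self_diffusion(traj, dt, max_lag=None):
    """Estimate D_s from <|r_i(t0+t) - r_i(t0)|^2> = 6 D_s t."""
    nframes = traj.shape[0]
    max_lag = max_lag or nframes // 2
    lags = np.arange(1, max_lag)
    msd = np.empty(len(lags))
    for k, lag in enumerate(lags):
        disp = traj[lag:] - traj[:-lag]      # all time origins at once
        msd[k] = np.mean(np.sum(disp**2, axis=-1))
    t = lags * dt
    slope = np.polyfit(t, msd, 1)[0]         # linear fit; slope = 6 D_s
    return slope / 6.0
```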

5.3 Event-driven simulations

Consider a system of $N$ impenetrable hard spheres that interact via the potential
\[ U(r) = \begin{cases} \infty & r \le \sigma \\ 0 & r > \sigma, \end{cases} \]
where $r$ is the distance between the centres of two hard spheres and $\sigma$ is the diameter of the spheres.

- The probability of finding the system in a configuration in which spheres overlap is zero.
- The energy of the system is given by the Hamiltonian
\[ H = \sum_{i=1}^{N}\frac{p_i^2}{2m} + \frac12\sum_{i,j} U(r_{ij}) = K + U = \begin{cases} K & \text{if no overlap} \\ \infty & \text{otherwise.} \end{cases} \]

How can the dynamics of this system be performed on a computer? Particles coming together should bounce off one another, but preserve the total energy $H$.
To examine the complications of systems with discontinuous potentials, consider a system interacting with the short-ranged potential
\[ U(r,\alpha) = \epsilon\, e^{-\alpha(r-\sigma)}, \qquad f(r,\alpha) = -\frac{\partial U}{\partial r} = \alpha\epsilon\, e^{-\alpha(r-\sigma)}. \]

[Figure: $U(r,\alpha)$ versus $r/\sigma$ (from about 1.02 to 1.2) for $\alpha = 50$ and $\alpha = 100$, showing the potential steepening toward a hard wall at $r = \sigma$ as $\alpha$ grows.]

- As $\alpha\to\infty$, the potential approaches the hard-sphere potential.
- Note that the force on sphere $i$ due to sphere $j$ is given by $\mathbf F_{ij} = f(r_{ij})\,\hat{\mathbf r}_{ij}$, where $\hat{\mathbf r}_{ij}$ is the unit vector along the relative vector $\mathbf r_{ij} = \mathbf r_i - \mathbf r_j$. This force becomes infinite in magnitude and infinitely short-ranged around $r = \sigma$ as $\alpha\to\infty$.
- If integrated by a Verlet scheme, the shadow Hamiltonian $H_s$ to leading order in $\Delta t$ is
\[ H_s = H + \frac{\Delta t^2}{12 m^2}\sum_{i,j}\mathbf p_i\cdot\frac{\partial^2 U}{\partial\mathbf r_i\,\partial\mathbf r_j}\cdot\mathbf p_j - \frac{\Delta t^2}{24 m}\sum_i \mathbf F_i\cdot\mathbf F_i + O(\Delta t^4). \]
- From the form of the potential, we see that the correction term is proportional to $(\alpha\,\Delta t)^2$, and hence the radius of convergence of the shadow Hamiltonian shrinks to zero as $\alpha\to\infty$.
- The integrator is unstable for any choice of $\Delta t$.
- In the impulsive limit $\alpha\to\infty$, the force acts discontinuously at a time $t_c$ where $r_{ij}(t_c) = \sigma$.
- How can we integrate equations of motion with impulsive forces that act at only one instant? Consider the integral form of the equation of motion for the momentum:
\[ \mathbf p_i(t+\Delta t) = \mathbf p_i(t-\Delta t) + \int_{t-\Delta t}^{t+\Delta t} d\tau\; \mathbf F_i(\tau). \]
- In the impulsive limit, the force takes the form
\[ \mathbf F_{ij} = S\,\hat{\mathbf r}_{ij}\,\delta(r_{ij} - \sigma) = S\,\hat{\mathbf r}_{ij}\,\delta(t - t_c), \]
where the second equality is obtained by solving the equation $r_{ij}(t_c) = \sigma$ and rewriting the delta function in terms of time (the Jacobian factor is absorbed into the constant).
  - $S$ is a constant dependent on the configuration at the moment of collision, to be determined.
  - Note that the direction of the force is along the relative vector $\mathbf r_{ij}$ for a central potential that depends only on the magnitude $|\mathbf r_{ij}|$ of the relative vector.
- Inserting the impulsive force into the momentum equation gives
\[ \Delta\mathbf p_i = \mathbf p_i(t+\Delta t) - \mathbf p_i(t-\Delta t) = \begin{cases} S\,\hat{\mathbf r}_{ij} & \text{if } t_c \in [t-\Delta t,\, t+\Delta t] \\ 0 & \text{otherwise.} \end{cases} \]
- In the impulsive limit, at most one collision can occur as $\Delta t\to0$.


- How is the collision time determined? Solve for the collision time by considering when $r_{ij}(t_c) = \sigma$, or $\mathbf r_{ij}(t_c)\cdot\mathbf r_{ij}(t_c) = \sigma^2$.
- Up to the moment of collision, the pair of spheres $i$ and $j$ move freely with constant momentum, so
\[ \mathbf r_i(t_c) = \mathbf r_i(0) + \frac{\mathbf p_i}{m_i}\,t_c, \qquad \mathbf r_j(t_c) = \mathbf r_j(0) + \frac{\mathbf p_j}{m_j}\,t_c, \qquad \mathbf r_{ij}(t_c) = \mathbf r_{ij}(0) + \mathbf v_{ij}\,t_c. \]
- The collision time is therefore determined from
\[ r_{ij}(t_c)^2 = \sigma^2 = r_{ij}(0)^2 + 2 v_r t_c + v_{ij}^2 t_c^2, \qquad t_c^2 + \frac{2 v_r}{v_{ij}^2}\,t_c + \frac{r_{ij}^2 - \sigma^2}{v_{ij}^2} = 0, \]
so that
\[ t_c = \frac{1}{v_{ij}^2}\left[ -v_r - \left( v_r^2 - \omega_{ij}^2 \right)^{1/2} \right], \]
if $v_{ij}^2 \ne 0$, where $v_r = \mathbf r_{ij}\cdot\mathbf v_{ij}$ is the projection of the relative velocity along the relative vector $\mathbf r_{ij}$ and $\omega_{ij}^2 = (r_{ij}^2 - \sigma^2)\, v_{ij}^2$.
- Real solutions exist if $v_{ij}^2 \ne 0$ and $v_r^2 \ge \omega_{ij}^2$.
- If there is no initial overlap, then $r_{ij}^2 > \sigma^2$ and hence $\omega_{ij}^2 > 0$.
- If $\omega_{ij}^2 > 0$, then $|v_r| > \left( v_r^2 - \omega_{ij}^2 \right)^{1/2}$, so:
  1. If $v_r > 0$, then $t_c < 0$. The particles are moving away from one another and will not collide (in a non-periodic system).
  2. If $v_r < 0$, the particles are moving towards one another and two positive roots are found:
\[ t_c = \frac{-v_r - \sqrt{v_r^2 - \omega_{ij}^2}}{v_{ij}^2} = \text{time of initial contact,} \tag{5.10} \]
\[ t_{leave} = \frac{-v_r + \sqrt{v_r^2 - \omega_{ij}^2}}{v_{ij}^2} = \text{time at which the particles would pass through one another.} \]

[Figure: two spheres 1 and 2 approaching with relative velocity $\mathbf v_{12}$; the overlap interval runs from $t_c$ to $t_{leave}$.]

- Determination of the magnitude of the impulse $S$ is based on conservation principles. If the potential depends only on the magnitude of the relative vector $\mathbf r_{ij}$ and the Hamiltonian does not depend explicitly on time, then linear momentum as well as total energy must be conserved in the collision.
- Conservation of linear momentum implies:
\[ \mathbf p_i' + \mathbf p_j' = \mathbf p_i + \mathbf p_j, \qquad \mathbf p_i' = \mathbf p_i + \Delta\mathbf p_i = \mathbf p_i + S\,\hat{\mathbf r}_{ij}, \qquad \mathbf p_j' = \mathbf p_j + \Delta\mathbf p_j = \mathbf p_j - S\,\hat{\mathbf r}_{ij}, \tag{5.11} \]
where $\mathbf p_i'$ is the momentum of particle $i$ after the collision with particle $j$.
- Conservation of energy implies that the post-collisional energy $H'$ is equal to the pre-collisional energy $H$, or
\[ \frac{\mathbf p_i'\cdot\mathbf p_i'}{2m_i} + \frac{\mathbf p_j'\cdot\mathbf p_j'}{2m_j} = \frac{\mathbf p_i\cdot\mathbf p_i}{2m_i} + \frac{\mathbf p_j\cdot\mathbf p_j}{2m_j}, \]
which, using Eq. (5.11), gives a condition on the impulse $S$:
\[ \frac{S^2}{2\mu} + S\,v_r = 0 \qquad\Longrightarrow\qquad S = -2\mu\, v_r, \]
where $\mu = m_i m_j/(m_i + m_j)$ is the reduced mass and $v_r = \hat{\mathbf r}_{ij}\cdot\mathbf v_{ij}$ is evaluated at contact.
- After a collision, the momenta are therefore given by
\[ \mathbf p_i' = \mathbf p_i - 2\mu\, v_r\,\hat{\mathbf r}_{ij}, \qquad \mathbf p_j' = \mathbf p_j + 2\mu\, v_r\,\hat{\mathbf r}_{ij}. \tag{5.12} \]

5.3.1 Implementation of event-driven dynamics

For the hard sphere system, the dynamics can be executed by:
1. Initially calculating all collision events for the system using Eq. (5.10). Boundary conditions must be properly included: hard walls, periodic system, etc.
2. Finding the first collision event and evolving the system (with free evolution) up to that time.
3. For the colliding pair, computing the consequences of the collision using Eq. (5.12).
4. Computing the new first collision time for the system after the momentum adjustments, and repeating steps 2 to 4. Minimal sketches of the two kernels used in steps 1 and 3 are given below.
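The following sketch implements the pair collision time of Eq. (5.10) and the impulse of Eq. (5.12); the event-loop bookkeeping (event lists, boundary conditions) is assumed to be handled elsewhere, and the names are illustrative.

```python
import numpy as np

def collision_time(r12, v12, sigma):
    """Earliest contact time for a pair (Eq. 5.10); np.inf if no collision.
    Conventions: r12 = r1 - r2, v12 = v1 - v2."""
    vv = np.dot(v12, v12)
    vr = np.dot(r12, v12)
    if vv == 0.0 or vr >= 0.0:            # not approaching
        return np.inf
    disc = vr**2 - vv * (np.dot(r12, r12) - sigma**2)
    return (-vr - np.sqrt(disc)) / vv if disc > 0.0 else np.inf

def collide(p1, p2, m1, m2, r12, sigma):
    """Elastic hard-sphere impulse at contact (Eq. 5.12)."""
    rhat = r12 / sigma                     # |r12| = sigma at contact
    mu = m1 * m2 / (m1 + m2)
    vr = np.dot(rhat, p1/m1 - p2/m2)
    return p1 - 2.0*mu*vr*rhat, p2 + 2.0*mu*vr*rhat
```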
Tricks of the Trade: A number of standard techniques have been developed to improve the efficiency of event-driven simulations. These include:
1. Cell division and crossing events
- If cubic cells of side length at least $\sigma$ are used to partition the system, collisions of a given particle in a cell can occur only with particles in the same or neighboring cells before the particle moves out of its cell.
- If a particle moves out of a cell (i.e. a cell-crossing event), the new neighboring cells must be checked for collisions.
- The cell-crossing time is analytically computable.
- Cell crossings can be treated as events to be processed like collisions, with the processing of the event defined to mean the computation of new collision times with particles in the new neighboring cells.
2. Local clocks
- The position of a particle not involved in an event does not need to be updated, since it continues to move freely until it is specifically involved in a collision.
- The time of the last event for each particle can be stored and used to compute interaction times or updates of positions.
- Global updates of the positions of all particles must be performed before any measurement of the system takes place.
3. Elimination of redundant calculation of event times
- If a collision occurs between a pair of particles $i$–$j$, new event times result only for particles that have a collision event involving particle $i$ or $j$ as a partner.
- Most event times are unaffected in a large system, and need not be recomputed.
- One can devise routines that compute only the new possible events following a collision.
4. Usage of data structures to manage event times (see the sketch following this list)
- The search for the first event time in the system can be facilitated by the use of binary tree data structures, using the event time as an ordering key.
- Functions are required to insert new events in the tree and to search for the earliest times.
- Information on whether an event in the tree has been invalidated by an earlier event can be stored in each node of the tree.
- Small hybrid tree structures that only insert valid events in the near future are an efficient means of managing events in the system. These data structures typically use linked lists of events in specific time intervals that are periodically used to populate the binary tree.
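As one simple realization of such bookkeeping — a binary heap with lazy invalidation rather than the hybrid tree described above — consider the following sketch. Each particle carries an event counter; an event is discarded as stale if either partner has collided since the event was scheduled. All names here are illustrative.

```python
import heapq

class EventQueue:
    """Priority queue of pair events keyed by time, with lazy invalidation."""
    def __init__(self, nparticles):
        self.heap = []
        self.counter = [0] * nparticles    # bumped whenever particle i collides

    def schedule(self, t, i, j):
        heapq.heappush(self.heap, (t, i, j, self.counter[i], self.counter[j]))

    def next_valid(self):
        while self.heap:
            t, i, j, ci, cj = heapq.heappop(self.heap)
            if ci == self.counter[i] and cj == self.counter[j]:
                return t, i, j             # event still valid
        return None                        # queue exhausted

    def invalidate(self, i):
        self.counter[i] += 1
```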

5.3.2 Generalization: Energy discretization

To mimic systems interacting by a continuous potential $U(r)$, one can construct a discontinuous potential $V(r)$ with discrete energies [see van Zon and Schofield, J. Chem. Phys. 128, 154119 (2008)]:
\[ V(r) = U_{min} + \sum_{k=1}^{K}\Theta\big( U(r) - U_k \big)\,\Delta V_k, \]
where $\Theta$ is the Heaviside function and $\Delta V_k$ is the change in potential when $U(r) = U_k$.

- If a pair of particles has potential energy $U(r)$ under the continuous potential, where $U_k < U(r) < U_{k+1}$, then the interaction is assigned the potential energy $V(r) = U_{min} + \sum_{i=1}^{k}\Delta V_i$.
- Collision times at which $U(r) = U_k$ can be solved for analytically for simple potentials.
- By examining representations of the Heaviside function, one can derive that the impulse on body $i$ due to an interaction with another body $j$ can be written as
\[ \mathbf p_i' = \mathbf p_i + S\,\mathbf F_{ij}(t_c), \qquad \mathbf F_{ij}^{imp}(t) = S\,\mathbf F_{ij}(t_c)\,\delta(t - t_c), \]
where $\mathbf F_{ij}(t)$ is the force on $i$ due to $j$ arising from the continuous potential $U(r_{ij})$.
- At a discontinuity at $U_k$, the impulse $S$ satisfies
\[ \frac{S^2\,\mathbf F_{ij}\cdot\mathbf F_{ij}}{2\mu} + S\,\mathbf v_{ij}\cdot\mathbf F_{ij} + \Delta V_k = 0, \]
where $\mu$ is the reduced mass.
- If real roots of the quadratic exist, the physical solution is given by:
  1. the positive branch of the root if $\mathbf v_{ij}\cdot\mathbf F_{ij} > 0$;
  2. the negative branch of the root if $\mathbf v_{ij}\cdot\mathbf F_{ij} < 0$.
- If the roots are complex, the collision is a reflection (bounce-back) due to inadequate total energy to overcome the discontinuity. In this case, $S = -2\mu\,\mathbf v_{ij}\cdot\mathbf F_{ij}/\mathbf F_{ij}^2$.

[Figure: a continuous pair potential and its staircase discretization; a high-energy pair crosses the steps, while a low-energy pair is reflected.]

- The level of discretization $\Delta V_k$ relative to $kT$ is an adjustable parameter. For small values $\Delta V_k/(kT) \ll 1$, the dynamics is effectively equivalent to that in the continuous-potential system.
- The method is ideally suited for low-density systems where free motion dominates.
  - The first event corresponds to the time at which two particles enter the range of the potential, which can be quite rare.
  - In ordinary molecular dynamics, the integration time step is restricted by the curvature of the potential in the repulsive region. This can be extremely small if the potential is short-ranged.
  - Event-driven dynamics is like an adaptive time step integrator, where large time steps are used in between interactions while small time steps are used in the interaction region.
- The method can be implemented for rigid-body systems [de la Pena et al., J. Chem. Phys. 126, 074106 (2007)].
  - This requires the solution of free rigid-body motion [van Zon and Schofield, J. Comput. Phys. 225, 145 (2007)].
  - Torques, angular velocities and orientational matrices are necessary.
  - Numerical methods are needed to find event times. A sketch of the discretization and the impulse solution is given below.
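The following sketch illustrates the two pieces just described. The choice of $\Delta V_k$ as the spacing between successive energy levels is one natural option, not the only one; the function names and arguments are illustrative.

```python
import numpy as np

def discretize(U, r, U_levels, U_min):
    """Discretized potential V(r) = U_min + sum_k Theta(U(r)-U_k) dV_k,
    here with dV_k taken as the spacing between successive levels."""
    dV = np.diff(np.concatenate(([U_min], U_levels)))
    return U_min + np.sum(dV * (U(r) >= U_levels))

def step_impulse(vij, Fij, dVk, mu):
    """Impulse S at a discontinuity: S^2 F.F/(2 mu) + S v.F + dVk = 0;
    reflect (bounce back) if the crossing is energetically forbidden."""
    a = np.dot(Fij, Fij) / (2.0 * mu)
    b = np.dot(vij, Fij)
    disc = b*b - 4.0*a*dVk
    if disc < 0.0:
        return -b / a                      # = -2 mu v.F / |F|^2 (reflection)
    root = np.sqrt(disc)
    return (-b + root)/(2*a) if b > 0.0 else (-b - root)/(2*a)
```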

5.4 Constraints and Constrained Dynamics

- Typical time scales of most intramolecular motions are 10 to 50 times shorter than the translational time scale of a molecule.
- The time step of an integrator is determined by the shortest relevant time scale.
- Multiple time step methods can be used to deal with differing time scales in dynamical simulations.
- Many observables are independent of, or insensitive to, intramolecular motion.
- Molecular conformation is usually only weakly dependent on bond lengths. Bond vibrations are typically restricted to small motions on rapid time scales.
- Some bond angles remain relatively constant, aside from high-frequency oscillations.

5.4.1 Constrained Averages

- Suppose a set of $\ell$ variables, such as bond lengths, are effectively constant during a simulation.
- We will assume that the constraints are functions of the positions only, independent of the momenta. Such constraints are called holonomic.
- We can specify these constraints by
\[ \sigma_1(r^{(N)}) = 0, \qquad \sigma_2(r^{(N)}) = 0, \qquad \dots \qquad \text{such as} \quad \sigma_1(\mathbf r_{12}) = r_{12}^2 - d^2. \]
- We write the condition that the set of all constraints $\sigma = (\sigma_1,\dots,\sigma_\ell)$ are satisfied in the compact notation
\[ \prod_{i=1}^{\ell}\delta\big(\sigma_i(r^{(N)})\big) = \delta(\sigma). \]
- An ensemble average of an observable $A(r^{(N)})$ that depends only on the configuration of the system can be written as
\[ \langle A(r^{(N)})\rangle = \int dr^{(N)}\, dp^{(N)}\; P(r^{(N)},p^{(N)})\, A(r^{(N)}) = \int dr^{(N)}\; \rho(r^{(N)})\, A(r^{(N)}), \]

where $P(r^{(N)},p^{(N)})$ is the full phase space density and $\rho(r^{(N)}) = \int dp^{(N)}\, P(r^{(N)},p^{(N)})$ is the configurational phase space density.

- If $A(r^{(N)})$ depends only weakly on the constrained coordinates,
\[ \langle A(r^{(N)})\rangle \approx \frac{\int dr^{(N)}\, dp^{(N)}\; P(r^{(N)},p^{(N)})\, A(r^{(N)})\,\delta(\sigma)}{\int dr^{(N)}\;\rho(r^{(N)})\,\delta(\sigma)} = \int dr^{(N)}\;\rho_{con}(r^{(N)},\sigma=0)\, A(r^{(N)},\sigma=0) = \langle A(r^{(N)})\rangle_c. \]
- How can the conditional ensemble average $\langle\cdots\rangle_c$ be computed?
- This can be analyzed most easily by working in generalized coordinates rather than Cartesian coordinates.
- Define a coordinate transformation $r = (\mathbf r_1,\dots,\mathbf r_N) \to u = (u_1,\dots,u_N)$ in which the last $\ell$ coordinates are the $\ell$ constraint conditions $\sigma$. We can represent $u = (q,\sigma)$, where the dimension of $q$ is $3N-\ell$ and the dimension of $\sigma$ is $\ell$.
- If $p = (\mathbf p_1,\dots,\mathbf p_N)$ and the potential of the system is $V(r)$, then the Lagrangian and the Hamiltonian in the Cartesian coordinates are
\[ L(r,\dot r) = \frac12\,\dot r\cdot m\cdot\dot r - V(r), \qquad H(r,p) = \frac12\, p\cdot m^{-1}\cdot p + V(r), \]
where $m_{ij} = m_i\,\delta_{ij}$ and $(m^{-1})_{ij} = m_i^{-1}\,\delta_{ij}$.
- Using $r(u)$, the Lagrangian in the generalized coordinates can be written as
\[ L(u,\dot u) = \frac12\sum_i m_i\,\frac{\partial\mathbf r_i}{\partial u_\alpha}\cdot\frac{\partial\mathbf r_i}{\partial u_\beta}\,\dot u_\alpha\dot u_\beta - V(u) = \frac12\,\dot u\cdot G\cdot\dot u - V(u), \qquad G_{\alpha\beta} = \sum_i m_i\,\frac{\partial\mathbf r_i}{\partial u_\alpha}\cdot\frac{\partial\mathbf r_i}{\partial u_\beta}, \]
where we have used the convention that repeated Greek indices are summed over. The momenta conjugate to the generalized coordinates are therefore
\[ p^u = \frac{\partial L}{\partial\dot u} = G\,\dot u \qquad\text{so}\qquad \dot u = G^{-1} p^u, \]
leading to the Hamiltonian
\[ H = p^u\cdot\dot u - L = \frac12\, p^u\cdot G^{-1}\cdot p^u + V(u). \]

- Note from the definition of the matrix $G$ that
\[ G^{-1}_{\alpha\beta} = \sum_i \frac{1}{m_i}\,\frac{\partial u_\alpha}{\partial\mathbf r_i}\cdot\frac{\partial u_\beta}{\partial\mathbf r_i}. \]
- Note that the equations of motion in the Cartesian phase space coordinates $\Gamma = (r,p)$ and in the generalized phase space coordinates $\tilde\Gamma = (u,p^u)$ are in symplectic form,
\[ \dot\Gamma = J\cdot\frac{\partial H(\Gamma)}{\partial\Gamma}. \]
Transformations that preserve the symplectic form are called canonical.
- Claim: The phase space probability satisfies $P(\Gamma)\,d\Gamma = P(\Gamma(\tilde\Gamma))\,d\tilde\Gamma$.

Proof. Consider the transformation of phase space coordinates $\tilde\Gamma = \tilde\Gamma(\Gamma)$. The time derivative of this relation gives
\[ \dot{\tilde\Gamma}_\alpha = M_{\alpha\beta}\,\dot\Gamma_\beta, \qquad M_{\alpha\beta} = \frac{\partial\tilde\Gamma_\alpha}{\partial\Gamma_\beta}. \]
From the symplectic form of the equation of motion for $\Gamma$, we see that
\[ \dot{\tilde\Gamma} = M\, J\,\frac{\partial H}{\partial\Gamma}. \]
Considering the inverse of the transformation, $\Gamma = \Gamma(\tilde\Gamma)$, we find that
\[ \frac{\partial H}{\partial\Gamma_\beta} = \frac{\partial H}{\partial\tilde\Gamma_\alpha}\,\frac{\partial\tilde\Gamma_\alpha}{\partial\Gamma_\beta} \qquad\text{so}\qquad \frac{\partial H}{\partial\Gamma} = M^T\,\frac{\partial H}{\partial\tilde\Gamma}. \]
Thus we find that the equation of motion for the transformed phase space coordinates obeys
\[ \dot{\tilde\Gamma} = M\, J\, M^T\,\frac{\partial H}{\partial\tilde\Gamma}. \]
This equation maintains symplectic form if $M\,J\,M^T = J$. Now consider the transformation of the volume element, $d\tilde\Gamma = |\det M|\, d\Gamma$. If the transformed coordinates preserve the symplectic form, then $\det(M\,J\,M^T) = \det(J) = \det^2(M)\,\det(J)$, and hence $\det(M) = \pm1$, and so $d\tilde\Gamma = d\Gamma$. If the phase space probability is $P(\Gamma)\,d\Gamma$, it therefore follows that $P(\Gamma)\,d\Gamma = P(\Gamma(\tilde\Gamma))\,d\tilde\Gamma$.

- From this equality, the canonical configurational density can be expressed as
\[ \rho(r)\,dr = \left[ \int dp\; e^{-\beta H} \right] dr = \left[ \int dp^u\; e^{-\beta\, p^u\cdot G^{-1}\cdot p^u/2} \right] e^{-\beta V(u)}\, du = c\,\sqrt{\det G}\; e^{-\beta V(u)}\, du \equiv \bar\rho(u)\,du, \]
where $c$ is a normalization constant.
- From our transformation with $u = (q,\sigma)$, the conditional density is therefore
\[ \rho_{con}(q,\sigma=0) = c\,\sqrt{\det G}\; e^{-\beta V(q,\sigma=0)}. \]
- Note that the factor $\sqrt{\det G}$ is related to the Jacobian of the transformation from the Cartesian spatial coordinates $r$ to the generalized spatial coordinates $u$.
- Conditional averages can be computed by either:
1. Devising a Monte Carlo procedure that works in the generalized coordinates $q$. Note that trial moves must not violate the constraints $\sigma = 0$, which is easy to implement if generalized spatial coordinates are used. The transition matrix should have the limit density $\rho_{con}$, and therefore the Jacobian factor must be used in the final acceptance criterion of the Monte Carlo procedure.
2. Carrying out constrained dynamics, which effectively corresponds to Hamiltonian dynamics in a lower-dimensional sub-space of the full phase space $(r,p)$ or $(u,p^u)$.

5.4.2 Constrained Dynamics

To construct the equations of motion for a constrained system, consider the Lagrangian written in the generalized coordinates $u$ while the $\ell$ constraints $\sigma = 0$ are maintained:
\[ L(u,\dot u) = \frac12\,\dot u\cdot G\cdot\dot u - V(u) = \frac12\,\dot q\cdot A\cdot\dot q - V(q,\sigma=0), \]
since $\dot\sigma = 0$ under the constraint. In this equation, the matrix $A$ is a sub-matrix of $G$ given by
\[ A_{\alpha\beta} = \sum_i m_i\,\frac{\partial\mathbf r_i}{\partial q_\alpha}\cdot\frac{\partial\mathbf r_i}{\partial q_\beta}, \]
which is of dimension $(3N-\ell)\times(3N-\ell)$. From the Lagrangian, we construct the Hamiltonian in generalized coordinates
\[ H_c = \frac12\, p^q\cdot A^{-1}\cdot p^q + V(q,\sigma=0), \qquad p^q = \frac{\partial L}{\partial\dot q} = A\,\dot q. \]
- Note that there is no momentum conjugate to the fixed coordinates $\sigma$.
- The canonical probability density for this Hamiltonian system is obtained from
\[ \bar\rho_c(q)\,dq = \left[ \int dp^q\; e^{-\beta\, p^q\cdot A^{-1}\cdot p^q/2} \right] e^{-\beta V(q,\sigma=0)}\,dq = c'\,\sqrt{\det A}\; e^{-\beta V(q,\sigma=0)}\,dq, \]
so
\[ \bar\rho_c(q) = c''\,\sqrt{\frac{\det A}{\det G}}\;\rho_{con}(q,\sigma=0). \]
- Note that the probability density associated with the Hamiltonian dynamics of the constrained system is $\bar\rho_c(q)$, while the targeted constrained density is $\rho_{con}(q,\sigma=0)$. Each configuration generated by constrained dynamics must therefore be weighted by the ratio of Jacobian factors $\sqrt{\det G/\det A}$.
How can this weight factor be evaluated?
- We write $G$ and its inverse $G^{-1}$ in block form:
\[ G = \begin{pmatrix} A & B \\ B^T & \Gamma \end{pmatrix}, \qquad B_{\alpha\beta} = \sum_i m_i\,\frac{\partial\mathbf r_i}{\partial q_\alpha}\cdot\frac{\partial\mathbf r_i}{\partial\sigma_\beta}, \qquad \Gamma_{\alpha\beta} = \sum_i m_i\,\frac{\partial\mathbf r_i}{\partial\sigma_\alpha}\cdot\frac{\partial\mathbf r_i}{\partial\sigma_\beta}, \]
\[ G^{-1} = \begin{pmatrix} D & E \\ E^T & Z \end{pmatrix}, \qquad D_{\alpha\beta} = \sum_i\frac{1}{m_i}\,\frac{\partial q_\alpha}{\partial\mathbf r_i}\cdot\frac{\partial q_\beta}{\partial\mathbf r_i}, \quad E_{\alpha\beta} = \sum_i\frac{1}{m_i}\,\frac{\partial q_\alpha}{\partial\mathbf r_i}\cdot\frac{\partial\sigma_\beta}{\partial\mathbf r_i}, \quad Z_{\alpha\beta} = \sum_i\frac{1}{m_i}\,\frac{\partial\sigma_\alpha}{\partial\mathbf r_i}\cdot\frac{\partial\sigma_\beta}{\partial\mathbf r_i}. \]
- We now define a matrix $X$ so that $\det X = \det A$:
\[ X = \begin{pmatrix} A & 0 \\ B^T & I \end{pmatrix}, \]
where $I$ is the identity matrix.
- Writing $X = G\, G^{-1} X$, we get
\[ G^{-1} X = \begin{pmatrix} DA + EB^T & E \\ E^T A + ZB^T & Z \end{pmatrix}, \]
and from $G^{-1} G = I$ we have $DA + EB^T = I$ and $E^T A + ZB^T = 0$, and hence
\[ X = G\begin{pmatrix} I & E \\ 0 & Z \end{pmatrix}. \]
- Thus $\det X = \det A = \det G\,\det Z$, from which we conclude
\[ \frac{\det G}{\det A} = \frac{1}{\det Z} \qquad\text{implying}\qquad \langle A(r^{(N)})\rangle_c = \int dq\; (\det Z)^{-1/2}\,\bar\rho_c(q)\, A(q,\sigma=0) \]
(up to normalization), where
\[ Z_{\alpha\beta} = \sum_i\frac{1}{m_i}\,\frac{\partial\sigma_\alpha}{\partial\mathbf r_i}\cdot\frac{\partial\sigma_\beta}{\partial\mathbf r_i} \]
is a simple matrix to calculate from the constraint conditions $\sigma(r^{(N)})$.


Specific example

Consider a molecular trimer, like water, with bond constraints $\sigma_1(\mathbf r_{12}) = r_{12}^2 - d^2 = 0$ and $\sigma_2(\mathbf r_{23}) = r_{23}^2 - d^2 = 0$.

[Figure: a three-bead trimer with two constrained bonds of length $d$ and bond angle $\theta$ between them.]

If all atoms in the trimer have equal masses $m$, then
\[ Z = \frac{1}{m}\begin{pmatrix} \displaystyle\sum_{i=1}^{3}\frac{\partial\sigma_1}{\partial\mathbf r_i}\cdot\frac{\partial\sigma_1}{\partial\mathbf r_i} & \displaystyle\sum_{i=1}^{3}\frac{\partial\sigma_1}{\partial\mathbf r_i}\cdot\frac{\partial\sigma_2}{\partial\mathbf r_i} \\ \displaystyle\sum_{i=1}^{3}\frac{\partial\sigma_2}{\partial\mathbf r_i}\cdot\frac{\partial\sigma_1}{\partial\mathbf r_i} & \displaystyle\sum_{i=1}^{3}\frac{\partial\sigma_2}{\partial\mathbf r_i}\cdot\frac{\partial\sigma_2}{\partial\mathbf r_i} \end{pmatrix} = \frac{4}{m}\begin{pmatrix} 2 r_{12}^2 & -\mathbf r_{12}\cdot\mathbf r_{23} \\ -\mathbf r_{12}\cdot\mathbf r_{23} & 2 r_{23}^2 \end{pmatrix}, \]
but $|\mathbf r_{12}| = |\mathbf r_{23}| = d$, so
\[ \det Z = \frac{16 d^4}{m^2}\left[ 4 - (\hat{\mathbf r}_{12}\cdot\hat{\mathbf r}_{23})^2 \right] = \frac{64 d^4}{m^2}\left( 1 - \frac{\cos^2\theta}{4} \right). \]
A numerical check of this expression is sketched below.
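The following sketch evaluates $\det Z$ directly from the constraint gradients for an arbitrary trimer configuration, so the closed form above can be verified numerically; the convention $\mathbf r_{12} = \mathbf r_1 - \mathbf r_2$, $\mathbf r_{23} = \mathbf r_2 - \mathbf r_3$ is assumed.

```python
import numpy as np

def detZ_trimer(r, m=1.0):
    """det Z for sigma_1 = r12^2 - d^2, sigma_2 = r23^2 - d^2, equal masses.
    r: array of shape (3, 3) holding the three bead positions."""
    r12, r23 = r[0] - r[1], r[1] - r[2]
    # gradients: d(sigma_1)/dr = (2 r12, -2 r12, 0); d(sigma_2)/dr = (0, 2 r23, -2 r23)
    Z = np.empty((2, 2))
    Z[0, 0] = 8.0 * np.dot(r12, r12) / m
    Z[1, 1] = 8.0 * np.dot(r23, r23) / m
    Z[0, 1] = Z[1, 0] = -4.0 * np.dot(r12, r23) / m
    return np.linalg.det(Z)
```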

Procedure
- Generate the dynamics and then calculate constrained averages properly weighted by the factor $(\det Z)^{-1/2}$.
- Constrained dynamics may be done in two different ways:

1. Re-writing the Hamiltonian in generalized coordinates and using a symplectic integrator for the Hamiltonian system. Typically, the Hamiltonian has a complicated form, as $A^{-1}$ is not diagonal.
2. The Lagrange multiplier approach: define the Lagrangian with constraints,
\[ L' = L - \lambda\cdot\sigma, \qquad L = \sum_i \frac{m_i}{2}\,\dot{\mathbf r}_i^2 - V(r^{(N)}). \]
- The equation of motion from the constrained Lagrangian is
\[ \frac{\partial}{\partial t}\frac{\partial L'}{\partial\dot{\mathbf r}_i} = \frac{\partial L'}{\partial\mathbf r_i} \qquad\Longrightarrow\qquad m_i\,\ddot{\mathbf r}_i = -\frac{\partial V}{\partial\mathbf r_i} - \lambda\cdot\frac{\partial\sigma}{\partial\mathbf r_i}, \]
provided the constraints are holonomic (depend only on $r$ and not on $p$).
- The dynamics in the full phase space $X = (r,p)$ is generated by the linear operator $L_0$,
\[ \dot X = L_0 X, \qquad L_0 = \sum_i\left[ \frac{\mathbf p_i}{m_i}\cdot\frac{\partial}{\partial\mathbf r_i} + \left( \mathbf F_i - \lambda\cdot\frac{\partial\sigma}{\partial\mathbf r_i} \right)\cdot\frac{\partial}{\partial\mathbf p_i} \right], \qquad \frac{dA(X(t))}{dt} = L_0\,A(X(t)), \]
with formal solution $A(X(t)) = \exp\{L_0 t\}\,A(X(0))$.
- The effective forces have been re-defined to include a constraint force $\mathbf G_i = -\lambda\cdot\partial\sigma/\partial\mathbf r_i$.
- The Lagrange multipliers $\lambda$ typically depend on both $r$ and $p$, and so the system is non-Hamiltonian, as the generalized forces depend on $p$. The operator $L_0$ is not obtained from the Poisson bracket of a Hamiltonian.
- Even though the dynamics is not Hamiltonian in the full phase space $X$, we saw earlier that it is Hamiltonian in a sub-space of the phase space $X$.
- To solve for the Lagrange multipliers, note that the time derivatives of the constraint condition $\sigma = 0$ must vanish, so
\[ \dot\sigma = \sum_i \dot{\mathbf r}_i\cdot\frac{\partial\sigma}{\partial\mathbf r_i} = 0, \qquad \ddot\sigma = \sum_i\left[ \ddot{\mathbf r}_i\cdot\frac{\partial\sigma}{\partial\mathbf r_i} + \sum_j \dot{\mathbf r}_i\cdot\frac{\partial^2\sigma}{\partial\mathbf r_i\,\partial\mathbf r_j}\cdot\dot{\mathbf r}_j \right] = 0. \]

- The equation of motion $m_i\ddot{\mathbf r}_i = \mathbf F_i - \lambda\cdot\partial\sigma/\partial\mathbf r_i$ therefore gives a linear equation for the Lagrange multipliers,
\[ \sum_i\left( \frac{\mathbf F_i}{m_i} - \frac{\lambda_\beta}{m_i}\frac{\partial\sigma_\beta}{\partial\mathbf r_i} \right)\cdot\frac{\partial\sigma_\alpha}{\partial\mathbf r_i} + \sum_{i,j}\dot{\mathbf r}_i\cdot\frac{\partial^2\sigma_\alpha}{\partial\mathbf r_i\,\partial\mathbf r_j}\cdot\dot{\mathbf r}_j = 0, \]
or
\[ \lambda = Z^{-1}(F + T), \qquad F_\alpha = \sum_i\frac{\mathbf F_i}{m_i}\cdot\frac{\partial\sigma_\alpha}{\partial\mathbf r_i}, \qquad T_\alpha = \sum_{i,j}\frac{\mathbf p_i}{m_i}\cdot\frac{\partial^2\sigma_\alpha}{\partial\mathbf r_i\,\partial\mathbf r_j}\cdot\frac{\mathbf p_j}{m_j}, \]
where $Z$ is the matrix defined earlier.
- From this expression, one sees an explicit dependence of the Lagrange multipliers on the momenta through the $T$ term.
- The equations of motion
\[ \dot X = L_0 X = \begin{pmatrix} \mathbf p_i/m_i \\ \mathbf F_i - \left[ Z^{-1}(F+T) \right]\cdot\partial\sigma/\partial\mathbf r_i \end{pmatrix} \]
are difficult to solve in the full phase space. One can construct a symplectic integrator in a lower-dimensional phase space, or resort to an iterative solution method known as SHAKE.
Example: Consider a particle moving on a sphere of radius $d$ centered at the origin. The constraint condition on the position vector $\mathbf r$ can be written as $\sigma = (r^2 - d^2)/2 = 0$. For this system, the constraint force is $\mathbf G = -\lambda\mathbf r$, and $Z = r^2/m$, hence $Z^{-1} = m/r^2$. We also find
\[ F = \frac{\mathbf F\cdot\mathbf r}{m}, \qquad T = \dot{\mathbf r}\cdot\frac{\partial^2\sigma}{\partial\mathbf r\,\partial\mathbf r}\cdot\dot{\mathbf r} = \dot{\mathbf r}^2, \]
leading to
\[ \lambda = \frac{m}{r^2}\left( \frac{\mathbf F\cdot\mathbf r}{m} + \dot{\mathbf r}^2 \right). \]
If there is no potential, $\mathbf F = 0$, and $\lambda = m\dot{\mathbf r}^2/r^2$. Noting that the angular velocity of a particle on a sphere is $\omega = |\dot{\mathbf r}|/r$, we can write the constraint force as $\mathbf G = -m\omega^2\mathbf r$, which is known as the centripetal force, and the equation of motion can be written as
\[ m\ddot{\mathbf r} = -\lambda\mathbf r = -m\omega^2\mathbf r. \]

In finite difference form, this may be written as
\[ \mathbf r(t+\Delta t) = 2\mathbf r(t) - \mathbf r(t-\Delta t) - \omega^2\mathbf r(t)\,\Delta t^2. \]
How well would the constraint be satisfied at time $t+\Delta t$ in this updating scheme if it is satisfied exactly at time $t$, $r^2(t) = d^2$, as well as at time $t-\Delta t$? Squaring the left-hand side of the equation above, we get
\[ r^2(t+\Delta t) = d^2\left[ 5 + (\omega\Delta t)^4 - 4(\omega\Delta t)^2 + \cos(\omega\Delta t)\left( 2(\omega\Delta t)^2 - 4 \right) \right] = d^2\left( 1 - \frac{(\omega\Delta t)^4}{6} + O(\Delta t^6) \right), \]
where we have used the exact solution $\mathbf r(t) = \mathbf r(0)\cos(\omega t) + \dot{\mathbf r}(0)\sin(\omega t)/\omega$. Although the error in the constraint may appear small, it builds up over the course of the simulation and will eventually result in serious violations of the constraint condition. When working with the difference equations, it turns out to be better to solve for the Lagrange multiplier by requiring that the constraint condition itself is satisfied exactly at each time step (rather than using the second-derivative condition). Consider the difference equation with an undetermined Lagrange multiplier,
\[ \mathbf r(t+\Delta t) = \mathbf r^u(t+\Delta t) - \frac{\lambda\,\Delta t^2}{m}\,\mathbf r(t), \]
where $\mathbf r^u(t+\Delta t) = 2\mathbf r(t) - \mathbf r(t-\Delta t) = \mathbf r(t) + \dot{\mathbf r}(t)\Delta t + \cdots$ would be the position of the particle at time $t+\Delta t$ in the absence of the constraint. Imposing the constraint condition results in a quadratic equation for $\lambda$,
\[ d^2 = r^2(t+\Delta t) = r_u^2(t+\Delta t) - \frac{2\lambda\Delta t^2}{m}\,\mathbf r(t)\cdot\mathbf r^u(t+\Delta t) + \left( \frac{\lambda\Delta t^2}{m} \right)^2 r^2(t), \]
whose solution is
\[ \frac{\lambda\,\Delta t^2}{m} = \frac{1}{d^2}\left[ r_p(t+\Delta t) - \sqrt{r_p^2(t+\Delta t) - d^2\left( r_u^2(t+\Delta t) - d^2 \right)} \right], \]
where $r_p(t+\Delta t) = \mathbf r(t)\cdot\mathbf r^u(t+\Delta t)$ is the projection of the new unconstrained position $\mathbf r^u(t+\Delta t)$ along the direction $\mathbf r(t)$.
- One cannot always solve analytically for the exact $\lambda$ that satisfies the constraint condition when multiple constraints are applied.
- Iterative and efficient numerical solutions for the Lagrange multipliers exist; the standard one is called SHAKE. A sketch of the exactly solvable single-constraint case above is given first.
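Before turning to SHAKE, here is a minimal sketch of the single-constraint case just derived: one constrained Verlet step for a free particle on a sphere, with the Lagrange multiplier chosen so that the constraint holds exactly at the new time step.

```python
import numpy as np

def sphere_verlet_step(r, r_prev, d, dt, m=1.0):
    """One Verlet step for a free particle constrained to |r| = d."""
    r_u = 2.0*r - r_prev                   # unconstrained update (F = 0)
    r_p = np.dot(r, r_u)                   # projection on the old position
    gamma = (r_p - np.sqrt(r_p**2 - d**2*(np.dot(r_u, r_u) - d**2))) / d**2
    lam = gamma * m / dt**2                # the Lagrange multiplier itself
    return r_u - gamma * r, lam            # new position satisfies |r| = d
```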

Basis of the SHAKE algorithm

In situations where an analytical solution for the Lagrange multipliers is not possible, a numerical solution can be found by the following procedure:
1. The equation of motion is written as a difference equation to some order in the time step:
\[ \mathbf r_i(t+\Delta t) = \mathbf r_i^u(t+\Delta t) - \frac{\Delta t^2}{m_i}\,\lambda\cdot\frac{\partial\sigma}{\partial\mathbf r_i}\bigg|_{r(t)}, \]
where $\mathbf r_i^u(t+\Delta t)$ is the unconstrained position of particle $i$ at time $t+\Delta t$.
2. We require that all constraints are satisfied at the new time step, $\sigma(t+\Delta t) = \sigma(r(t+\Delta t)) = 0$. To enforce this, we use an iterative procedure starting with the guess $\lambda^{(0)} = 0$, i.e. taking the positions $\mathbf x_i^{(0)}(t+\Delta t) = \mathbf r_i^u(t+\Delta t)$. We then Taylor expand the constraint condition at the estimated coordinates $\mathbf x_i^{(0)}$ in the difference $\Delta_i^{(0)} = \mathbf r_i(t+\Delta t) - \mathbf x_i^{(0)}(t+\Delta t)$,
\[ 0 = \sigma(t+\Delta t) = \sigma\big(\mathbf x^{(0)}(t+\Delta t)\big) + \sum_i\frac{\partial\sigma}{\partial\mathbf r_i}\bigg|_{\mathbf x^{(0)}}\cdot\Delta_i^{(0)} + \cdots, \]
but from the difference equation we have
\[ \Delta_i^{(0)} = -\frac{\Delta t^2}{m_i}\,\lambda^{(1)}\cdot\frac{\partial\sigma}{\partial\mathbf r_i}\bigg|_{r(t)} + O(\Delta t^4), \]
and hence we need
\[ \sigma_\alpha\big(\mathbf x^{(0)}\big) \approx \sum_i\frac{\Delta t^2}{m_i}\,\frac{\partial\sigma_\alpha}{\partial\mathbf r_i}\bigg|_{\mathbf x^{(0)}}\cdot\frac{\partial\sigma_\beta}{\partial\mathbf r_i}\bigg|_{r(t)}\,\lambda_\beta^{(1)} = \Delta t^2\, Z_{\alpha\beta}^{(0)}\,\lambda_\beta^{(1)}. \]
Solving this equation for the new estimate of the Lagrange multipliers $\lambda^{(1)}$ corresponds to the matrix form
\[ \Delta t^2\,\lambda^{(1)} = Z^{(0)\,-1}\,\sigma\big(\mathbf x^{(0)}(t+\Delta t)\big). \]
3. Armed with this Lagrange multiplier, we form a new estimate of the constrained position at time $t+\Delta t$ using
\[ \mathbf x_i^{(1)}(t+\Delta t) = \mathbf x_i^{(0)}(t+\Delta t) - \frac{\Delta t^2}{m_i}\,\lambda^{(1)}\cdot\frac{\partial\sigma}{\partial\mathbf r_i}\bigg|_{r(t)}. \]
4. We repeat the previous steps to get $\lambda^{(n+1)}$ by forming $Z^{(n)\,-1}$ and $\sigma(\mathbf x^{(n)})$ until the $\mathbf x_i^{(n)}(t+\Delta t)$ converge, at which point the $\mathbf r_i(t+\Delta t) = \mathbf x_i^{(n)}(t+\Delta t)$ satisfy all the constraints to some specified precision.

It is possible to avoid the matrix inversion step by modifying the algorithm to evaluate the $\lambda_\alpha$ sequentially. In practice, this is carried out by taking only the diagonal terms of the matrix $Z^{-1}$. This effectively amounts to using the iteration
\[ \Delta t^2\,\lambda_\alpha^{(n+1)} = \frac{\sigma_\alpha\big(\mathbf x^{(n)}(t+\Delta t)\big)}{Z_{\alpha\alpha}^{(n)}} \]
for each constraint $\alpha$ in succession. A sketch of this sequential iteration for distance constraints is given below.
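The following is a minimal sketch of the sequential (diagonal) SHAKE iteration for the common special case of pairwise distance constraints $\sigma = |\mathbf r_a - \mathbf r_b|^2 - d^2$; the general matrix form follows the equations above, and the names and tolerance are illustrative.

```python
import numpy as np

def shake(r_new, r_old, bonds, masses, tol=1e-10, max_iter=500):
    """Sequential SHAKE for distance constraints sigma = |r_a - r_b|^2 - d^2.
    r_new: unconstrained positions at t+dt; r_old: positions at t (where
    the constraint gradients are evaluated); bonds: list of (a, b, d)."""
    r = r_new.copy()
    for _ in range(max_iter):
        converged = True
        for a, b, d in bonds:
            s = np.dot(r[a] - r[b], r[a] - r[b]) - d*d      # sigma_alpha
            if abs(s) > tol:
                converged = False
                g = r_old[a] - r_old[b]                     # (grad sigma)/2 at t
                # diagonal element of Z for this constraint; the factor 4
                # comes from the gradients 2g and -2g on the two particles
                z = 4.0*np.dot(r[a] - r[b], g)*(1.0/masses[a] + 1.0/masses[b])
                gamma = s / z                               # = dt^2 lambda piece
                r[a] -= gamma * 2.0*g / masses[a]
                r[b] += gamma * 2.0*g / masses[b]
        if converged:
            return r
    raise RuntimeError("SHAKE did not converge")
```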

5.5 Statistical Mechanics of Non-Hamiltonian Systems

In the previous section, we saw that the constrained dynamics generated configurational states with a probability density which differs from the constrained probability density $\rho_{con}$. What ensemble does the constrained dynamics generate?

Consider the unconstrained Hamiltonian written in the generalized coordinates $(u, p^u) = (q, \sigma, p^q, p^\sigma)$,
\[ H(u,p^u) = \frac12\, p^{u\,T}\, G^{-1}\, p^u + V(u), \]
where we recall the block forms of the matrices $G$ and $G^{-1}$,
\[ G = \begin{pmatrix} A & B \\ B^T & \Gamma \end{pmatrix}, \qquad G^{-1} = \begin{pmatrix} D & E \\ E^T & Z \end{pmatrix}, \]
and from $p^u = G\,\dot u$ we see that
\[ p^q = A\,\dot q + B\,\dot\sigma, \qquad p^\sigma = B^T\dot q + \Gamma\,\dot\sigma. \]
- In the constrained system, we have $\sigma = 0$ and $\dot\sigma = 0$, and hence the momenta $p^\sigma$ are no longer independent variables and must be constrained to a specific value that depends on $(q,p^q)$. In particular, we now have
\[ p^q = \tilde A\,\dot q, \qquad p^\sigma = \tilde B^T\dot q = \tilde B^T\tilde A^{-1} p^q \equiv \tilde p^\sigma, \]
where $\tilde A = A(q,\sigma=0)$ and $\tilde B = B(q,\sigma=0)$.

- Recall that before, the constrained Hamiltonian was written as
\[ H_c = \frac12\, p^q\cdot\tilde A^{-1}\cdot p^q + V(q,\sigma=0), \]
and the equations of motion in the phase space $(q,p^q)$ are of symplectic (canonical) form. The probability to find the system in a volume $dq\,dp^q$ around $(q,p^q)$ for this Hamiltonian system is
\[ \rho_c(q,p^q)\,dq\,dp^q = \rho_c\big( H(q,\sigma=0,p^q,p^\sigma=\tilde p^\sigma) \big)\,dq\,dp^q. \]
- In the extended phase space $(q,\sigma,p^q,p^\sigma)$, the probability is therefore
\[ \rho_c(q,p^q)\,dq\,dp^q\;\delta(\sigma)\,\delta(p^\sigma - \tilde p^\sigma)\,d\sigma\,dp^\sigma = \rho_c\big(H(u,p^u)\big)\,\delta(\sigma)\,\delta(p^\sigma - \tilde p^\sigma)\,du\,dp^u = \rho_c\big(H(r,p)\big)\,\delta(\sigma(r))\,\delta\big(p^\sigma(r,p) - \tilde p^\sigma(r,p)\big)\,dr\,dp, \]
where we have used the fact that the Jacobian for the canonical transformation between the coordinates $(u,p^u)$ and $(r,p)$ is unity.
- To re-write the condition $p^\sigma = \tilde p^\sigma$ in terms of the coordinates $(r,p)$, note that
\[ \begin{pmatrix} \dot q \\ \dot\sigma \end{pmatrix} = G^{-1}\begin{pmatrix} p^q \\ p^\sigma \end{pmatrix}, \]
and hence $\dot\sigma = \tilde E^T p^q + \tilde Z\, p^\sigma$, where a tilde over a matrix indicates that it is evaluated at $\sigma = 0$. Multiplying this equality by the matrix $\tilde Z^{-1}$ gives
\[ \tilde Z^{-1}(r)\,\dot\sigma(r,p) = p^\sigma + \tilde Z^{-1}\tilde E^T p^q. \]
From the block forms of $G$ and $G^{-1}$, we have $AE + BZ = 0$, and hence $E^T A^T + Z^T B^T = 0$. From the definitions of the symmetric matrices $A$ and $Z$, we see that $A^T = A$ and $Z^T = Z$, so $E^T A = -Z B^T$, which implies $Z^{-1} E^T = -B^T A^{-1}$. Finally, when $\sigma = 0$, we have
\[ \tilde Z^{-1}\dot\sigma = p^\sigma - \tilde B^T\tilde A^{-1} p^q = p^\sigma - \tilde p^\sigma, \]
and so $\delta(p^\sigma - \tilde p^\sigma) = \delta\big(\tilde Z^{-1}(r)\,\dot\sigma(r,p)\big)$. Note that $\tilde Z$ and $\dot\sigma$ are easily expressed in the original phase space coordinates $(r,p)$.

- The probability therefore obeys
\[ \rho_c(H_c)\,dq\,dp^q\;\delta(\sigma)\,\delta(p^\sigma - \tilde p^\sigma)\,d\sigma\,dp^\sigma = \rho_c\big(H(r,p)\big)\,\delta(\sigma)\,\delta\big(\tilde Z^{-1}\dot\sigma\big)\,dr\,dp \tag{5.13} \]
\[ = \rho_c\big(H(r,p)\big)\,\delta(\sigma)\,\det\tilde Z\;\delta(\dot\sigma)\,dr\,dp, \tag{5.14} \]
using the fact that $\delta(\tilde Z^{-1}\dot\sigma) = \det\tilde Z\;\delta(\dot\sigma)$.
- In the constrained Hamiltonian system in canonical coordinates $(q,p^q)$, the probability $P(q,p^q) = \rho_c\,dq\,dp^q$ is conserved under the evolution of the system by Liouville's theorem, so that $P(q(t),p^q(t)) = P(q(0),p^q(0))$. In addition, the phase space volume $dX_q(t) = dX_q(0)$ is also conserved under the flow, so
\[ \frac{d}{dt}\Big[ \rho_c(H_c)\,\delta(\sigma)\,\delta(p^\sigma - \tilde p^\sigma)\,dq\,dp^q\,d\sigma\,dp^\sigma \Big] = 0 \qquad\Longrightarrow\qquad \frac{d}{dt}\Big[ \rho_c\big(H(r,p)\big)\,\delta(\sigma)\,\delta(\dot\sigma)\,\det\tilde Z\;dr\,dp \Big] = 0. \]
- The constrained dynamics conserves $H(r,p) = H_c(q,p^q)$ and satisfies the constraint conditions $\sigma = 0$ and $\dot\sigma = 0$ at all times. Hence, for the full phase space $X = (r,p)$,
\[ \frac{d}{dt}\Big[ \det\tilde Z\; dX \Big] = 0 \qquad\Longrightarrow\qquad \det\tilde Z(r(t))\,dX(t) = \det\tilde Z(r(0))\,dX(0). \]
One can therefore interpret $d\mu(X) = \det\tilde Z\;dX$ as the invariant measure for the phase space flow.
To see the connection between the dynamics of the system and the $\det\tilde Z$ factor, consider the flow of the standard volume element $dX_0$ under the dynamics to a time $t$ at which the volume is $dX_t = \det J(X_t;X_0)\,dX_0$, where the matrix $J$ has elements $J_{ij} = \partial X_i(t)/\partial X_j(0)$. To find the evolution of the Jacobian determinant, we use the fact that $\det J = e^{\mathrm{Tr}\ln J}$, which follows from the general property of a square matrix, $\det e^A = e^{\mathrm{Tr}\,A}$, with $J = e^A$.

Proof. To establish this fact, note that any square matrix can be written in Jordan normal form as $A = P^{-1} D\, P$, where $D$ is in Jordan form. Recall that the Jordan form is an upper-triangular matrix with the eigenvalues of $A$ along the diagonal, and that the determinant of a triangular matrix is the product of the diagonal elements. The exponential of the matrix $A$ can be written as $e^A = P^{-1} e^D P$, and the determinant of this exponential is $\det e^A = \det P^{-1}\,\det e^D\,\det P$. Since $\det P^{-1} = 1/\det P$, we see that $\det e^A = \det e^D$. Since $D$ is in Jordan form, $e^D$ is a triangular matrix with diagonal elements $1 + \lambda_i + \lambda_i^2/2 + \cdots = e^{\lambda_i}$, where $\lambda_i = D_{ii}$ is an eigenvalue of $A$ (here we use the fact that $D^n$ is upper triangular with diagonal entries $\lambda_i^n$). Since the determinant of $e^D$ is the product of its diagonal elements, $\prod_i e^{\lambda_i} = e^{\sum_i\lambda_i}$, we see that $\det e^A = e^{\mathrm{Tr}\,A}$, since $\mathrm{Tr}\,A = \sum_i\lambda_i$.

From this property, we have
\[ \frac{d\det J}{dt} = e^{\mathrm{Tr}\ln J}\,\frac{d}{dt}\big( \mathrm{Tr}\ln J \big) = \det J\;\mathrm{Tr}\left( \frac{dJ}{dt}\,J^{-1} \right) = \det J\sum_{i,j}\frac{\partial\dot X_i(t)}{\partial X_j(0)}\frac{\partial X_j(0)}{\partial X_i(t)} = \det J\sum_i\frac{\partial\dot X_i(t)}{\partial X_i(t)} = \det J\;\kappa(t), \]
where $\kappa(t) = \partial/\partial X(t)\cdot\dot X(t)$ is called the phase space compressibility.

- For a Hamiltonian system, phase space volume is conserved and $\kappa = 0$ at all times (incompressible phase space).
- The relation between the Jacobian and the compressibility can be written
\[ \frac{d\ln\det J}{dt} = \kappa(t) \qquad\Longrightarrow\qquad \det J(t) = \det J(0)\, e^{\int_0^t d\tau\,\kappa(\tau)} = e^{\int_0^t d\tau\,\kappa(\tau)}. \]
- From the form of the invariant measure, for which $d\mu(t) = d\mu(0)$, it therefore follows that
\[ \frac{d}{dt}\Big[ \det\tilde Z\,\det J \Big] = \frac{d\det\tilde Z}{dt}\,\det J + \det\tilde Z\,\frac{d\det J}{dt} = 0 \qquad\Longrightarrow\qquad \kappa\,\det\tilde Z\,\det J + \det J\,\frac{d\det\tilde Z}{dt} = 0, \]
which implies that
\[ \frac{d\det\tilde Z}{dt} = -\kappa\,\det\tilde Z \qquad\Longleftrightarrow\qquad \frac{d\ln\det\tilde Z}{dt} = -\kappa. \]
- This relation can be verified explicitly by evaluating $\kappa = \partial_X\cdot\dot X$ from the equations of motion,
\[ \kappa = \partial_X\cdot\dot X = -\sum_i\frac{\partial}{\partial\mathbf p_i}\cdot\left( \lambda\cdot\frac{\partial\sigma}{\partial\mathbf r_i} \right) = -\frac{d}{dt}\ln\det\tilde Z. \]
- For the holonomically constrained system, we note that $\kappa$ is the total time derivative of a function of the phase space variable $X$, $\kappa = \dot w(X)$ with $w(X) = -\ln\det\tilde Z(X)$. Thus the integral $\int_0^t d\tau\,\kappa(\tau) = w(X(t)) - w(X(0))$ and
\[ \det J(t)\, e^{-w(X(t))} = \det J(0)\, e^{-w(X(0))} = e^{-w(X(0))}. \]
Noting that $dX(t) = \det J(X(t);X(0))\,dX(0)$, we see that the invariant volume element can be written as $e^{-w(X(t))}\,dX(t) = e^{-w(X(0))}\,dX(0)$, with $e^{-w(X)} = \det\tilde Z(X)$.

5.5.1 Non-Hamiltonian Dynamics and the Canonical Ensemble

Consider a system with phase space coordinate $x$ that obeys the evolution equation
\[ \dot x(t;x_0) = \frac{dx}{dt} = \xi\big( x(t;x_0) \big), \]
and suppose the solution of this equation is $x(t;x_0)$, subject to the boundary condition $x(0;x_0) = x_0$.
- The time derivative of a dynamical variable $B$ is given by $\dot B(x) = \xi(x)\cdot\nabla_x B(x) = L_0 B(x)$, where $L_0 = \xi\cdot\nabla_x$.
- We assume the dynamics is ergodic, so that the time average is equal to an ensemble average according to
\[ \bar A(t) = \lim_{T\to\infty}\frac1T\int_0^T d\tau\; A\big( x(t+\tau) \big) = \langle A\rangle_t = \int dx_0\;\rho(x_0)\, A\big( x(t;x_0) \big), \]
where $\rho(x_0)$ is the density at the phase point $x_0$ at time $t = 0$.
- Noting that
\[ \langle A\rangle_t = \int dx\,dx_0\;\delta\big( x - x(t;x_0) \big)\,\rho(x_0)\,A(x) = \int dx\;\big\langle \delta(x - x(t;x_0)) \big\rangle_0\, A(x) = \int dx\;\rho(x,t)\,A(x), \]
where $\rho(x,t) = \langle\delta(x - x(t;x_0))\rangle_0 = \int dx_0\;\rho(x_0)\,\delta\big( x - x(t;x_0) \big)$.

The equation of motion for $\rho(x,t)$, the Liouville equation, follows from
\[ \frac{\partial\rho(x,t)}{\partial t} = \frac{d}{dt}\big\langle\delta(x - x(t;x_0))\big\rangle_0 = -\big\langle \dot x(t;x_0)\cdot\nabla_x\,\delta(x - x(t;x_0)) \big\rangle_0 = -\nabla_x\cdot\big\langle \xi(x(t;x_0))\,\delta(x - x(t;x_0)) \big\rangle_0 = -\nabla_x\cdot\big( \xi(x)\,\rho(x,t) \big). \]
- We have seen that many dynamics have non-zero phase space compressibility $\kappa = \nabla_x\cdot\xi(x)$, so that the invariant measure, if it exists, assumes the general form $d\mu(x) = \zeta(x)\,dx$. We therefore define a density with respect to this measure, $f(x)$, by the relation $\rho(x) = \zeta(x)\,f(x)$, where $\zeta(x)$ is a positive function of the phase point $x$.
- Inserting this definition into the Liouville equation, we find
\[ \frac{\partial f(x,t)}{\partial t} + \xi(x)\cdot\nabla_x f(x,t) = -\kappa_\zeta(x,t)\,f(x,t), \qquad \kappa_\zeta(x,t) = \frac{1}{\zeta(x,t)}\left[ \frac{\partial\zeta(x,t)}{\partial t} + \nabla_x\cdot\big( \xi(x)\,\zeta(x,t) \big) \right]. \]
- If we choose $\zeta(x,t)$ to satisfy $\kappa_\zeta(x,t) = 0$, then we find it must satisfy Liouville's equation,
\[ \frac{\partial\zeta}{\partial t} + \nabla_x\cdot(\xi\,\zeta) = 0 \qquad\Longrightarrow\qquad \frac{\partial\zeta}{\partial t} + \xi\cdot\nabla_x\zeta = -\kappa\,\zeta \qquad\Longrightarrow\qquad \frac{d\ln\zeta(x,t)}{dt} = -\kappa, \]
and then
\[ \frac{df(x,t)}{dt} = \frac{\partial f(x,t)}{\partial t} + \xi(x)\cdot\nabla_x f(x,t) = \frac{\partial f(x,t)}{\partial t} + L_0 f(x,t) = 0. \tag{5.15} \]
- If a function $w(x)$ exists that satisfies
\[ \kappa(x) = \frac{dw(x)}{dt} = \xi(x)\cdot\nabla_x w(x), \]
then $\ln\big(\zeta(x,t)/\zeta(x,0)\big) = -\big[ w(x(t;x_0)) - w(x_0) \big]$, and one can define the invariant measure to be
\[ d\mu(x) = e^{-w(x)}\,dx. \]

- If $\kappa(x)$ cannot be written as a total time derivative, then
\[ \zeta\big( x(t;x_0) \big) = \zeta(x_0)\, e^{-\int_0^t d\tau\;\kappa(x(\tau;x_0))}, \]
which is equivalent to solving the full equation for the density $\rho(x,t)$. There is no clear way to define an invariant measure for this type of dynamics.
  - Note that if we have periodic orbits in the dynamics (only possible in non-Hamiltonian systems), so that $x(T) = x(0)$, then we must have $w(x(T)) = w(x(0))$, or $e^{\int_0^T d\tau\,\kappa(\tau)} = 1$.
  - If the periodic orbit is net contracting or expanding, then either $\kappa$ cannot be written as a total time derivative of a $w(x)$, or $w(x)$ is not well-behaved (singular).
- The formal solution of Eq. (5.15) is
\[ f(X,t) = e^{-L_0 t}\, f(X,0), \]
as in the normal situation.
- For the special case of a constrained system, since the equations of motion have $H(X)$, $\sigma$ and $\dot\sigma$ as constants of motion, we can write the equilibrium phase space distribution function, which is a stationary solution of Eq. (5.15), as
\[ f_{eq}(X) = \Omega(E)^{-1}\,\delta\big( H(X) - E \big)\,\delta(\sigma)\,\delta(\dot\sigma), \]
where $\Omega(E)$ is a normalizing factor. Note that this result is suggested by Eq. (5.14).
- In non-equilibrium systems, the density $f(X,t)$ has an explicit time dependence, and non-equilibrium averages can be written as
\[ \bar B(t) = \int d\mu(X)\; B(X)\, f(X,t) = \int d\mu(X)\; B(X)\, e^{-L_0 t} f(X,0), \]
where $f(X,0)$ is the initial non-equilibrium density.
- For a canonical Hamiltonian system, the Liouville operator $L_0$ is self-adjoint in the sense that
\[ \int dX\; B(X)\, e^{-L_0 t} A(X) = \int dX\; \big( e^{L_0 t} B(X) \big)\, A(X). \]

To examine whether this still holds, consider
\[ \int dX\; B(X)\, L_0 A(X) = -\int dX\; \Big[ \big( L_0 + \kappa \big) B(X) \Big]\, A(X), \]
by integration by parts. Now note that since $L_0\,\zeta(X) = -\kappa\,\zeta(X)$, it follows that
\[ \int d\mu(X)\; B(X)\, L_0 A(X) = \int dX\;\zeta(X)\, B(X)\, L_0 A(X) = -\int dX\; \Big[ \big( L_0 + \kappa \big)\,\zeta(X) B(X) \Big]\, A(X) \]
\[ = -\int dX\; A(X)\Big[ \zeta(X)\, L_0 B(X) + \big( L_0\zeta(X) \big) B(X) + \kappa\,\zeta(X) B(X) \Big] = -\int dX\; \big( L_0 B(X) \big)\,\zeta(X)\, A(X) = -\int d\mu(X)\; \big( L_0 B(X) \big)\, A(X). \]
Using this result in the expansion of $e^{-L_0 t}$, we see that
\[ \int d\mu(X)\; B(X)\, e^{-L_0 t} A(X) = \int d\mu(X)\; \big( e^{L_0 t} B(X) \big)\, A(X) = \int d\mu(X)\; B(X(t))\, A(X). \]
- Note that this result also applies for $A(X) = f(X)$, so that the operator $L_0$ is self-adjoint in this sense when the invariant measure $d\mu(X)$ is used to define the inner product. This is not the case when the inner product does not include the factor of $\zeta(X)$ in the measure.
- For Hamiltonian systems, we have $\zeta(X) = 1$, and the Liouville operator $L_0$ is self-adjoint in this sense with respect to the simple measure $dX$.
- Note that for the special case of holonomically-constrained systems, we have $\zeta(X) = \det\tilde Z(X)$ as the weight in the invariant measure.

Canonical Dynamics

Now consider a system with $3N$ coordinates $q$ and a thermostat variable $\eta$, with corresponding momenta $p$ and $p_\eta$, whose evolution is governed by the equations
\[ \dot q = \frac{p}{m}, \qquad \dot\eta = p_\eta, \qquad \dot p = F(q) - p_\eta\, p - 3N\eta\, k T'(q), \qquad \dot p_\eta = \frac{p\cdot p}{m} - 3N\, kT(q), \tag{5.16} \]
where $T(q)$ is a locally-defined temperature and $T'(q) = dT/dq$.
- From these equations of motion, we find that
\[ H(q,\eta,p,p_\eta) = \frac{p^2}{2m} + \Phi(q) + \frac{p_\eta^2}{2} + 3N\eta\, kT(q) = \tilde H(q,p,p_\eta) + 3N\eta\, kT(q) \]
is conserved by the dynamics if $F(q) = -d\Phi(q)/dq$.
- Suppose the initial conditions set $H = E$, so that
\[ 3N\eta = \frac{1}{kT(q)}\left[ E - \frac{p^2}{2m} - \Phi(q) - \frac{p_\eta^2}{2} \right] = \frac{E - \tilde H}{kT(q)}. \tag{5.17} \]
- This allows the system of equations to be simplified by inserting relation (5.17) into Eq. (5.16).
- These dynamics give rise to a phase space compressibility
\[ \kappa = \nabla_x\cdot\dot x = \sum_{i=1}^{N}\left( \frac{\partial}{\partial\mathbf q_i}\cdot\dot{\mathbf q}_i + \frac{\partial}{\partial\mathbf p_i}\cdot\dot{\mathbf p}_i \right) + \frac{\partial\dot\eta}{\partial\eta} + \frac{\partial\dot p_\eta}{\partial p_\eta} = -3N p_\eta = -3N\dot\eta = -\frac{d}{dt}\left[ \frac{E - \tilde H}{kT(q)} \right]. \]
In this case, the compressibility is equal to a total time derivative, and hence the invariant measure weight is
\[ \zeta(q,p,p_\eta) = e^{\big(E - \tilde H(q,p,p_\eta)\big)/(kT(q))}, \]
and the dynamics generates an ensemble with probability density
\[ \rho(q,\eta,p,p_\eta) = \delta\big( E - H(q,\eta,p,p_\eta) \big)\; e^{E/(kT(q))}\; e^{-p_\eta^2/(2kT(q))}\; e^{-H_0(q,p)/(kT(q))}\big/ Z, \]
where $H_0 = p\cdot p/(2m) + \Phi(q)$ and $Z$ is a normalization constant,
\[ Z = \int dq\,dp\,d\eta\,dp_\eta\; \delta(E - H)\, e^{E/(kT(q))}\, e^{-p_\eta^2/(2kT(q))}\, e^{-H_0(q,p)/(kT(q))} \]
\[ = \int dq\,dp\,dp_\eta\; \frac{1}{3N\,kT(q)}\, e^{-p_\eta^2/(2kT(q))}\, e^{(E - H_0(q,p))/(kT(q))} = \int dq\,dp\; \frac{\sqrt{2\pi\,kT(q)}}{3N\,kT(q)}\; e^{(E - H_0(q,p))/(kT(q))}, \]
where the $\eta$ integral over the delta function produces the factor $1/(3N kT(q))$.
- The reduced density $\rho_r(q,p) = \int d\eta\,dp_\eta\;\rho(q,\eta,p,p_\eta)$ is therefore
\[ \rho_r(q,p) = \frac{1}{\sqrt{kT(q)}}\; e^{-(H_0(q,p) - E)/(kT(q))}\big/ Z_r, \qquad Z_r = \int dq\,dp\; \frac{1}{\sqrt{kT(q)}}\; e^{-(H_0(q,p) - E)/(kT(q))}. \]
- Note that when $T = T(q)$ is uniform, $\rho_r(q,p)$ is the canonical density for an ensemble with temperature $T$.
- If the system is ergodic, then the dynamics will generate the canonical density, so that a time average of a quantity will correspond to the canonical ensemble average.
- This approach can be extended to a system with better mixing properties by using a chain of thermostat variables $\eta_i$.
- The isobaric-isothermal ensemble can also be sampled using molecular dynamics schemes, using the volume $V$ as a dynamical coordinate with conjugate momentum $P_V$.
Systems with no invariant measure

A different type of system can be defined by the set of equations
\[ \dot q = \frac{p}{m}, \qquad \dot p = F(q) - p_\eta\, p, \qquad \dot p_\eta = \frac{p^2}{m} - kT. \]
- This system does not have an invariant measure, as $\kappa = -p_\eta$ is not equal to a total time derivative of a phase function (there is no equation of motion generating $\eta$ itself).
- It has been shown that this system has a fractal steady state and attracting periodic orbits [Posch and Hoover, Phys. Rev. E 55, 6803 (1997)].
- Another type of system, one that has a singular invariant measure, is
\[ \dot x = -\alpha x, \qquad \dot y = \beta y, \qquad \alpha > \beta > 0. \]
  - The system has a fixed point at the origin $(0,0)$ that is net attractive.
  - The compressibility is $\kappa = -(\alpha - \beta) = \frac{d}{dt}\big( \ln|x| + \ln|y| \big)$, and hence the invariant measure weight $\zeta = 1/(|x|\,|y|)$ is singular along the $x$-axis, the $y$-axis and at the origin.

5.5.2 Volume-preserving integrators for non-Hamiltonian systems

We'd like to develop good integration schemes for non-Hamiltonian systems along the lines used for Hamiltonian systems. Using splitting methods, it turns out that we can sometimes develop integrators that
1. are time reversible,
2. preserve the invariant volume (should it exist).
In particular, the preservation of phase space volume is an important property in determining the stability of an integrator.
- The idea is to break up the evolution equations $\dot x = \xi(x) = L x$ by splitting the Liouville operator $L = \xi\cdot\nabla_x$ into parts, $L = \sum_\alpha L_\alpha$, so that $\dot x^{(\alpha)} = \xi^{(\alpha)}(x) = L_\alpha x$.
- We have seen that, if it exists, the invariant measure $d\mu(x) = \zeta(x)\,dx$ satisfies the divergence equation
\[ \nabla_x\cdot\big( \zeta(x)\,\dot x \big) = 0 \tag{5.18} \]
under the dynamical evolution determined by $\dot x$.
- We'd like to find evolution operators $L_\alpha$ that each satisfy Eq. (5.18) and lead to dynamics that is exactly solvable.
To see how this can be accomplished, consider the Nosé-Hoover system with phase space $x = (q, p, \eta, p_\eta)$ and equations of motion
\[ \dot q = \frac{p}{m}, \qquad \dot\eta = p_\eta, \qquad \dot p = F(q) - p_\eta\, p, \qquad \dot p_\eta = \frac{p^2}{m} - kT, \]
where $m$ and $kT$ are constants.
- The full Liouville operator for this system is
\[ L = \frac{p}{m}\frac{\partial}{\partial q} + \big( F(q) - p_\eta p \big)\frac{\partial}{\partial p} + p_\eta\frac{\partial}{\partial\eta} + \left( \frac{p^2}{m} - kT \right)\frac{\partial}{\partial p_\eta}. \]
- We now examine decompositions $L = \sum_\alpha L_\alpha$ that satisfy $\nabla_x\cdot\big( \zeta(x)\,\dot x^{(\alpha)} \big) = 0$, where $L_\alpha x = \dot x^{(\alpha)}$.
- For the Nosé-Hoover dynamics, the invariant measure corresponds to the choice $\zeta(x) = e^{\eta}$.
There are a number of decompositions that work:
1. The choice:
\[ L_1 = F(q)\frac{\partial}{\partial p}, \qquad L_2 = \frac{p}{m}\frac{\partial}{\partial q} + \frac{p^2}{m}\frac{\partial}{\partial p_\eta}, \qquad L_3 = -kT\frac{\partial}{\partial p_\eta}, \qquad L_4 = -p_\eta\, p\frac{\partial}{\partial p} + p_\eta\frac{\partial}{\partial\eta}. \tag{5.19} \]
- For $L_1$, $\dot x^{(1)} = (0, F(q), 0, 0)$, so $\nabla_x\cdot\big( e^\eta\dot x^{(1)} \big) = \partial_p\big( e^\eta F(q) \big) = 0$, and hence $L_1$ preserves the volume element $d\mu = e^\eta\,dx$.
- For $L_2$, $\dot x^{(2)} = (p/m, 0, 0, p^2/m)$, so $\nabla_x\cdot\big( e^\eta\dot x^{(2)} \big) = \partial_q\big( e^\eta p/m \big) + \partial_{p_\eta}\big( e^\eta p^2/m \big) = 0$.
- For $L_3$, $\dot x^{(3)} = (0,0,0,-kT)$, so $\nabla_x\cdot\big( e^\eta\dot x^{(3)} \big) = -\partial_{p_\eta}\big( e^\eta kT \big) = 0$.
- Finally, for $L_4$ the time derivative of $\eta$ is non-zero and $\dot x^{(4)} = (0, -p_\eta p, p_\eta, 0)$, and
\[ \nabla_x\cdot\big( e^\eta\dot x^{(4)} \big) = \frac{\partial}{\partial p}\big( -e^\eta p_\eta\, p \big) + \frac{\partial}{\partial\eta}\big( e^\eta p_\eta \big) = -p_\eta e^\eta + p_\eta e^\eta = 0. \]
- Note that the effect of each of the propagators $e^{L_\alpha t}$ on the phase point $x$ is exactly computable, since
\[ e^{L_1 t}x = \begin{pmatrix} q \\ p + Ft \\ \eta \\ p_\eta \end{pmatrix}, \qquad e^{L_2 t}x = \begin{pmatrix} q + \frac{p}{m}t \\ p \\ \eta \\ p_\eta + \frac{p^2}{m}t \end{pmatrix}, \qquad e^{L_3 t}x = \begin{pmatrix} q \\ p \\ \eta \\ p_\eta - kT\,t \end{pmatrix}, \qquad e^{L_4 t}x = \begin{pmatrix} q \\ p\, e^{-p_\eta t} \\ \eta + p_\eta t \\ p_\eta \end{pmatrix}, \]
where we have used the fact that $\exp\{c\,x\,d/dx\}\,x = x\,e^{c}$.
- This splitting can then be used in a symmetric sequence of factors $\exp\{L_i t_k\}$ with $\sum_k t_k = t$.
2. The choice:
\[ L_1 = F(q)\frac{\partial}{\partial p}, \qquad L_2 = \frac{p}{m}\frac{\partial}{\partial q}, \qquad L_3 = \left( \frac{p^2}{m} - kT \right)\frac{\partial}{\partial p_\eta}, \qquad L_4 = -p_\eta\, p\frac{\partial}{\partial p} + p_\eta\frac{\partial}{\partial\eta}. \]
The difference from the first scheme is in the combination of operators in the definitions of $L_2$ and $L_3$. These are trivial re-arrangements that still satisfy the conservation of the invariant phase space volume, but now
\[ e^{L_2 t}x = \begin{pmatrix} q + \frac{p}{m}t \\ p \\ \eta \\ p_\eta \end{pmatrix}, \qquad e^{L_3 t}x = \begin{pmatrix} q \\ p \\ \eta \\ p_\eta + \left( \frac{p^2}{m} - kT \right)t \end{pmatrix}. \]
3. The choice:
\[ L_1 = \frac{p}{m}\frac{\partial}{\partial q} + \left( \frac{p^2}{m} - kT \right)\frac{\partial}{\partial p_\eta}, \qquad L_2 = F(q)\frac{\partial}{\partial p}, \qquad L_4 = -p_\eta\, p\frac{\partial}{\partial p} + p_\eta\frac{\partial}{\partial\eta}. \]
Note that this choice is superior, since it involves fewer Liouville operators.
- Note that all these schemes use the same $L_4$, since this form is the one which preserves the volume when the compressibility is non-zero. A sketch of an integrator built from such a splitting is given below.
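As one possible realization, the following sketch composes the exactly solvable maps of the second splitting above into a symmetric, time-reversible, measure-preserving step. The particular symmetric ordering chosen here is one option among several; the force callback and parameters are placeholders.

```python
import numpy as np

def nose_hoover_step(x, dt, force, m=1.0, kT=1.0):
    """One symmetric splitting step for x = (q, p, eta, p_eta).
    Each sub-map is the exact flow of one of the split Liouville
    operators, so each preserves the measure exp(eta) dx exactly."""
    q, p, eta, p_eta = x

    def stream(h):                    # exact flows of L2 and L3 (they commute)
        nonlocal q, p_eta
        q += (p / m) * h
        p_eta += (p * p / m - kT) * h

    def kick(h):                      # exact flow of L1
        nonlocal p
        p += force(q) * h

    def thermo(h):                    # exact flow of L4: scaling of p
        nonlocal p, eta
        p *= np.exp(-p_eta * h)
        eta += p_eta * h

    thermo(dt/2); kick(dt/2); stream(dt); kick(dt/2); thermo(dt/2)
    return q, p, eta, p_eta
```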
General splitting procedure

Suppose we can write the dynamics in a form that is a generalization of the symplectic form:
\[ \dot x_i = A_{ij}(x)\,\frac{\partial H}{\partial x_j} = \xi_i(x), \tag{5.20} \]
where $H$ is the Hamiltonian function that is conserved in the dynamics and $A_{ij}(x)$ is an anti-symmetric matrix, satisfying $A_{ij}(x) = -A_{ji}(x)$, that generalizes the symplectic matrix $J$.
- Since $A$ is anti-symmetric, we can write $A_{ij} = \frac12(A_{ij} - A_{ji})$, which implies
\[ \dot H = \frac{\partial H}{\partial x_i}\,\dot x_i = \frac{\partial H}{\partial x_i}\,A_{ij}\,\frac{\partial H}{\partial x_j} = \frac12\left( \frac{\partial H}{\partial x_i}\,A_{ij}\,\frac{\partial H}{\partial x_j} - \frac{\partial H}{\partial x_i}\,A_{ji}\,\frac{\partial H}{\partial x_j} \right) = 0, \]
where we have used the convention that repeated indices are summed over.
- If $d\mu(x) = \zeta(x)\,dx$ is the invariant volume element under the dynamics, then $\zeta(x)$ obeys
\[ \nabla_x\cdot\big( \zeta\,\dot x \big) = 0 = \frac{\partial}{\partial x_i}\big( \zeta\,\dot x_i \big) = \frac{\partial}{\partial x_i}\left( \zeta\, A_{ij}\,\frac{\partial H}{\partial x_j} \right). \]
- Defining the anti-symmetric matrix $B = \zeta A$, this condition is satisfied when $\partial B_{ij}/\partial x_i = 0$, because
\[ \frac{\partial}{\partial x_i}\left( B_{ij}\,\frac{\partial H}{\partial x_j} \right) = \frac{\partial B_{ij}}{\partial x_i}\,\frac{\partial H}{\partial x_j} + B_{ij}\,\frac{\partial^2 H}{\partial x_i\,\partial x_j} = \frac{\partial B_{ij}}{\partial x_i}\,\frac{\partial H}{\partial x_j}, \]
since
\[ B_{ij}\,\frac{\partial^2 H}{\partial x_i\,\partial x_j} = 0, \]
as $B_{ij}$ is anti-symmetric while $\partial^2 H/\partial x_i\partial x_j = \partial^2 H/\partial x_j\partial x_i$ is symmetric.
- Separating the Hamiltonian $H$ into several different terms, $H = \sum_\alpha H^{(\alpha)}$, we can write the flow as $\xi = \sum_\alpha \xi^{(\alpha)}$, where $\xi_i^{(\alpha)} = A_{ij}\,\partial H^{(\alpha)}/\partial x_j$, and define the Liouville operators
\[ L_\alpha = \xi_i^{(\alpha)}\,\frac{\partial}{\partial x_i} = A_{ij}\,\frac{\partial H^{(\alpha)}}{\partial x_j}\,\frac{\partial}{\partial x_i}. \]
- The action of each of the Liouville operators on the invariant measure is therefore
\[ \frac{\partial}{\partial x_i}\left( \zeta(x)\,A_{ij}\,\frac{\partial H^{(\alpha)}}{\partial x_j} \right) = \frac{\partial B_{ij}}{\partial x_i}\,\frac{\partial H^{(\alpha)}}{\partial x_j} + B_{ij}\,\frac{\partial^2 H^{(\alpha)}}{\partial x_i\,\partial x_j} = 0, \]
since $B_{ij}$ is anti-symmetric and obeys $\partial B_{ij}/\partial x_i = 0$.
- The invariant volume (and also $H^{(\alpha)}$) is therefore preserved under the flow generated by $L_\alpha$, for any $H^{(\alpha)}$.
Nosé-Hoover system

We now show that the Nosé-Hoover dynamical system that generates the canonical ensemble (provided the dynamics is ergodic) can be written using the generalized symplectic matrix form. Consider the phase space variables $(q,p,\eta,p_\eta)$ with Hamiltonian $H = p^2/2m + \Phi(q) + kT\eta + p_\eta^2/2$. From the equations of motion,
\[ \dot q = \frac{p}{m}, \qquad \dot\eta = p_\eta, \qquad \dot p = F(q) - p_\eta\, p, \qquad \dot p_\eta = \frac{p^2}{m} - kT, \]
and the Hamiltonian $H$, we deduce the form of the anti-symmetric matrix $A$ as follows:
- Note that
\[ \frac{\partial H}{\partial x_1} = \Phi'(q), \qquad \frac{\partial H}{\partial x_2} = \frac{p}{m}, \qquad \frac{\partial H}{\partial x_3} = kT, \qquad \frac{\partial H}{\partial x_4} = p_\eta. \]
- Since $A$ is anti-symmetric, all diagonal elements are zero.
- Since $\dot q = p/m = A_{1j}\,\partial H/\partial x_j = \partial H/\partial x_2$, we see we must have $A_{1j} = \delta_{j,2} = -A_{j1}$.
- From the second equation, $\dot p = F(q) - p_\eta p = A_{2j}\,\partial H/\partial x_j$, and noting that $A_{21} = -1$, we have $\dot p = -\Phi'(q) + A_{23}\,kT + A_{24}\,p_\eta$; hence $A_{23} = 0 = -A_{32}$ and $A_{24} = -p = -A_{42}$.
- The only remaining element to determine is $A_{34}$, which we get from $\dot\eta = p_\eta = A_{34}\,p_\eta$, so $A_{34} = 1$ and $A_{43} = -1$, giving the final matrix
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & -p \\ 0 & 0 & 0 & 1 \\ 0 & p & -1 & 0 \end{pmatrix}. \]
From this matrix, one can easily verify that $B_{ij} = e^\eta A_{ij}$ satisfies $\partial B_{ij}/\partial x_i = 0$ for all $j$. For example, consider $j = 4$:
\[ \frac{\partial B_{i4}}{\partial x_i} = \frac{\partial}{\partial p}\big( -e^\eta p \big) + \frac{\partial}{\partial\eta}\big( e^\eta \big)\,p_{\,}\Big|_{\text{terms}} = -e^\eta + e^\eta = 0. \]
The first integration scheme is obtained by choosing
\[ H^{(1)} = \Phi(q), \qquad H^{(2)} = \frac{p^2}{2m}, \qquad H^{(3)} = kT\,\eta, \qquad H^{(4)} = \frac{p_\eta^2}{2}, \]
leading to the Liouville operators of Eq. (5.19).

Appendix A

Math Appendices

A.1 Taylor expansion

Expand the function $f(x+a)$ for small $a$ around $a = 0$:
\[ f(x+a) = f(x) + f'(x)\,a + \frac12 f''(x)\,a^2 + \cdots = \sum_{j=0}^{\infty}\frac{a^j}{j!}\,\frac{d^j}{dx^j}f(x). \]
Since
\[ e^{x} = \sum_{j=0}^{\infty}\frac{x^j}{j!}, \]
we may write
\[ f(x+a) = \exp\left( a\,\frac{d}{dx} \right) f(x). \]

A.2 Series expansions

For $|x| < 1$:
\[ \frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots \]
\[ \frac{1}{1-x} = 1 + x + x^2 + \cdots \]
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \]
\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots \]
\[ \ln(1+x) = x - \frac12 x^2 + \frac13 x^3 - \cdots \]
(The series for $\sin x$ and $\cos x$ in fact converge for all $x$.)

A.3 Probability theory

A.3.1 Discrete systems

Suppose we have a measurable quantity $E$ with $n$ discrete values $E_1, E_2, \dots, E_n$. Let
\[ N = \text{number of measurements}, \qquad N_i = \text{number of measurements of } E_i. \]
Then
\[ P_i = \text{probability that } E_i \text{ is measured} = \lim_{N\to\infty}\frac{N_i}{N} \equiv P(E_i). \]
Properties:
1. $0 \le P_i \le 1$
2. $\sum_{i=1}^{n} P_i = 1$

Averages:
\[ \bar E = \sum_{i=1}^{n} E_i\, P_i, \qquad \overline{E^2} = \sum_{i=1}^{n} E_i^2\, P_i, \qquad \overline{H(E)} = \sum_{i=1}^{n} H(E_i)\, P_i. \]

Variance of $E$:
\[ \sigma_E^2 \equiv \overline{E^2} - \bar E^2 = \overline{\left( E_i - \bar E \right)^2}. \]
$\sigma_E^2$ measures the dispersion of the probability distribution: how spread out the values are. In general, $\sigma_E^2 \ne 0$ unless $P_i = \delta_{ij}$ for some $j$. This notation means
\[ P_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise,} \end{cases} \]
which implies $\bar E = E_j$.

Tchebycheff inequality:
\[ \mathrm{Prob}\left( \big| E - \bar E \big| \ge \lambda \right) \le \frac{\sigma_E^2}{\lambda^2}. \]

Joint probability: Suppose we make $N$ measurements of two properties $E$ and $G$, and let
\[ n_{ij} = \text{number of simultaneous measurements of } E_i \text{ and } G_j, \qquad P_{ij} = \lim_{N\to\infty}\frac{n_{ij}}{N} \equiv P(E_i, G_j) \quad\text{(the joint probability).} \]
Properties:
1. $\sum_{i,j} P(E_i, G_j) = 1$.
2. $\sum_i P(E_i, G_j) = P(G_j)$.
3. $\sum_j P(E_i, G_j) = P(E_i)$.
4. If $E_i$ and $G_j$ are independent, then $P(E_i, G_j) = P(E_i)\,P(G_j)$.

A.3.2 Continuous systems

- The probability of measuring an observable $X$ with values between $x$ and $x + dx$ is $p(x)\,dx$; $p(x)$ is called the probability density.
- Properties:
1. Positive definite: $p(x) \ge 0$.
2. Normalized: $\int dx\, p(x) = 1$.
- Averages:
\[ \bar x = \int dx\; x\,p(x), \qquad \overline{f(x)} = \int dx\; f(x)\,p(x), \qquad \sigma_x^2 = \overline{x^2} - \bar x^2 = \int dx\left( x^2 - \bar x^2 \right) p(x). \]

A.3.3 Gaussian distributions

1. The distribution is specified by its first and second moments:
\[ P(x) = \left( \frac{1}{2\pi\sigma^2} \right)^{1/2} e^{-(x - \langle x\rangle)^2/(2\sigma^2)}, \qquad \langle x\rangle = \text{average of } x, \qquad \big\langle (x - \langle x\rangle)^2 \big\rangle = \sigma^2. \]
2. Important properties (assuming $\langle x\rangle = 0$): $\langle x^{2n+1}\rangle = 0$ and $\langle x^{2n}\rangle$ is a function of $\sigma^2$ only.
3. If
\[ P(x_1,\dots,x_n) = \left( \frac{1}{2\pi\langle x_1^2\rangle} \right)^{1/2}\!\!\cdots\left( \frac{1}{2\pi\langle x_n^2\rangle} \right)^{1/2}\exp\left\{ -\frac{x_1^2}{2\langle x_1^2\rangle} - \cdots - \frac{x_n^2}{2\langle x_n^2\rangle} \right\}, \]
then
\[ \langle x_i\rangle = 0, \qquad \langle x_i x_j\rangle = \sigma_i^2\,\delta_{ij}. \]
- What happens when $\sigma_x^2 \to 0$? The distribution becomes infinitely narrow, called a Dirac delta function: the probability density has all its weight on one value.
- There are other representations of the Dirac delta function; it is basically defined in such a way that one value receives all the weight.
Delta functions: defined in a limiting sense. Let
\[ \delta^{(\epsilon)}(x) = \begin{cases} \dfrac{1}{\epsilon} & |x| \le \dfrac{\epsilon}{2} \\[4pt] 0 & |x| > \dfrac{\epsilon}{2}, \end{cases} \]
so that
\[ \int_{-\epsilon/2}^{\epsilon/2} dx\; \delta^{(\epsilon)}(x) = \int_{-\epsilon/2}^{\epsilon/2}\frac{1}{\epsilon}\,dx = 1, \qquad \int dx\;\delta^{(\epsilon)}(x)\,f(x) \approx f(0) \quad\text{if } \epsilon \ll 1, \]
since the function $f(x)$ is essentially constant over the infinitesimal interval.
- Definition of the delta function: $\delta(x) = \lim_{\epsilon\to0}\delta^{(\epsilon)}(x)$.
- Representations of the delta function in the limit $\epsilon\to0$:
1. $\dfrac{1}{2\epsilon}\,e^{-|x|/\epsilon}$
2. $\dfrac{1}{\pi}\,\dfrac{\epsilon}{x^2 + \epsilon^2}$
3. $\dfrac{1}{\sqrt{\pi\,\epsilon^2}}\,e^{-x^2/\epsilon^2}$
4. $\dfrac{1}{\pi}\,\dfrac{\sin(x/\epsilon)}{x}$
- For any continuous function $f$ of $x$, all the forms above give
\[ \lim_{\epsilon\to0}\int dx\;\delta^{(\epsilon)}(x - x_0)\, f(x) = f(x_0). \]
Some properties of the delta function:
1. $\delta(x) = \delta(-x)$
2. $\delta(cx) = \dfrac{1}{|c|}\,\delta(x)$
3. $\delta[g(x)] = \displaystyle\sum_j \frac{\delta(x - x_j)}{|g'(x_j)|}$, where $g(x_j) = 0$ and $g'(x_j) \ne 0$
4. $g(x)\,\delta(x - x_0) = g(x_0)\,\delta(x - x_0)$
5. $\displaystyle\int dx\;\delta(x - y)\,\delta(x - z) = \delta(y - z)$
6. $\displaystyle\int dx\;\frac{d\delta(x - x_0)}{dx}\, f(x) = -\int dx\;\delta(x - x_0)\, f'(x) = -f'(x_0)$

A.4 Fourier and Laplace Transforms

Fourier transform:
\[ \hat f(k) = \int_{-\infty}^{\infty} e^{-ikx}\, f(x)\, dx, \qquad f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ikx}\,\hat f(k)\, dk. \]
Properties:
\[ \int_{-\infty}^{\infty} e^{ix(k - k_0)}\, dx = 2\pi\,\delta(k - k_0), \qquad \int f(x-y)\,g(y)\, dy = \frac{1}{2\pi}\int e^{ikx}\,\hat f(k)\,\hat g(k)\, dk. \]
Useful transform pairs:
\[ f(x) = \left( \frac{1}{2\pi\sigma^2} \right)^{1/2} e^{-x^2/(2\sigma^2)} \qquad\longleftrightarrow\qquad \hat f(k) = e^{-\sigma^2 k^2/2}, \]
\[ f(x) = e^{-\alpha|x|} \qquad\longleftrightarrow\qquad \hat f(k) = \frac{2\alpha}{k^2 + \alpha^2}, \]
\[ f(x) = \frac{1}{x^2 + \alpha^2} \qquad\longleftrightarrow\qquad \hat f(k) = \frac{\pi}{\alpha}\, e^{-\alpha|k|}. \]

Laplace transform:
\[ \hat f(z) = \int_0^{\infty} e^{-zt}\, f(t)\, dt. \]
Useful properties and transforms:
\[ \int_0^t d\tau\; f(t - \tau)\, g(\tau) \qquad\longleftrightarrow\qquad \hat f(z)\,\hat g(z), \]
\[ f(t) = e^{-at} \qquad\longleftrightarrow\qquad \hat f(z) = \frac{1}{z + a}, \qquad\qquad f(t) = \frac{t^{n-1}}{(n-1)!} \qquad\longleftrightarrow\qquad \hat f(z) = \frac{1}{z^n}. \]

A.5

1
tn

Calculus

A.5.1

Integration by parts

Z
udv = uv

vdu

Example:
Z
a

A.5.2

b Z b

dx f (x)g(x) = f (x)g(x)
dx f (x)g 0 (x)
0

Change of Variable and Jacobians

Let I = ∫_{R_xy} dx dy f(x, y) be the integral over a connected region R_xy. Change variables to u, v via the transform g(u, v) = x and h(u, v) = y. It follows that

    I = ∫_{R_uv} du dv f(g(u, v), h(u, v)) |∂(g, h)/∂(u, v)|,

where the Jacobian ∂(g, h)/∂(u, v) is the determinant

    ∂(g, h)/∂(u, v) = | ∂g/∂u  ∂g/∂v |
                      | ∂h/∂u  ∂h/∂v |


Example: Suppose

    I = ∫ dx dy f(x) g(x − y).

Let u = x and v = x − y. Under this transformation, the range of v is (−∞, ∞) at a fixed value of u (or x). The Jacobian J is

    J = | 1   0 |  =  −1.
        | 1  −1 |

Thus,

    I = ∫ du dv f(u) g(v) |−1| = ∫ du f(u) ∫ dv g(v).
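The Jacobian in this example can also be verified symbolically; a minimal sympy check (added here for illustration) mirrors the substitution used above:

    import sympy as sp

    u, v = sp.symbols('u v')
    g = u          # x = g(u, v) = u
    h = u - v      # y = h(u, v) = u - v

    J = sp.Matrix([[sp.diff(g, u), sp.diff(g, v)],
                   [sp.diff(h, u), sp.diff(h, v)]])
    print(J.det())   # prints -1, so |J| = 1 as used in the example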


Appendix B

Problem sets

B.1  Problem Set 1

Notes:
This set contains 3 problems.
This first set is due in one week, October 8, 2008.
1. Suppose a Markov chain has the following transition matrix for a three-state system:

       K = ( 0  1  0 )
           ( 1  0  0 )
           ( 0  0  1 )

(a) What are the eigenvalues of K? What independent eigenvectors have eigenvalue λ = 1?
(b) If a random walk starts in state 1, what is the probability of finding state 1 after
2 steps? After 3 steps?
(c) Is the transition matrix regular? Does this random walk have a unique limit
distribution?
2. Consider a one-dimensional, time-homogeneous, discrete time, Markovian random walk
process in which a walker always moves to its neighbor to the right. Suppose there are
4 sites, and a walker at site number 4 always cycles back to site 1:
[Figure: four lattice sites in a row; the walker always hops one site to the right, and from site 4 it cycles back to site 1.]


(a) Write down the transition matrix K^(0) for this process and show that the eigenvalues of this matrix are λ_k^(0) = e^{iπk/2} for k = 1, …, 4.
(b) Show that the right eigenvectors of K^(0) are e_k^(0) = 1/2 (e^{iπk/2}, e^{iπk}, e^{i3πk/2}, e^{i2πk}) and that the left eigenvectors are the complex conjugates e_k^(0)*.
(c) Now suppose that there is a small but finite probability ε that the walker remains on site 1 and probability 1 − ε to step to site 2. What is the form of K = K^(0) + ΔK now?
(d) The effect of the perturbation ΔK on the eigenvalues can be evaluated using perturbation theory via

        λ_k = λ_k^(0) + Δλ_k,    where Δλ_k = e_k^(0)* · ΔK · e_k^(0),

    to first order in ε. Evaluate the effect to first order in ε of the perturbation on the eigenvalues λ_k. Does the perturbed system have a unique limit distribution?
(e) What is the effect of the perturbation on the eigenvector e_4? What is the physical meaning of this eigenvector?
3. Consider a circle of diameter d contained in a square of side length l ≥ d.
   (a) Construct an estimator for π using the fraction of uniformly selected points in the square that fall in the circle, and show that the average of this estimator over the true (uniform) distribution is π.
   (b) Derive an expression for the standard error in the estimated value of π using this algorithm. How does it depend on N (the number of points sampled) and the ratio l/d? For fixed l, what is the optimal value of d that minimizes the standard error?
   (c) Write a schematic (or explicit) computer program to compute the value of π using this method (a sketch is given below).
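For orientation, a minimal hit-or-miss sketch of the program asked for in part (c) follows (an illustration added to these notes, not a model solution; it takes l = d = 1 for concreteness):

    import numpy as np

    def estimate_pi(n_points, l=1.0, d=1.0, seed=0):
        """Hit-or-miss estimate of pi: fraction of uniform points in an
        l x l square that fall inside a centred circle of diameter d."""
        rng = np.random.default_rng(seed)
        pts = rng.uniform(0.0, l, size=(n_points, 2))
        r2 = np.sum((pts - l / 2.0) ** 2, axis=1)    # squared distance to centre
        frac = np.mean(r2 <= (d / 2.0) ** 2)
        # area ratio pi (d/2)^2 / l^2 = frac  =>  pi = 4 frac l^2 / d^2
        return 4.0 * frac * l**2 / d**2

    for N in (10**3, 10**5, 10**7):
        print(N, estimate_pi(N))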

B.2  Problem Set 2

Notes:
This set contains 3 problems.
This second set is due in two weeks, October 22, 2008.
1. Consider a three-state process in which the probability vector p(t) satisfies the master equation

       p(t + 1) = K p(t),

   with a transition matrix given by

       K = ( 3/4   1/32   1/16 )
           ( 1/8  13/16   1/8  )                                            (B.1)
           ( 1/8   5/32  13/16 )

(a) What are the eigenvalues of the transition matrix K? Give the stationary vector p̄ (i.e. that which satisfies K p̄ = p̄). Make sure it is normalized such that the sum of its components is one.
(b) Show that detailed balance is not obeyed by this process.
Recall from linear algebra that any symmetric matrix S can be brought to diagonal form D by a similarity transformation P, in that

    D = P⁻¹ S P = ( λ₁  0   0  )
                  ( 0   λ₂  0  )                                            (B.2)
                  ( 0   0   λ₃ )

where the λ_k are the eigenvalues and the columns of the matrix P are the (right) eigenvectors ẽ_k of S. Yet K is not symmetric and has degenerate eigenvalues. For such cases, linear algebra tells us that the matrix can be brought to Jordan normal block form,

    J = P⁻¹ K P = diag(J₁, J₂, …),    J_k = ( λ_k  1          )
                                            (      λ_k  ⋱     )             (B.3)
                                            (           ⋱   1 )
                                            (             λ_k )

where the size g_k of each block J_k is equal to the degeneracy of the corresponding eigenvalue λ_k, and P contains the generalized eigenvectors, which follow from the generalized eigenvalue equation (K − λ_k 1)^{g_k} ẽ_k^(i) = 0 (i = 1, …, g_k).
(c) What is the Jordan normal form of the matrix K in equation (B.1)?
(d) Determine the similarity matrix P that brings it to that form.
The Jordan normal form is possibly a complicating factor because the iterated matrix Kᵗ is harder to find.
(e) Show that here

        Jᵗ = ( 1   0    0          )
             ( 0   λ₂ᵗ  t λ₂^{t−1} )
             ( 0   0    λ₂ᵗ        )

(f) Determine the matrix

        K^∞ = lim_{t→∞} Kᵗ,

    and show that it transforms any initial probability vector into the stationary probability vector.
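The linear algebra in this problem can be explored numerically; the Python sketch below (an aid added to these notes, using the matrix as reconstructed in Eq. (B.1)) prints the eigenvalues, the stationary vector, and an iterate of K that approximates K^∞:

    import numpy as np

    K = np.array([[3/4,  1/32,  1/16],
                  [1/8, 13/16,  1/8 ],
                  [1/8,  5/32, 13/16]])

    w, V = np.linalg.eig(K)
    print(w)                          # one eigenvalue 1, a degenerate pair 11/16

    # stationary vector: eigenvector for eigenvalue 1, normalized to unit sum
    p_bar = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    p_bar /= p_bar.sum()
    print(p_bar)

    # iterating K illustrates the limit matrix of part (f): every column -> p_bar
    print(np.linalg.matrix_power(K, 200))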
2. Consider now a slightly different process with transition matrix

       K = ( 3/4   1/16   1/16 )
           ( 1/8  13/16   1/8  )                                            (B.4)
           ( 1/8   1/8   13/16 )

   (a) Determine the normalized stationary probability vector p̄ = (p̄₁, p̄₂, p̄₃).
   (b) Show that detailed balance is satisfied.
   (c) Show that the matrix

           K̃ = Q⁻¹ K Q

       is symmetric, where the similarity transformation Q is defined as

           Q = ( √p̄₁   0     0   )
               ( 0     √p̄₂   0   )
               ( 0     0     √p̄₃ )

   (d) What does this imply for the Jordan normal form of K?


   (e) Show that for any n-state process which satisfies detailed balance with stationary probability vector p̄, the transformed matrix Q⁻¹ K Q is symmetric when Q = diag(√p̄₁, √p̄₂, …).
3. Suppose one defines a random walk procedure to generate a Markov chain of states {x₁, …, x_N}, where each configuration of the system x_i appears with canonical ensemble probability π(x_i) = e^{−βU(x_i)}/Z. Consider the case in which it is very computationally demanding to compute the potential energy U(x_i), and much easier to compute an approximate potential energy U_f(x_i). For example, the true potential energy U(x_i) could be computed by ab-initio electronic structure methods, and U_f(x_i) could be the potential energy computed by a molecular-mechanical or low-level ab-initio method.

[Figure: schematic of the auxiliary-chain procedure. Starting from u₁ = x_i, updates using K_f generate the sequence u₂, …, u_{m−1}, u_m; the trial state is y_i = u_m. If the trial is accepted, x_{i+1} = y_i; if not, x_{i+1} = x_i.]

(a) Suppose one develops a transition matrix K_f(u_i → u_j) that satisfies detailed balance in the canonical ensemble for the system with potential energy U_f,

        p_f(u_i) K_f(u_i → u_j) = p_f(u_j) K_f(u_j → u_i),

    where p_f(u_i) = e^{−βU_f(u_i)}/Z_f. At a given time step i of the Markov chain, suppose the configuration is x_i. If one defines u₁ = x_i and then generates an auxiliary sequence of states {u₁, …, u_m} according to a random walk with probabilities determined by K_f, and then selects the trial state for the (i + 1)-th step of the Markov chain to be y_i = u_m, what is the probability T(x_i → y_i) of generating the configuration y_i starting from x_i via the sequence {u₁, …, u_m}? What is the reverse probability T(y_i → x_i) via the path {u_m, …, u₁}?
(b) How should the acceptance probability A(x_i → y_i) of the trial configuration be defined to ensure that the chain of states has limiting distribution π? How does this probability depend on the number of steps m used in the auxiliary chain {u₁, …, u_m}?
(c) What might the advantage be of such a procedure over a simple, symmetric trial proposal procedure where T(x_i → y_i) = T(y_i → x_i)? What conditions are important if the auxiliary chain method is to be efficient?

B.3  Problem Set 3

Notes:
This set contains 3 problems.
This third set is due in two weeks, November 5, 2008.
1. Consider a Hamiltonian of the form

       H = |P^N|²/(2m) + U(R^N),

   where |P^N|² = Σ_{i=1}^N p_i · p_i. This Hamiltonian is split into a kinetic and a potential part,

       K = |P^N|²/(2m)    and    U = U(R^N).


   (a) Compute the action of the partial Liouvillians L_K = {K, ·} and L_U = {U, ·} on a phase point Γ = (R^N, P^N), and prove that

           e^{L_K t} ( R^N )  =  ( R^N + P^N t/m )
                     ( P^N )     ( P^N           )

           e^{L_U t} ( R^N )  =  ( R^N               )
                     ( P^N )     ( P^N − t ∂U/∂R^N   )

       (hint: expand the exponentials.)
   (b) Use these relations to show that a single step of the Verlet scheme,

           e^{L_U Δt/2} e^{L_K Δt} e^{L_U Δt/2},                            (B.5)

       conserves phase space volume.


   (c) Show that applying the Verlet scheme to a harmonic oscillator,

           H = (1/2)(p² + r²),

       leads to a linear map Γ(t + Δt) = M Γ(t), where M is a 2 × 2 matrix independent of Γ. Give the explicit form of M.
(d) How can you see that this map conserves phase space volume? What does this
imply for the eigenvalues of the map?
(e) Use the eigenvalues of the map to determine the time step at which the Verlet
integrator for the harmonic oscillator becomes unstable.


2. The interaction potential between two Lennard-Jones particles with a smooth cutoff is given by

       φ(r) = 4ε( σ¹²/r¹² − σ⁶/r⁶ )                                                    for r < r_c′
       φ(r) = 4ε( σ¹²/r¹² − σ⁶/r⁶ ) (r_c − r)²(r_c − 3r_c′ + 2r)/(r_c − r_c′)³         for r_c′ ≤ r ≤ r_c      (B.6)
       φ(r) = 0                                                                        for r > r_c

   where r = |r_i − r_j| is the (scalar) distance between two particles at (three-dimensional) positions r_i and r_j.
   (a) Show that the potential is everywhere differentiable.
   (b) Derive the force that the two particles feel. What is the sum of these forces?
   Consider an N-particle system interacting through the smooth Lennard-Jones interaction.
(c) How do the momenta and positions of the particles change in a single Verlet step?
(d) Prove that momentum and angular momentum are conserved in a single Verlet
step (neglect boundary conditions).
(e) For large enough time steps, the Verlet scheme applied to the N-particle Lennard-Jones system starts to exhibit energy drift. Why does this drift go towards higher energies?
(f) The potential and forces can be computed using cell divisions or a Verlet list.
Assuming most of the numerical cost is in computing the potential and forces,
which one is more efficient and by how much? Why do these techniques only work
for larger systems?
3. Write a small program to simulate a Lennard-Jones system of 343 particles (cf. Eq. (B.6)) in a cubic periodic simulation box, using the Verlet scheme (B.5). Set the mass m, interaction range σ, and interaction strength ε to 1 (Lennard-Jones units), and set the density to 1. Choose cutoff parameters r_c′ = 2.5 and r_c = 3. (Note: why is there no reason to use cells?) As initial conditions, put the particles on a cubic lattice and draw their initial momenta from a Gaussian with a standard deviation of 0.1. For a range of time steps, determine
   - the appropriate length of the burn-in time,
   - the average kinetic temperature T_kin = K/((3/2)N) in equilibrium,
   - the total energy in equilibrium, and
   - the fluctuations of the total energy in the simulation.
   Demonstrate that the fluctuations of the total energy scale as Δt² and determine the time step at which numerical instability sets in. (A starting-point sketch is given below.)
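As a starting point for this exercise, here is a minimal Python sketch of the core loop (an illustration added to these notes; it truncates the potential sharply at r_c′ = 2.5 and omits the smooth switching region of Eq. (B.6), which still has to be added for the actual exercise, and it recomputes forces twice per step for simplicity):

    import numpy as np

    def lj_forces(pos, L, rc=2.5):
        """Truncated Lennard-Jones forces and potential energy with the
        minimum-image convention (O(N^2) double loop; no cells/Verlet list)."""
        N = pos.shape[0]
        f = np.zeros_like(pos)
        u = 0.0
        for i in range(N - 1):
            d = pos[i] - pos[i + 1:]          # displacements to later particles
            d -= L * np.round(d / L)          # minimum image
            r2 = np.sum(d * d, axis=1)
            mask = r2 < rc * rc
            inv6 = 1.0 / r2[mask] ** 3
            u += np.sum(4.0 * (inv6**2 - inv6))
            # pair force: F_ij = 24 (2 r^-14 - r^-8) d  (LJ units)
            fij = (24.0 * (2.0 * inv6**2 - inv6) / r2[mask])[:, None] * d[mask]
            f[i] += fij.sum(axis=0)
            f[i + 1:][mask] -= fij
        return f, u

    def verlet_step(pos, mom, L, dt):
        """One kick-drift-kick step of the Verlet scheme, Eq. (B.5); m = 1."""
        f, _ = lj_forces(pos, L)
        mom = mom + 0.5 * dt * f
        pos = (pos + dt * mom) % L
        f, u = lj_forces(pos, L)
        mom = mom + 0.5 * dt * f
        return pos, mom, u

    # 343 particles on a 7 x 7 x 7 lattice at density 1 => box length L = 7
    L, n = 7.0, 7
    pos = np.array([[i, j, k] for i in range(n)
                    for j in range(n) for k in range(n)], dtype=float)
    mom = np.random.default_rng(1).normal(0.0, 0.1, size=pos.shape)

    for step in range(200):
        pos, mom, u = verlet_step(pos, mom, L, dt=0.005)
        if step % 20 == 0:
            print(step, 0.5 * np.sum(mom**2) + u)    # total energy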

B.4  Problem Set 4

Notes:
This set contains 3 problems.
This fourth and final set is due in two weeks, December 3, 2008.
1. For a harmonic oscillator system with Hamiltonian H = (1/2)(p² + q²), compare the critical points of instability Δt_c of the Verlet, HOA2, FR4, and 4th-order Suzuki (gradient-corrected) integrators.
2. Consider an anharmonic oscillator system in three dimensions with Hamiltonian

       H = p₁·p₁/(2m₁) + p₂·p₂/(2m₂) + U(r₁₂),

       U(r₁₂) = (k/2) r₁₂² + (λ/4) r₁₂⁴,

   where r₁₂ = |r₂ − r₁|, k is large, and λ is small.


   (a) Show that the Hamiltonian can be recast in centre-of-mass R and relative coordinates r = r₁₂ as

           H = P·P/(2M) + p·p/(2μ) + U(r) = K_P + K_p + U,

       where M is the total mass of the system, μ is the relative mass, and P and p are the momenta conjugate to R and r, respectively.
       i. What are the solutions of the equations of motion for R and P?
       ii. Construct the Liouville operators L_P, L_p and L_U and evaluate the commutators of these three operators.
   (b) Devise a second-order symplectic integrator for this system and find the shadow Hamiltonian to order Δt².
   (c) Suppose one writes the potential U = U_h + ΔU in terms of a harmonic reference potential U_h(r) = kr²/2 and a difference potential ΔU(r) = λr⁴/4. Use a splitting scheme to construct a second-order integrator that uses the exact solution of the evolution of a harmonic oscillator. To order Δt², what does the shadow Hamiltonian look like for this integrator? Is it better, or worse, than the integrator in part (b)?
   (d) Suppose the actual potential energy U_A(r) and forces are computed using a costly electronic structure method. Using the potential U(r) as a reference potential, where U_A(r) ≈ U(r), devise an integrator using splitting methods that minimizes the number of electronic structure calculations over a fixed time interval. How might multiple time step methods be used profitably in this integrator?


3. An important dynamical property of molecules in a fluid is the self-diffusion coefficient D, which is a measure of how quickly each particle in the fluid moves through space. For a system of N particles, it can be calculated in two different ways:
   - One can integrate the velocity autocorrelation function:

         D = (1/(3N)) Σ_{i=1}^N ∫₀^∞ dt′ ⟨v_i(t) · v_i(t + t′)⟩,

     where v_i(t) is the velocity of particle i at time t. Note that in a simulation, both t and t + t′ are computed at intervals of the timestep Δt.
   - One can use the relation

         D = lim_{t′→∞} Σ_{i=1}^N ⟨|r_i(t + t′) − r_i(t)|²⟩ / (6N t′),

     where r_i(t) is the position of particle i at time t. Note that when using this form, one must pay careful attention to how the distance grows in a periodic system.
   Modify the code you wrote for a Lennard-Jones system (or modify the sample code provided on the class web site) to compute the self-diffusion coefficient D in the two ways mentioned above (a sketch of the second route is given below). The system conditions can be taken to be as described in problem 3 on problem set 3.
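A sketch of the Einstein-relation route follows (an illustration added to these notes); it assumes the trajectory has been stored as unwrapped coordinates, i.e. positions accumulated without the periodic "% L" fold, which is one way to handle the growth of distances in a periodic system:

    import numpy as np

    def diffusion_msd(unwrapped, dt):
        """Estimate D(t') = MSD(t')/(6 t') from an array of shape
        (n_steps, N, 3) holding positions that were never wrapped back
        into the box (accumulate raw displacements during the run)."""
        n_steps = unwrapped.shape[0]
        lags = np.arange(1, n_steps // 2)
        msd = np.empty(len(lags))
        for k, lag in enumerate(lags):
            d = unwrapped[lag:] - unwrapped[:-lag]   # all time origins
            msd[k] = np.mean(np.sum(d * d, axis=2))  # avg over origins, particles
        return lags * dt, msd / (6.0 * lags * dt)

    # usage: store unwrapped[step] = pos_unwrapped at each step of the MD run,
    # then t, D_est = diffusion_msd(unwrapped, dt); read D off the plateau of D_est.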
