Script 2017
   Department of Mathematics, Eberhard-Karls University, Auf der Morgenstelle 10, 72076 Tübingen,
Germany. Email: roderich.tumulka@uni-tuebingen.de
1     Course Overview
Learning goals of this course: To understand the rules of quantum mechanics; to
understand several important views of how the quantum world works; to understand
what is controversial about the orthodox interpretation and why; to be familiar with
the surprising phenomena and paradoxes of quantum mechanics.
    Quantum mechanics is the field of physics concerned with (or the post-1900 theory
of) the motion of electrons, photons, quarks, and other elementary particles, inside
atoms or otherwise. It is distinct from classical mechanics, the pre-1900 theory of the
motion of physical objects. Quantum mechanics forms the basis of modern physics and
covers most of the physics under the conditions on Earth (i.e., not-too-high temperatures
or speeds, not-too-strong gravitational fields). “Foundations of quantum mechanics” is
the topic concerned with what exactly quantum mechanics means and how to explain
the phenomena described by quantum mechanics. It is a controversial topic. Here are
some voices critical of the traditional, orthodox view:
         “With very few exceptions (such as Einstein and Laue) [...] I was the
      only sane person left [in theoretical physics].”
                                                   (E. Schrödinger in a 1959 letter)
    In this course we will be concerned with what kinds of reasons people have for
criticizing the orthodox understanding of quantum mechanics, what the alternatives are,
and which kinds of arguments have been put forward for or against important views.
We will also discuss the rules of quantum mechanics for making empirical predictions;
they are uncontroversial. The aspects of quantum mechanics that we discuss also apply
to other fields of quantum physics, in particular to quantum field theory.
    • Self-adjoint matrices, axioms of the quantum formalism, collapse of the wave func-
      tion, decoherence
   • Spin, the Stern-Gerlach experiment, the Pauli equation, representations of the
     rotation group
• No-hidden-variables theorems
   • Identical particles and the non-trivial topology of their configuration space, bosons
     and fermions
• Multivariable calculus
• Projection operators
   • Tensor product of vector spaces
   • Are there in principle limitations to what we can know about the world (its laws,
     its state)?
Physicists usually take math classes but not philosophy classes. That doesn’t mean,
though, that one doesn’t use philosophy in physics. It rather means that physicists
learn the philosophy they need in physics classes. Philosophy classes are not among the
prerequisites of this course, but we will sometimes make connections with the history of
philosophy.
2     The Schrödinger Equation
One of the fundamental laws of quantum mechanics is the Schrödinger equation
    i\hbar \frac{\partial \psi}{\partial t} = -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i} \nabla_i^2 \psi + V \psi .                          (2.1)

Here the wave function ψ is a function

    \psi : \mathbb{R}_t \times \mathbb{R}^{3N}_q \to \mathbb{C} .                          (2.2)

∇_i denotes the derivative operator (gradient) with respect to the variable x_i, and ∇_i^2 the corresponding Laplace operator,

    \nabla_i^2 \psi = \frac{\partial^2 \psi}{\partial x_i^2} + \frac{\partial^2 \psi}{\partial y_i^2} + \frac{\partial^2 \psi}{\partial z_i^2} .                          (2.4)
V is a given real-valued function on configuration space, called the potential energy or
just potential.
    Fundamentally, the potential in non-relativistic physics is
    V(x_1, \ldots, x_N) = \sum_{1 \le i < j \le N} \frac{e_i e_j}{|x_i - x_j|} - \sum_{1 \le i < j \le N} \frac{G m_i m_j}{|x_i - x_j|} ,                          (2.5)

where

    |x| = \sqrt{x^2 + y^2 + z^2} \quad \text{for } x = (x, y, z)                          (2.6)
denotes the Euclidean norm in R3 , ei are constants called the electric charges of the
particles (which can be positive, negative, or zero); the first term is called the Coulomb
potential, the second term is called the Newtonian gravity potential, G is a constant of
nature called Newton's constant of gravity, G = 6.67 × 10^{-11} kg^{-1} m^3 s^{-2}, and m_i are
again the masses. However, when the Schrödinger equation is regarded as an effective
equation rather than as a fundamental law of nature then the potential V may contain
terms arising from particles outside the system interacting with particles belonging to
the system. That is why the Schrödinger equation is often considered for rather arbitrary
functions V , also time-dependent ones. The operator

    H = -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i} \nabla_i^2 + V                          (2.7)

is called the Hamiltonian of the system.
Born’s rule. If we measure the system’s configuration at time t then the outcome is
random with probability density
    \rho(q) = |\psi_t(q)|^2 .                          (2.9)
   This rule refers to the concept of probability density, which means the following. The
probability that the random outcome X ∈ R3N is any particular point x ∈ R3N is zero.
However, the probability that X lies in a set B ⊆ R3N is given by
    \mathbb{P}(X \in B) = \int_B \rho(q)\, d^{3N}q                          (2.10)
(a 3N -dimensional volume integral). Instead of d3N q, we will often just write dq. A
density function ρ must be non-negative and normalized,
    \rho(x) \ge 0 , \qquad \int_{\mathbb{R}^{3N}} \rho(q)\, dq = 1 .                          (2.11)
A random variable with Gaussian density is also called a normal (or normally dis-
tributed ) random variable. It has mean µ ∈ R and standard deviation σ > 0. The mean
value or expectation value EX of a random variable X is its average value
    \mathbb{E}X = \int_{\mathbb{R}} x\, \rho(x)\, dx .                          (2.13)
The standard deviation of X is defined to be \sqrt{\mathbb{E}(X - \mathbb{E}X)^2}.
And indeed, the Schrödinger equation guarantees this relation: If it holds for t = 0 then
it holds for any t ∈ R. More generally, the Schrödinger equation implies that
    \int dq\, |\psi_t|^2 = \int dq\, |\psi_0|^2                          (2.15)

for any ψ_0. One says that \int dq\, |\psi_t|^2 satisfies a conservation law. Indeed, the Schrödinger equation implies a local conservation law for |ψ|^2; that is, it implies the continuity equation¹

    \frac{\partial |\psi(t,q)|^2}{\partial t} = -\sum_{i=1}^{N} \nabla_i \cdot \boldsymbol{j}_i(t,q)
    \quad \text{with} \quad
    \boldsymbol{j}_i(t,q) = \frac{\hbar}{m_i}\, \mathrm{Im}\bigl[ \psi^*(t,q)\, \nabla_i \psi(t,q) \bigr] .                          (2.16)
Indeed, using the Schrödinger equation,

    \frac{\partial |\psi|^2}{\partial t} = 2\,\mathrm{Re}\bigl[ \psi^*\, \partial_t \psi \bigr]
    = \frac{2}{\hbar}\,\mathrm{Im}\Bigl[ -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i}\, \psi^* \nabla_i^2 \psi + \underbrace{V(q)\, |\psi|^2}_{\text{real}} \Bigr]                          (2.18)

    = -\sum_{i=1}^{N} \frac{\hbar}{m_i}\, \mathrm{Im}\Bigl[ \psi^* \nabla_i^2 \psi + \underbrace{(\nabla_i \psi^*) \cdot (\nabla_i \psi)}_{\text{real}} \Bigr] = -\sum_{i=1}^{N} \nabla_i \cdot \boldsymbol{j}_i .                          (2.19)
The continuity equation expresses that the amount of |ψ|2 cannot be created or de-
stroyed, only moved around, and in fact flows with the current (j 1 , . . . , j N ). To see
this, note that it asserts that the (3N + 1)-dimensional (configuration-space-time) vec-
tor field j = (|ψ|2 , j 1 , . . . , j N ) has vanishing divergence. By the Ostrogradski–Gauss
integral theorem (divergence theorem), the surface integral of a vector field equals the
volume integral of its divergence, so the surface integral of a divergence-less vector field
   ¹ I don't know where this name comes from. It has nothing to do with being continuous. It should
rather be called a conservation equation.
vanishes. Let the surface be the boundary of a (3N + 1)-dimensional cylinder [0, T ] × S,
where S ⊆ R3N is a ball or any set with smooth boundary ∂S. Then the surface integral
of j is

    0 = -\int_S |\psi_0|^2 + \int_S |\psi_T|^2 + \int_0^T dt \int_{\partial S} dA\; \boldsymbol{n}_{\partial S} \cdot j                          (2.20)
with n∂S the unit normal vector field in R3N on the boundary of S. That is, the amount
of |ψ|2 in S at time T differs from the initial amount of |ψ|2 in S by the flux of j across
the boundary of S during [0, T ]—a conservation law. If (and this is indeed the case)
there is no flux to infinity, i.e., if the last integral becomes arbitrarily small by taking S
to be a sufficiently big ball, then the total amount of |ψ|^2 remains constant, see (2.15).
    Since the quantity \int dq\, |\psi|^2 occurs frequently, it is useful to abbreviate it: The L^2
norm is defined to be

    \|\psi\| = \Bigl( \int_{\mathbb{R}^{3N}} dq\, |\psi(q)|^2 \Bigr)^{1/2} .                          (2.21)
Thus, \|\psi_t\| = \|\psi_0\|, and the Born rule is consistent with the Schrödinger equation,
provided the initial datum ψ0 has norm 1, which we will henceforth assume. The wave
function ψt will in particular be square-integrable, and this makes the space L2 (R3N )
of square-integrable functions a natural arena. It is also called the Hilbert space, and is
the space of all wave functions.
3       Unitary Operators in Hilbert Space
In the following, we will often simply write L2 for L2 (R3N ). We will leave out many
mathematical details (which will be discussed in the course Mathematical Quantum
Theory).
Theorem 3.1.² For a large class of potentials V (including Coulomb, Newton's gravity,
every bounded measurable function, and linear combinations thereof ) and for every ψ0 ∈
L2 , there is a unique weak solution ψ(t, q) of the Schrödinger equation with potential V
and initial datum ψ0 . Moreover, at every time t, ψt lies again in L2 .
Thus, for every t ∈ ℝ we can define a map U_t : L^2 → L^2 by

    U_t \psi_0 = \psi_t .                          (3.1)
Ut is called the time evolution operator or propagator. Often, it is not possible to write
down an explicit closed formula for Ut , but it is nevertheless useful to consider Ut . It
has the following properties.
    First, U_t is a linear operator, i.e.,

    U_t(\psi + \phi) = U_t \psi + U_t \phi , \qquad U_t(z\psi) = z\, U_t \psi

for any ψ, φ ∈ L^2, z ∈ ℂ. This follows from the fact that the Schrödinger equation
is a linear equation, or, equivalently, that H is a linear operator. It is common to say
operator for linear operator.
    Second, U_t preserves norms:

    \|U_t \psi\| = \|\psi\| .                          (3.4)
   ² This follows from Stone's theorem and Kato's theorem together. See, e.g., Theorem VIII.8 in
M. Reed and B. Simon: Methods of Modern Mathematical Physics, Vol. 1 (revised edition), Academic
Press (1980), and Theorem X.16 in M. Reed and B. Simon: Methods of Modern Mathematical Physics,
Vol. 2, Academic Press (1975).
This is just another way of expressing Eq. (2.15). Operators with this property are
called isometric.
    Third, they obey a composition law:

    U_s U_t = U_{t+s} , \qquad U_0 = I ,                          (3.5)
where I denotes the identity operator
                                           Iψ = ψ .                                  (3.6)
It follows from (3.5) that Ut−1 = U−t . In particular, Ut is a bijection. An isometric
bijection is also called a unitary operator ; so Ut is unitary. A family of operators
satisfying (3.5) is called a one-parameter group of operators. Thus, the propagators
form a unitary 1-parameter group.
    Fourth,

    U_t = e^{-iHt/\hbar} .                          (3.7)
The exponential of an operator A can be defined by the exponential series
    e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!}                          (3.8)
if A is a so-called bounded operator ; in this case, the series converges. Unfortunately,
the Hamiltonian of the Schrödinger equation (19.2) is unbounded. But mathematicians
agree about how to define eA for unbounded operators (of the type that H is); we will
not worry about the details of this definition.
    Eq. (3.7) is easy to understand: after defining
    \phi_t := e^{-iHt/\hbar}\, \psi_0 ,                          (3.9)
one would naively compute as follows:
    i\hbar \frac{d}{dt} \phi_t = i\hbar \frac{d}{dt} e^{-iHt/\hbar} \psi_0                          (3.10)

                               = i\hbar \Bigl( -\frac{iH}{\hbar} \Bigr) e^{-iHt/\hbar} \psi_0                          (3.11)

                               = H \phi_t ,                          (3.12)
so φ_t is a solution of the Schrödinger equation with φ_0 = e^0 ψ_0 = ψ_0, and thus φ_t = ψ_t.
The calculation (3.10)–(3.12) can actually be justified for all ψ0 in the domain of H, a
dense set in L2 ; we will not go into details here.
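To make (3.7) and its properties tangible (a sketch of my own, assuming a finite-dimensional toy model rather than the full L²): replace L² by ℂⁿ, take for H a self-adjoint matrix obtained by discretizing a 1-d Schrödinger Hamiltonian on a grid, and compute U_t = e^{-iHt/ħ} as a matrix exponential; unitarity (3.4) and the group law (3.5) can then be checked numerically. The grid, the potential, and all parameters below are arbitrary choices.

    import numpy as np
    from scipy.linalg import expm

    # Toy finite-dimensional "Hamiltonian": discretize -(hbar^2/2m) d^2/dx^2 + V(x)
    # on a grid, so that H becomes a self-adjoint matrix (units with hbar = m = 1).
    hbar = 1.0
    n, L = 200, 20.0
    x = np.linspace(-L/2, L/2, n)
    dx = x[1] - x[0]
    lap = (np.diag(np.full(n-1, 1.0), -1) - 2*np.eye(n) + np.diag(np.full(n-1, 1.0), 1)) / dx**2
    H = -0.5 * hbar**2 * lap + np.diag(0.5 * x**2)      # harmonic potential as an example

    def U(t):
        """Propagator U_t = exp(-iHt/hbar), Eq. (3.7), as a matrix exponential."""
        return expm(-1j * H * t / hbar)

    U1, U2 = U(1.0), U(2.0)
    print("unitary:   ", np.allclose(U1.conj().T @ U1, np.eye(n), atol=1e-8))
    print("group law: ", np.allclose(U1 @ U2, U(3.0), atol=1e-8))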
instead of the L^2 norm

    \|\psi\| = \Bigl( \int |\psi(q)|^2\, dq \Bigr)^{1/2} .                          (3.14)

For comparison, the Euclidean norm of a vector u ∈ ℝ^3 is

    |u| = \Bigl( \sum_{i=1}^{3} u_i^2 \Bigr)^{1/2} .                          (3.18)
The norm-preserving operators in R3 are exactly the orthogonal matrices, i.e., those
matrices A with

    A^t = A^{-1} ,                          (3.19)

where A^t denotes the transposed matrix, (A^t)_{ij} = A_{ji}. They have a geometric meaning:
Each orthogonal matrix is either a rotation around some axis passing through the origin,
or a reflection across some plane through the origin, followed by a rotation. The set of
orthogonal 3 × 3 matrices is denoted O(3). The set of those orthogonal matrices which
do not involve a reflection is denoted SO(3) for “special orthogonal matrices”; they
correspond to rotations and can be characterized by the condition det A > 0 in addition
to (3.19).
    In dimension d > 3, one can show that the special orthogonal matrices are still
      compositions (i.e., products) of 2-dimensional rotation matrices such as (for d = 4)

    \begin{pmatrix} \cos\alpha & \sin\alpha & & \\ -\sin\alpha & \cos\alpha & & \\ & & 1 & \\ & & & 1 \end{pmatrix} .                          (3.20)
This rotation does not rotate around an axis, it rotates around a (d − 2)-dimensional
subspace (spanned by the 3rd and 4th axes). However, in d ≥ 4 dimensions, not every
   ³ iff = if and only if
special orthogonal matrix is a rotation around a (d − 2)-dim. subspace through a certain
angle, but several such rotations can occur together, as the following example shows:

    \begin{pmatrix} \cos\alpha & \sin\alpha & & \\ -\sin\alpha & \cos\alpha & & \\ & & \cos\beta & \sin\beta \\ & & -\sin\beta & \cos\beta \end{pmatrix} .                          (3.21)
   3. It is conjugate-symmetric,

    \langle \phi | \psi \rangle = \langle \psi | \phi \rangle^*                          (3.26)

      for all ψ, φ ∈ L^2.
   4. It is positive definite,⁴

    \langle \psi | \psi \rangle > 0 \quad \text{for } \psi \neq 0 .                          (3.27)
Note that the dot product in R3 has the same properties, the properties of an inner
product, except that the scalars involved lie in R, not C. Another inner product with
these properties is defined on ℂ^n by

    \langle \psi | \phi \rangle = \sum_{i=1}^{n} \psi(i)^* \phi(i) .                          (3.28)
Note that the radicand is ≥ 0. Conversely, the inner product can be expressed in terms
of the norm according to the polarization identity

    \langle \psi | \phi \rangle = \tfrac{1}{4}\Bigl( \|\psi + \phi\|^2 - \|\psi - \phi\|^2 - i\,\|\psi + i\phi\|^2 + i\,\|\psi - i\phi\|^2 \Bigr) .                          (3.30)
Its proof is a homework exercise. It follows from the polarization identity that every
unitary operator U preserves inner products,

    \langle U\psi | U\phi \rangle = \langle \psi | \phi \rangle .
(Likewise, every A ∈ SO(3) preserves dot products, which has the geometrical meaning
that a rotation preserves the angle between any two vectors.)
   In analogy to the dot product, two functions ψ, φ with hψ|φi = 0 are said to be
orthogonal.
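The polarization identity (3.30) and the preservation of inner products are easy to check numerically (a small sketch of my own, using the finite-dimensional inner product (3.28) on ℂⁿ; the random vectors and the QR-generated unitary are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    # random complex vectors in C^n (standing in for wave functions)
    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    phi = rng.normal(size=n) + 1j * rng.normal(size=n)

    inner = lambda a, b: np.vdot(a, b)          # <a|b>, conjugate-linear in the first slot
    norm  = lambda a: np.sqrt(inner(a, a).real)

    # polarization identity (3.30)
    pol = 0.25 * (norm(psi + phi)**2 - norm(psi - phi)**2
                  - 1j * norm(psi + 1j*phi)**2 + 1j * norm(psi - 1j*phi)**2)
    print("polarization identity:", np.isclose(pol, inner(psi, phi)))

    # a random unitary U (from a QR decomposition) preserves inner products
    U, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    print("inner product preserved:", np.isclose(inner(U @ psi, U @ phi), inner(psi, phi)))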
   ⁴ Another math subtlety: This will be true only if we identify two functions ψ, φ whenever the set
{q ∈ R3N : ψ(q) 6= φ(q)} has volume 0. It is part of the standard definition of L2 to make these
identifications.
4     Classical Mechanics
Classical physics means pre-quantum (pre-1900) physics. I describe one particular ver-
sion that could be called Newtonian mechanics (even though certain features were not
discovered until after Isaac Newton’s death). This version is over-simplified in that
it leaves out magnetism, electromagnetic fields (which play a role for electromagnetic
waves and thus the classical theory of light), and relativity theory.
    m_i \frac{d^2 Q_i}{dt^2} = -\nabla_i V(Q_1, \ldots, Q_N)                          (4.1)
with V the fundamental potential function of the universe as given in Eq. (2.5). This
completes the definition of Newtonian mechanics.
    The equation of motion (4.1) is an ordinary differential equation of second order
(i.e., involving second time derivatives). Once we specify, as initial conditions, the initial
positions Qi (0) and velocities (dQi /dt)(0) of every particle, the equation of motion (4.1)
determines Qi (t) for every i and every t.
    Written explicitly, (4.1) reads
    m_i \frac{d^2 Q_i}{dt^2} = -\sum_{j \neq i} e_i e_j \frac{Q_j - Q_i}{|Q_j - Q_i|^3} + \sum_{j \neq i} G m_i m_j \frac{Q_j - Q_i}{|Q_j - Q_i|^3} .                          (4.2)
The right hand side is called the force acting on particle i; the j-th term in the first
sum (with the minus sign in front) is called the Coulomb force exerted by particle j on
particle i; the j-th term in the second sum is called the gravitational force exerted by
particle j on particle i.
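For illustration (a minimal numerical sketch, not from the script; gravity only, arbitrary units with G = 1, and hypothetical initial data), Newton's equation of motion (4.2) can be integrated with a standard scheme such as velocity Verlet:

    import numpy as np

    # Minimal sketch: integrate Newton's equation (4.2) numerically for two bodies,
    # gravity only (all charges zero), in arbitrary units where G = 1.
    G = 1.0
    m = np.array([1.0, 1e-3])                    # masses (arbitrary)

    def acceleration(Q):
        """Right-hand side of (4.2) divided by m_i, for positions Q of shape (N, 3)."""
        a = np.zeros_like(Q)
        for i in range(len(m)):
            for j in range(len(m)):
                if j != i:
                    d = Q[j] - Q[i]
                    a[i] += G * m[j] * d / np.linalg.norm(d)**3
        return a

    def verlet_step(Q, V, dt):
        """One velocity-Verlet step for d^2 Q / dt^2 = acceleration(Q)."""
        a = acceleration(Q)
        Q_new = Q + V * dt + 0.5 * a * dt**2
        V_new = V + 0.5 * (a + acceleration(Q_new)) * dt
        return Q_new, V_new

    # initial conditions: light body on a roughly circular orbit around the heavy one
    Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    V = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    for _ in range(10000):
        Q, V = verlet_step(Q, V, dt=1e-3)
    print("position of the light body after t = 10:", Q[1])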
    Newtonian mechanics is empirically wrong. For example, it entails the absence of
interference fringes in the double-slit experiment (and entails wrong predictions about
everything that is considered a quantum effect). Nevertheless, it is a coherent theory, a
“theory of everything,” and often useful to consider as a hypothetical world to compare
ours to.
    Newtonian mechanics is to be understood in the following way: Physical objects such
as tables, baseballs, or dogs consist of huge numbers (such as 10^{24}) of particles, and they
must be regarded as just such an agglomerate of particles. Since Newtonian mechanics
governs unambiguously the behavior of each particle, it also completely dictates the
behavior of tables, baseballs, and dogs. Put differently, after (4.1) has been given, there
is no need to specify any further laws for tables, baseballs, or dogs. Any physical law
concerning tables, baseballs, or dogs, is a consequence of (4.1). This scheme is called
reductionism. It makes chemistry and biology sub-fields of physics. (This does not
mean, though, that it would be of practical use to try to solve (4.1) for 10^{24} or 10^{80}
particles in order to study the behavior of dogs.) Can everything be reduced to (4.1)?
It seems that conscious experiences are an exception—presumably the only one.
    When we consider a baseball, we are often particularly interested in the motion of
its center Q(t) because we are interested in the motion of the whole ball. It is often
possible to give an effective equation for the behavior of a variable like Q(t), for example
                                                        
                                  2                      0
                                 dQ          dQ
                               M 2 = −γ          − M g 0 ,                            (4.3)
                                  dt          dt
                                                         1
where M is the mass of the baseball, the first term on the right hand side is called the
friction force, the second the gravitional force of Earth, γ is the friction coefficient of
the baseball and g the gravitational field strength of Earth. The effective equation (4.3)
looks quite similar to the fundamental equation (4.1) but (i) it has a different status (it
is not a fundamental law), (ii) it is only approximately valid, (iii) it contains a term that
is not of the form −∇V (the friction term), (iv) forces that do obey the form −∇V (Q)
(such as the second force) can have other functions for V (such as V(x) = M g x_3) instead
of (2.5).
    The theory I call Newtonian mechanics was never actually proposed to give the
correct and complete laws of physics (although we can imagine a hypothetical world
where it does); for example, it leaves out magnetism. An extension of this theory, which
we will not consider further but which is also considered “classical physics,” includes
electromagnetic fields (governed by Maxwell’s field equations) and gravitational fields
(governed by Einstein’s field equations, also known as the theory of general relativity).
    The greatest contributions from a single person to the development of Eq. (4.1) came
from Isaac Newton (1643–1727), who suggested (in his Philosophiae Naturalis Principia
Mathematica 1687) considering ODEs, in fact of second order, suggested “forces” and
the form m\, d^2Q/dt^2 = force, and introduced the form of the gravitational force, now known
as “Newton’s law of universal gravity.” Eq. (4.2) was first written down, without the
Coulomb term, by Leonhard Euler (1707–1783). The first term was proposed in 1784
by Charles Augustin de Coulomb (1736–1806). Nevertheless, we will call (4.1) and (4.2)
“Newton’s equation of motion.”
the microscopic laws and irreversibility of macroscopic phenomena can be compatible,5
time reversal invariance has been widely accepted. This was also because time reversal
invariance holds as well in other, more refined theories that came after Newtonian mechanics, such as
Maxwell’s equations of classical electromagnetism, general relativity, and the Schrödin-
ger equation.
Definition 4.1. Let v i (t) = dQi /dt denote the velocity of particle i. The energy, the
momentum, and the angular momentum of the universe are defined to be, respectively,
    E = \sum_{k=1}^{N} \frac{m_k}{2}\, v_k^2 - \sum_{\substack{j,k=1 \\ j<k}}^{N} \Bigl( G m_j m_k - \frac{e_j e_k}{4\pi\varepsilon_0} \Bigr) \frac{1}{|Q_j - Q_k|}                          (4.4)

    p = \sum_{k=1}^{N} m_k v_k                          (4.5)

    L = \sum_{k=1}^{N} m_k\, Q_k \times v_k ,                          (4.6)
where v 2 = v · v = |v|2 , and × denotes the cross product in R3 . The first term in (4.4)
is called kinetic energy, the second one potential energy.
Proposition 4.2. E, p, and L are conserved quantities, i.e., they are time independent.
Proof: exercise.
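Proposition 4.2 can also be checked numerically along the trajectory computed in the sketch above (same arbitrary units and assumptions; m, G, and verlet_step are reused from that sketch):

    # Continuing the two-body sketch from above (arbitrary units, gravity only):
    # check numerically that E, p, and L of Definition 4.1 stay (nearly) constant in time.
    import numpy as np

    def energy(Q, V):
        kin = 0.5 * np.sum(m[:, None] * V**2)
        pot = 0.0
        for j in range(len(m)):
            for k in range(j + 1, len(m)):
                pot -= G * m[j] * m[k] / np.linalg.norm(Q[j] - Q[k])
        return kin + pot

    momentum         = lambda V: np.sum(m[:, None] * V, axis=0)
    angular_momentum = lambda Q, V: np.sum(np.cross(Q, m[:, None] * V), axis=0)

    Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    V = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    E0, p0, L0 = energy(Q, V), momentum(V), angular_momentum(Q, V)
    for _ in range(10000):
        Q, V = verlet_step(Q, V, dt=1e-3)        # verlet_step from the previous sketch
    print("E drift:", energy(Q, V) - E0)
    print("p drift:", momentum(V) - p0)
    print("L drift:", angular_momentum(Q, V) - L0)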
function H called the Hamiltonian function or simply the Hamiltonian. Namely, n is as-
sumed to be even, n = 2r, and denoting the n components of x by (q1 , . . . , qr , p1 , . . . , pr ),
the ODE is of the form
    \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}                          (4.8)

    \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} .                          (4.9)
Newtonian mechanics fits this definition with r = 3N , q1 , . . . , qr the 3N components
of q = (q 1 , . . . , q N ), p1 , . . . , pr the 3N components of p = (p1 , . . . , pN ) (the momenta
pk = mk v k ), and H = H(q, p) the energy (4.4) expressed as a function of q and p, that
is,
    H(q, p) = \sum_{k=1}^{N} \frac{p_k^2}{2m_k} - \sum_{\substack{j,k=1 \\ j<k}}^{N} \Bigl( G m_j m_k - \frac{e_j e_k}{4\pi\varepsilon_0} \Bigr) \frac{1}{|q_j - q_k|} .                          (4.10)
    For readers familiar with manifolds I mention that the natural definition of a Hamil-
tonian system on a manifold M is as follows. M plays the role of phase space. Let
the dimension n of M be even, n = 2r, and suppose we are given a symplectic form
ω on M , i.e., a non-degenerate differential 2-form whose exterior derivative vanishes.
(Non-degenerate means that it has full rank n at every point.) The equation of motion
for t 7→ x(t) ∈ M reads
    \omega\Bigl( \frac{dx}{dt}, \cdot \Bigr) = dH ,                          (4.11)
where dH means the exterior derivative of H. To make the connection with the case
M = ℝ^n just described, dH is then the gradient of H and ω the n × n matrix

    \omega = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}                          (4.12)
with I the r × r unit matrix and 0 the r × r zero matrix; ω(dx/dt, ·) becomes the
transpose of ω applied to the n-vector dx/dt, and (4.11) reduces to (4.8) and (4.9).
5     The Double-Slit Experiment
A few remarks about Feynman’s text:
      is a bit too strong. Other mysteries can claim to be on equal footing with this
      one. Feynman weakened his statement later.
• Feynman’s statements
      are too strong. We will see in Chapters 6, 13, and 15 that Bohmian mechanics
      and other theories provide some explanation of the double slit experiment.
       Some illustrations I’m showing you, related to the double-slit experiment:
    Note that the observations in the double-slit experiment are in agreement with, and
in fact follow from, the Born rule and the Schrödinger equation: The relevant system
here consists of one electron, so ψt is a function in just 3 dimensions. The potential V
can be taken to be +∞ (or very large) at every point of the plate containing the two
slits—except in the slits themselves, where V = 0. Away from the plate, also V = 0.
The Schrödinger equation governs the behavior of ψt , with the initial wave function ψ0
being a wave packet, e.g., a Gaussian wave packet as in Exercise 4 of Assignment 1,
    \psi_0(x) = (2\pi\sigma^2)^{-3/4}\, e^{-i k \cdot x}\, e^{-x^2/4\sigma^2} ,                          (5.1)
moving toward the double slit. According to the Schrödinger equation, part of ψ will
be reflected from the wall, part of it will pass through the two slits. The two parts
of the wave emanating from the two slits, ψ1 and ψ2 will overlap and thus interfere,
ψ = ψ_1 + ψ_2. When we detect the electron, its probability density is given, according
to the Born rule, by

    \rho = |\psi|^2 = |\psi_1 + \psi_2|^2 = |\psi_1|^2 + |\psi_2|^2 + 2\,\mathrm{Re}\bigl( \psi_1^* \psi_2 \bigr) ,

whose last term produces the interference fringes.
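As a toy illustration of this interference (my own simplification, not the actual Schrödinger evolution of the packet (5.1)): treat the two slits as two point sources and superpose their waves on a distant screen; the geometry and the wave number below are arbitrary.

    import numpy as np

    # Toy model: two point sources a distance d apart, waves psi_j ~ exp(i k r_j) / r_j
    # superposed on a screen at distance D. Units are arbitrary.
    k, d, D = 50.0, 1.0, 20.0
    x_screen = np.linspace(-10, 10, 1001)                 # positions on the detection screen

    r1 = np.sqrt(D**2 + (x_screen - d/2)**2)              # distance from slit 1
    r2 = np.sqrt(D**2 + (x_screen + d/2)**2)              # distance from slit 2
    psi1 = np.exp(1j * k * r1) / r1
    psi2 = np.exp(1j * k * r2) / r2

    both   = np.abs(psi1 + psi2)**2                       # |psi1 + psi2|^2: fringes
    no_int = np.abs(psi1)**2 + np.abs(psi2)**2            # what one would get without the cross term
    print("max/min with interference:   ", both.max(), both.min())   # deep minima
    print("max/min without cross term:  ", no_int.max(), no_int.min())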
    What if we include a device (such as Feynman’s lamp) that will detect the electron
at one of the slits? Then we detect the electron twice: once at a slit and once at the
backdrop screen. Thus, we either have to regard it as a many-particle problem (involving
at least two particles, the electron and the photon), or we need a version of the Born
rule suitable for repeated detection. We will study both approaches in later lectures.
   ⁶ More precisely, electrons could pass right or left of a positively charged wire of diameter 1 µm.
Those passing on the right get deflected to the left, and vice versa. Thus, the arrangement leads to the
superposition of waves travelling in slightly different directions—just what is needed for interference.
6     Bohmian Mechanics
      “[Bohmian mechanics] exercises the mind in a very salutary way.”
      J. Bell, Speakable and Unspeakable in Quantum Mechanics, page 171
    The situation in quantum mechanics is that we have a set of rules, known as the
quantum formalism, for computing the possible outcomes and their probabilities for
(more or less) any conceivable experiment, and everybody agrees (more or less) about
the formalism. What the formalism doesn’t tell us, and what is controversial, is what
exactly happens during these experiments, and how nature arrives at the outcomes
whose probabilities the formalism predicts. There are different theories answering these
questions, and Bohmian mechanics is one of them.
    Let me elucidate my statements a bit. We have already learned part of the quantum
formalism: the Schrödinger equation and the Born rule. These rules have allowed us to
predict the possible outcomes of the double-slit experiment with a single electron (easy
here: a spot anywhere on the screen) and their probability distribution (here: a prob-
ability distribution corresponding to |ψ|2 featuring a sequence of maxima and minima
corresponding to interference fringes). What the rules didn’t tell us was what exactly
happens during this experiment (e.g., how the electron moves). Bohmian mechanics fills
this gap.
    We have not seen all the rules of the quantum formalism yet. We will see them later, in Lec-
tures 6 and 8. So far, we have formulated the Born rule only for position measurements,
and we have not considered repeated detections.
In Bohmian mechanics, the particles have actual positions Q_i(t), which move according to Bohm's equation of motion

    \frac{dQ_i}{dt} = \frac{\hbar}{m_i}\, \mathrm{Im}\, \frac{\nabla_i \Psi}{\Psi}(t, Q(t)) .                          (6.1)
Here, Q(t) = (Q1 (t), . . . , QN (t)) is the configuration at time t, and Ψ is a wave function
that is called the wave function of the universe and evolves according to the Schrödinger
equation
    i\hbar \frac{\partial \Psi}{\partial t} = -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i} \nabla_i^2 \Psi + V \Psi                          (6.2)
with V given by (2.5). The configuration Q(0) at the initial time of the universe (say,
right after the big bang) is chosen randomly by nature with probability density

    \rho(q) = |\Psi_0(q)|^2 .                          (6.3)
(We write capital Q for the configuration of particles and little q for the configuration
variable in either ρ or Ψ.) This completes the definition of Bohmian mechanics.
    The central fact about Bohmian mechanics is that its predictions agree exactly with
those of the quantum formalism (which so far have always been confirmed in experi-
ment). We will understand later why this is so.
    Eq. (6.1) is an ordinary differential equation of first order (specifying the velocity
rather than the acceleration). Thus, the initial configuration Q(0) determines Q(t) for
all t, so Bohmian mechanics is a deterministic theory. On the other hand, Q(t) is
random because Q(0) is. Note that this randomness does not conflict with determinism.
It is a theorem, the equivariance theorem, that the probability distribution of Q(t) is
given by |Ψt (q)|2 . We will prove the equivariance theorem later in this Lecture. As a
consequence, it is consistent to assume the Born distribution for every t. Note that due
to the determinism, the Born distribution can be assumed only for one time (say t = 0);
for any other time t, then, the distribution of Q(t) is fixed by (6.1). The state of the
universe at any time t is given by the pair (Q(t), Ψt ).
    Let us have a closer look at Bohm’s equation of motion (6.1). If we recall the formula
(2.16) for the probability current then we can rewrite Eq. (6.1) in the form
    \frac{dQ_i}{dt} = \frac{\boldsymbol{j}_i}{|\Psi|^2} = \frac{\text{probability current}}{\text{probability density}} .                          (6.4)
This is a very plausible relation because it is a mathematical fact about any particle
system with deterministic velocities that
                 probability current = velocity × probability density .              (6.5)
We will come back to this relation when we prove equivariance.
    Here is another way of re-writing (6.1). A complex number z can be characterized by
its modulus R ≥ 0 and its phase S ∈ ℝ, z = R e^{iS}. It will be convenient in the following
to replace S by S/~ (but we will still call S the phase of z). Then a complex-valued
function Ψ(t, q) can be written in terms of the two real-valued functions R(t, q) and
S(t, q) according to
    \Psi(t, q) = R(t, q)\, e^{iS(t,q)/\hbar} .                          (6.6)
Let us plug this into (6.1): Since
    \nabla_i \Psi = \nabla_i \bigl( R\, e^{iS/\hbar} \bigr)                          (6.7)

                  = (\nabla_i R)\, e^{iS/\hbar} + R\, \nabla_i e^{iS/\hbar}                          (6.8)

                  = (\nabla_i R)\, e^{iS/\hbar} + R\, \frac{i \nabla_i S}{\hbar}\, e^{iS/\hbar} ,                          (6.9)
we have that
    \frac{\hbar}{m_i}\, \mathrm{Im}\, \frac{\nabla_i \Psi}{\Psi} = \frac{\hbar}{m_i}\, \mathrm{Im}\Bigl( \underbrace{\frac{\nabla_i R}{R}}_{\text{real}} + i\, \frac{\nabla_i S}{\hbar} \Bigr)                          (6.10)

    = \frac{\hbar}{m_i}\, \frac{\nabla_i S}{\hbar} = \frac{1}{m_i}\, \nabla_i S .                          (6.11)
Thus, (6.1) can be rewritten as
    \frac{dQ_i}{dt} = \frac{1}{m_i}\, \nabla_i S(t, Q(t)) .                          (6.12)
In words, the velocity is given (up to a constant factor involving the mass) by the
gradient of the phase of the wave function.
   A historical note. A few years before the development of the Schrödinger equation,
Louis de Broglie had suggested a quantitative rule-of-thumb for wave–particle duality:
A particle with momentum p = mv should “correspond” to a wave with wave vector k
according to the de Broglie relation
    p = \hbar k .                          (6.13)
The wave vector is defined by the relation ψ = eik·x (so it is defined only for plane waves);
it is orthogonal to the wave fronts (surfaces of constant phase), and its magnitude is
|k| = 2π/(wave length). Now, if the wave is not a plane wave then we can still define
a local wave vector k(x) that is orthogonal to the surfaces of constant phase and whose
magnitude is the local rate of phase change per unit length. Some thought shows that k(x) = ∇S(x)/ħ. If
we use this expression on the right hand side of (6.13) and interpret p as mass times
the velocity of the particle, we obtain exactly Eq. (6.12), that is, Bohm’s equation of
motion.
6.3    Equivariance
The term “equivariance” comes from the fact that the two relevant quantities, ρt and
|ψt |2 , vary equally with t. (Here, ρt is the distribution arising from ρ0 by transport along
the Bohmian trajectories.) The equivariance theorem can be expressed by means of the
following diagram:
    \begin{array}{ccc}
    \Psi_0 & \longrightarrow & \rho_0 \\
    U_t \big\downarrow & & \big\downarrow \\
    \Psi_t & \longrightarrow & \rho_t
    \end{array}                          (6.14)
The horizontal arrows mean taking | · |2 , the left vertical arrow means the Schrödinger
evolution from time 0 to time t, and the right vertical arrow means the transport of
probability along the Bohmian trajectories. The statement about this diagram is that
both paths along the arrows lead to the same result.
    As a preparation for the proof, we note that the equation of motion can be written
in the form
    \frac{dQ}{dt} = v_t(Q(t)) ,                          (6.15)
where vt : R3N → R3N is the vector field on configuration space vt = v = (v 1 , . . . , v N )
whose i-th component is
    v_i = \frac{\hbar}{m_i}\, \mathrm{Im}\, \frac{\nabla_i \Psi}{\Psi} .                          (6.16)
We now address the following question: If vt is known for all t, and the initial probability
distribution ρ0 is known, how can we compute the probability distribution ρt at other
times? The answer is the continuity equation
    \frac{\partial \rho_t}{\partial t} = -\mathrm{div}\bigl( \rho_t v_t \bigr) .                          (6.17)
This follows from the fact that the probability current is given by ρt vt . In fact, in any
dimension d (d = 3N or otherwise) and for any density (probability density or energy
density or nitrogen density or . . . ) it is true that
As mentioned in (6.4), v i = j i /|ψt |2 . Thus, if ρt = |ψt |2 then Eq. (6.20) is true, which
completes the proof.
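The equivariance theorem can also be illustrated numerically (a sketch of my own, assuming a freely spreading 1-d Gaussian packet at rest, for which the velocity field (6.16) has a simple closed form; units and all parameters are arbitrary): sample Q(0) from |ψ_0|², transport the ensemble along the Bohmian trajectories, and compare with |ψ_t|².

    import numpy as np

    # Free 1-d Gaussian packet at rest (hbar = m = 1, sigma0 = 1); |psi_t|^2 is Gaussian
    # with standard deviation sigma(t), and the Bohmian velocity field is known analytically.
    hbar, m, sigma0 = 1.0, 1.0, 1.0
    sigma = lambda t: sigma0 * np.sqrt(1 + (hbar * t / (2 * m * sigma0**2))**2)

    def v(t, x):
        """Bohmian velocity field (6.16) for this wave function, in closed form."""
        return x * t * (hbar / (2 * m * sigma0))**2 / sigma(t)**2

    # sample Q(0) from |psi_0|^2, a Gaussian with standard deviation sigma0
    rng = np.random.default_rng(0)
    Q = rng.normal(0.0, sigma0, size=100_000)

    # transport the ensemble along the trajectories, dQ/dt = v(t, Q)
    T, dt = 3.0, 1e-3
    for step in range(int(T / dt)):
        Q = Q + v(step * dt, Q) * dt

    # equivariance: at time T the ensemble should be |psi_T|^2-distributed,
    # i.e. Gaussian with standard deviation sigma(T)
    print("empirical std of Q(T):  ", Q.std())
    print("sigma(T) from |psi_T|^2:", sigma(T))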
6.4    The Double-Slit Experiment in Bohmian Mechanics
Let us apply what we know about Bohmian mechanics to N = 1 and the wave function
of the double-slit experiment. We assume that the particle in the experiment moves
as if it was alone in the universe, with the potential V representing the wall with two
slits. We will justify that assumption in a later Lecture. We know already what the
wave function ψ(t, x) looks like (remember the movie). Here is a picture of the possible
trajectories of the particle.
    We know from the equivariance theorem that the position will always have proba-
bility distribution |ψt |2 . Thus, if we detect the particle at time t we find its distribution
in agreement with the Born rule.
    Note that the particle does not move along straight lines, as it would according to classical
mechanics. Note that the wave passes through both slits, while the particle passes
through one only. Think about how that answers the paradoxes pointed out by Feynman.
Note that the particle trajectories would be different if one slit were closed. Note that
we can find out which slit the particle went through without disturbing the interference
pattern: check whether the particle arrived in the upper or lower half of the detection
screen.
      “Is it not clear from the smallness of the scintillation on the screen that we
      have to do with a particle? And is it not clear, from the diffraction and
      interference patterns, that the motion of the particle is directed by a wave?
      De Broglie showed in detail how the motion of a particle, passing through
      just one of two holes in screen, could be influenced by waves propagating
      through both holes. And so influenced that the particle does not go where
      the waves cancel out, but is attracted to where they cooperate. This idea
      seems to me so natural and simple, to resolve the wave–particle dilemma in
      such a clear and ordinary way, that it is a great mystery to me that it was
      so generally ignored.”       J. Bell, Speakable and Unspeakable in Quantum
      Mechanics, page 191
    Coming back to Feynman’s description of the double-slit experiment, we see that his
statement that its outcome “cannot be explained” is not quite accurate. It is true that
it cannot be explained in Newtonian mechanics, but it can in Bohmian mechanics.
    Bohmian mechanics illustrates that these conclusions don’t actually follow. Bell
described that in his article; here are some key points again. To begin with, there is
no retrocausation in Bohmian mechanics, as any intervention of observers will change
ψ only in the future, not in the past, of the intervention, and the particle trajectory
will correspondingly be affected also only in the future. Another basic observation is
that with the literal wave-particle dualism of Bohmian mechanics (there is a wave and
there is a particle), there is nothing left of the idea that the electron is sometimes a
wave and sometimes a particle, and hence even less of the idea that observers could
force an electron to become a wave or to become a particle. In detail: the wave passes
through both slits, the particle through one; in the overlap region, the two wave packets
interfere, and the particle’s |ψ|2 distribution features an interference pattern; if there
is no screen in the overlap region, then the particle moves on in such a way that the
interference pattern disappears and two separate spots form.
    After understanding the Bohmian picture of this experiment, some steps in Wheeler’s
reasoning appear strange. If one assumes that there are no particle trajectories in the
quantum world, as one usually does in orthodox quantum mechanics (recall Feynman’s
chapter), then it would seem natural to say that there is no fact about which slit the
electron went through, given that there was no attempt to detect the electron while
passing a slit. It is surprising, then, that Wheeler claims that the detection on the far-
away screen reveals which slit it took! How can anything reveal which slit the electron
took if the electron didn’t take a slit?
    There is another interesting aspect to the story that I will call Wheeler’s fallacy.
When you analyze the Bohmian picture in the case of a far-away screen, it turns out that
the trajectories passing through the left (right) slit end up in the left (right) region.
(We will discuss why in the exercises.) So Wheeler makes the wrong retrodiction of
which slit the electron passed through! How could this happen? Wheeler noticed that
if the right (left) slit is closed, so only one packet comes out, and it comes out of the
left (right) slit, then only detection events in the right (left) region occur. This is also
true in Bohmian mechanics. Now Wheeler concludes that when wave packets come out
of both slits, and a detection occurs in the right region, then the particle must have
passed through the left slit. This is wrong in Bohmian mechanics, and once you realize
this, it is obvious that Wheeler’s conclusion is a non sequitur —a fallacy.
    Shahriar Afshar proposed and carried out a further variant of the experiment, known
as Afshar’s experiment.8 In this variant, one puts the screen in the far position, but one
adds obstacles (that would absorb or reflect electrons) in the overlap region, in fact in
those places where the interference is destructive. If an interference pattern occurs in the
overlap region, even if it is not observed, then almost no electrons arrive at the obstacles,
and almost no electrons get absorbed or reflected. Thus, if all electrons arrive on the far
screen in either the left or the right region, as in fact observed in the experiment, then
this indicates that there was an interference pattern in the overlap region even if it
was not observed. Afshar argued that this shows that wave and particle must both have
   ⁸ S. S. Afshar: Violation of the principle of complementarity, and its implications. Proceedings of
SPIE 5866: 229–244 (2005) https://arxiv.org/abs/quant-ph/0701027
existed. Again, Bohmian mechanics easily explains the outcome of this experiment.
7     Fourier Transform and Momentum
7.1    Fourier Transform
We know from Exercise 2 of Homework 1 that the plane wave eik·x evolves according to
the free Schrödinger equation to
    e^{ik \cdot x}\, e^{-i\hbar k^2 t/2m} .                          (7.1)
Since the Schrödinger equation is linear, any linear combination of plane waves with
different wave vectors k,

    \sum_{k} c_k\, e^{ik \cdot x}                          (7.2)
with complex coefficients ck , will evolve to
    \sum_{k} c_k\, e^{ik \cdot x}\, e^{-i\hbar k^2 t/2m} .                          (7.3)
The Schwartz space S consists of the smooth functions ψ : ℝ^d → ℂ such that for every n ∈ ℕ and every α ∈ ℕ_0^d there
is C_{n,α} > 0 such that |∂^α ψ(x)| < C_{n,α}\, |x|^{-n} for all x ∈ ℝ^d, where ∂^α := ∂_1^{α_1} \cdots ∂_d^{α_d}.
For example, every Gaussian wave packet lies in S ; note that S ⊂ L1 ∩ L∞ . It
turns out that Fourier transformation maps S bijectively to itself. Moreover, S is a
dense subspace in L2 , and F can be extended in a unique way to a bounded operator
F : L2 → L2 , even though the integral (7.6) exists only for ψ ∈ L1 ∩ L2 .
    Going back to Eq. (7.5) and taking c(k) = (2π)^{-3/2}\, \widehat{\psi}_0(k), we can express the solution
of the free Schrödinger equation as

    \psi_t(x) = \frac{1}{(2\pi)^{3/2}} \int_{\mathbb{R}^3} d^3k \, \Bigl( e^{-i\hbar k^2 t/2m}\, \widehat{\psi}_0(k) \Bigr) e^{ik \cdot x} .                          (7.8)
In words, we can find ψ_t from ψ_0 by taking its Fourier transform \widehat{\psi}_0, multiplying by a
suitable function of k, viz., e^{-i\hbar k^2 t/2m}, and taking the inverse Fourier transform.
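This recipe is easy to carry out numerically (a 1-d sketch of my own, using a discrete Fourier transform on a periodic grid as a stand-in for (7.8); the units ħ = m = 1 and all grid parameters are arbitrary choices):

    import numpy as np

    # 1-d version of the recipe around Eq. (7.8), on a periodic grid (hbar = m = 1):
    # FFT, multiply by exp(-i hbar k^2 t / 2m), inverse FFT.
    hbar, m = 1.0, 1.0
    n, L = 4096, 200.0
    x = np.linspace(-L/2, L/2, n, endpoint=False)
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)               # wave numbers of the grid

    sigma, k0 = 1.0, 2.0
    psi0 = (2*np.pi*sigma**2)**(-0.25) * np.exp(1j*k0*x) * np.exp(-x**2/(4*sigma**2))

    def evolve(psi, t):
        """Free Schroedinger evolution: Fourier transform, multiply, transform back."""
        return np.fft.ifft(np.exp(-1j * hbar * k**2 * t / (2*m)) * np.fft.fft(psi))

    norm = lambda psi: np.sqrt(np.sum(np.abs(psi)**2) * dx)
    psi_t = evolve(psi0, t=10.0)
    print("norm at t=0 :", norm(psi0))                    # ~ 1, cf. Eq. (2.15)
    print("norm at t=10:", norm(psi_t))                   # unchanged
    print("mean position at t=10:", np.sum(x * np.abs(psi_t)**2) * dx)  # ~ (hbar*k0/m)*t = 20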
    The same trick can be done for N particles. Then d = 3N , ψ = ψ(x1 , . . . , xN ),
\widehat{\psi} = \widehat{\psi}(k_1, \ldots, k_N), and the factor to multiply by is

    \exp\Bigl( -i \sum_{j=1}^{N} \frac{\hbar}{2m_j}\, k_j^2\, t \Bigr) \quad \text{instead of} \quad \exp\Bigl( -i\, \frac{\hbar}{2m}\, k^2 t \Bigr) .                          (7.9)
    Note that we take the Fourier transform only in the space variables, not in the time
variable. There are also applications in which it is useful to consider a Fourier transform
in t, but not here.
Example 7.3. The Fourier transform of a Gauss function. Let σ > 0 and
    \psi(x) = C\, e^{-x^2/4\sigma^2} .                          (7.10)
The evaluation of the last integral involves the Cauchy integral theorem, varying the
path of integration and estimating errors. Here, I just report that the outcome is the
constant π 3/2 , independently of σ and k. Thus,
    \widehat{\psi}(k) = C_3\, e^{-\sigma^2 k^2}                          (7.15)
with C_3 = C_2\, \pi^{3/2}. In words, the Fourier transform of a Gaussian function is another
Gaussian function, but with width 1/(2σ) instead of σ. (We see here shadows of the
Heisenberg uncertainty relation, which we will discuss in the next chapter.)
Rule 7.4. (a)

    \widehat{\frac{\partial \psi}{\partial x_j}}(k) = i k_j\, \widehat{\psi}(k) .                          (7.16)

      That is, differentiation of ψ corresponds to multiplication of \widehat{\psi} by ik.
 (b) Conversely,

    \widehat{-i x_j \psi}(k) = \frac{\partial \widehat{\psi}}{\partial k_j}(k) .                          (7.17)
 (c) Indeed,

    \widehat{g}(k - k_0) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} g(x)\, e^{-i(k-k_0)\cdot x}\, d^dx                          (7.25)

                         = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \bigl( e^{ik_0 \cdot x} g(x) \bigr)\, e^{-ik\cdot x}\, d^dx .                          (7.26)
 (d) This follows in much the same way.
which is again a general Gaussian packet with center k0 and width 1/(2σ).
                                                ∗∗∗
   Fourier transformation defines a unitary operator F : L^2(ℝ^d) → L^2(ℝ^d), F\psi = \widehat{\psi}.
We verify that \|F\psi\|_{L^2} = \|\psi\|_{L^2}, at least for nice ψ. Note first that, for f, g ∈ L^1 ∩ L^2,
    \int \Bigl( \int e^{-ik\cdot x} f(k)\, d^dk \Bigr) g(x)\, d^dx = \int \Bigl( \int e^{-ik\cdot x} g(x)\, d^dx \Bigr) f(k)\, d^dk                          (7.29)
by changing the order of integration (which integral is done first). The theorem saying
that we are allowed to change the order of integration (for an integrable integrand f g)
is called Fubini's theorem. From Eq. (7.29) we can conclude \langle g^* | \widehat{f}\, \rangle = \langle \widehat{g}^* | f \rangle. Since
    (F f)^*(k) = (2\pi)^{-d/2} \Bigl( \int e^{-ik\cdot x} f(x)\, d^dx \Bigr)^{*} = F^{-1}(f^*)(k) ,                          (7.30)
7.2     Momentum
“Position measurements” usually consist of detecting the particle. “Momentum mea-
surements” usually consist of letting the particle move freely for a while and then mea-
suring its position.9
    We now analyze this experiment using Bohmian mechanics. We define the asymptotic
velocity u to be
    u = \lim_{t \to \infty} \frac{dQ}{dt}(t) ,                          (7.31)

which can equivalently be computed as

    u = \lim_{t \to \infty} \frac{Q(t)}{t} .                          (7.32)
   ⁹ Alternatively, one lets the particle collide with another particle, makes a “momentum measurement”
on the latter, and makes theoretical reasoning about what the momentum of the former must have been.
To understand this, note that (Q(t) − Q(0))/t is the average velocity during the time
interval [0, t]; if an asymptotic velocity exists (i.e., if the velocity approaches a constant
vector u) then the average velocity over a long time t will be close to u because for
most of the time the velocity will be close to u. The term Q(0)/t converges to zero as
t → ∞, so we obtain (7.32).
    We want the momentum measurement to measure p := mu for a free particle (V =
0). So we measure Q(t) for large t, divide by t, and multiply by m. We can and will
also take this recipe as the definition of a momentum measurement, independently of
whether we want to use Bohmian mechanics.
    How large do we need t to be? In practice, often not very. When thinking of a particle
emitted by a radioactive atom, or coming from a particle collision in an accelerator
experiment (such as the Large Hadron Collider LHC in Geneva), a millisecond is usually
enough for dQ/dt to become approximately constant.
    According to the Born rule, the outcome p is random, and its distribution can be
characterized by saying that, for any set B ⊂ R3 ,
    \mathbb{P}(u \in B) = \lim_{t\to\infty} \mathbb{P}\bigl( Q(t)/t \in B \bigr)                          (7.33)

                        = \lim_{t\to\infty} \mathbb{P}\bigl( Q(t) \in tB \bigr)                          (7.34)

                        = \lim_{t\to\infty} \int_{tB} |\psi_t(x)|^2\, d^3x ,                          (7.35)
where
                                    tB = {tx : x ∈ B}                                  (7.36)
is the scaled set B.
Theorem 7.6. Let ψ(t, x) be a solution of the free Schrödinger equation and B ⊆ R3 .
Then

    \lim_{t\to\infty} \int_{tB} |\psi(t, x)|^2\, d^3x = \int_{mB/\hbar} |\widehat{\psi}_0(k)|^2\, dk .                          (7.37)
As a consequence, the probability density of p is
    (1/ℏ³) |ψ̂0(p/ℏ)|² .                                                                     (7.38)
    The theorem essentially says that when we think of ψ0 as a linear combination of
plane waves e^{ik·x} as in Eq. (7.4) or (7.7), then the contribution from a particular value of
k will move at a velocity of ℏk/m (shadows of the de Broglie relation p = ℏk!), and in
the long run these contributions will tend to separate in space (i.e., no longer overlap),
leaving the contribution from k in the region around ℏkt/m. We see the de Broglie
relation again in (7.38) when we insert p/ℏ for k in ψ̂. The upshot of this analysis can
be formulated as
Born’s rule for momentum. If we measure the momentum of a particle with wave
function ψ then the outcome is random with probability density
    ρmom(p) = (1/ℏ³) |ψ̂(p/ℏ)|² .                                                            (7.39)
Likewise, if we measure the momenta of N particles with joint wave function ψ(x1 , . . . , xN ),
then the outcomes are random with joint probability density
    ρmom(p1, . . . , pN) = (1/ℏ^{3N}) |ψ̂(p1/ℏ, . . . , pN/ℏ)|² .                            (7.40)
   For this reason, the Fourier transform ψb is also called the momentum representation
of ψ, while ψ itself is called the position representation of the wave function.
Example 7.7. The general Gaussian wave packet (7.27), whose Born distribution in
position space is a Gaussian distribution with mean x0 and width σ, has momentum
distribution
    ρmom(p) = (const.) e^{−2(σ/ℏ)²(p−ℏk0)²} ,                                                (7.41)
that is, a Gaussian distribution with mean ℏk0 and width
    σP = ℏ/(2σ) .                                                                            (7.42)
In particular, if we want a momentum distribution that is sharply peaked around some
value p0 = ~k0 , that is, if we want σP to be small, then σ must be large, so ψ must be
wide, “close to a plane wave.”
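This relation between the widths is easy to check numerically. The following is a minimal
sketch (Python/NumPy): all parameter values and the units with ℏ = 1 are invented for
illustration, and the explicit Gaussian below may differ in conventions from (7.27); it only
assumes that the position distribution has width σ.

    import numpy as np

    hbar, sigma, k0, x0 = 1.0, 0.7, 2.0, 0.0      # illustrative values, not from the text

    # a Gaussian wave packet whose position distribution has mean x0 and width sigma
    x = np.linspace(-40, 40, 2**14)
    dx = x[1] - x[0]
    psi = (2*np.pi*sigma**2)**(-0.25) * np.exp(-(x - x0)**2/(4*sigma**2) + 1j*k0*x)

    # discrete approximation of psi_hat(k); an overall phase is irrelevant for |psi_hat|^2
    k = 2*np.pi*np.fft.fftfreq(x.size, d=dx)
    psi_hat = np.fft.fft(psi) * dx / np.sqrt(2*np.pi)

    p = hbar * k
    w = np.abs(psi_hat)**2                         # proportional to rho_mom(p)
    mean_p = np.sum(p*w) / np.sum(w)
    sigma_P = np.sqrt(np.sum((p - mean_p)**2 * w) / np.sum(w))
    print(mean_p, hbar*k0)                         # mean momentum is close to hbar*k0
    print(sigma_P, hbar/(2*sigma))                 # width is close to hbar/(2*sigma)

Running it shows the de Broglie relation and (7.42) at work: the momentum distribution is
centered at ℏk0 and has width ℏ/(2σ).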
This relation motivates calling Pj = −iℏ ∂/∂xj the momentum operator in the xj-direction,
and (P1, P2, P3) the vector of momentum operators.
    We note for later use that, by the same reasoning,
    ⟨pj²⟩ = ∫ (ℏkj)² |ψ̂0(k)|² dk = ⟨ψ0 | (−iℏ ∂/∂xj)² ψ0⟩ .                                 (7.50)
7.4       Tunneling
The tunnel effect is another quantum effect that is widely perceived as paradoxical.
Consider the 1-d Schrödinger equation with a potential V that has the shape of a
potential barrier of height V0 > 0. As an idealized example, suppose
    V(x) = V0 for 0 ≤ x ≤ L,   V(x) = 0 otherwise.                                           (7.51)
In classical mechanics, the energy E = p²/2m + V(Q) is conserved along every trajectory;
in particular, the particle can never reach a region in which V(x) > E, so, if E < V0,
then the particle will turn around at the barrier and move back to the left.
    That is different in quantum mechanics. Consider a Gaussian wave packet, initially
to the left of the barrier, with a rather sharp momentum distribution around a p0 > 0
with p0²/2m < V0. Then part of the packet will be reflected, and part of it will pass
through the barrier! (And the part that passes through is much larger than just the
tail of ρmom with p ≥ √(2mV0).) I will show you another movie created by B. Thaller
(http://vqm.uni-graz.at/movies.html) with a numerical simulation of the Schrödin-
ger equation with potential (7.51). As a consequence, the Born rule predicts a substantial
probability for the particle to show up on the other side of the barrier (“tunneling
probability”). Figure 2 shows the Bohmian trajectories for such a situation (with only
a small tunneling probability).
    For computing the tunneling probability, an easy recipe is to assume that the initial
ψ is close to a plane wave and to consider only the interior part of it that actually looks
like a plane wave. One solves the Schrödinger equation for a plane wave arriving, computes
the amount of probability current through the barrier, and compares it to the current
associated with the arriving wave.10
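For the rectangular barrier the recipe leads to a closed formula. Here is a minimal numerical
sketch (Python/NumPy) of that formula, assuming a barrier of height V0 and width L as in
(7.51) above; the values of E, V0, L and the units ℏ = m = 1 are invented for illustration
and are not taken from the text.

    import numpy as np

    hbar, m = 1.0, 1.0                      # units with hbar = m = 1 (assumption)
    V0, L = 5.0, 1.0                        # barrier height and width (made-up values)
    E = 2.0                                 # energy of the incoming plane wave, E < V0

    k = np.sqrt(2*m*E)/hbar                 # wave number outside the barrier
    kappa = np.sqrt(2*m*(V0 - E))/hbar      # decay rate inside the barrier

    # ratio of transmitted to incoming probability current for a plane wave
    T = 1.0 / (1.0 + (V0**2 * np.sinh(kappa*L)**2) / (4*E*(V0 - E)))
    print("tunneling probability ~", T)

The printed number is the fraction of the incoming current that passes the barrier, i.e.,
the tunneling probability in the plane-wave idealization.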
    What is paradoxical about tunneling? Perhaps not so much, once we give up New-
tonian mechanics and accept that the equation of motion can be non-classical, such as
Bohm’s. Then it is only to be expected that the trajectories are different, and not sur-
prising that some barriers which Newton’s trajectories cannot cross, Bohm’s trajectories
  10
    For further discussion of why that yields a reasonable result, see T. Norsen: The Pilot-Wave
Perspective on Quantum Scattering and Tunneling. American Journal of Physics 81: 258 (2013)
http://arxiv.org/abs/1210.7265.
Figure 2: Bohmian trajectories in a tunneling situation. Picture taken from D. Bohm
and B. J. Hiley: The Undivided Universe, London: Routledge (1993)
can. Part of the sense of paradox comes perhaps from a narrative that is often told when
the tunnel effect is introduced: that the particle can “borrow” some energy for a short
amount of time by virtue of an energy–time uncertainty relation. This narrative seems
not very helpful.
    The tunnel effect plays a crucial role in radioactive α-decay (where the α-particle
leaves the nucleus by means of tunneling) and scanning tunneling electron microscopy
(where the distance between a needle and a surface is measured by means of measuring
the tunneling probability).
    There are further related effects: anti-tunneling means that a particle gets reflected
by a barrier so low that a classical particle with the same initial momentum would
be certain to pass it; this happens because a solution of the Schrödinger equation will
partly be reflected even at a low barrier. Another effect has been termed paradoxical
reflection:11 Consider a downward potential step, for instance V(x) = 0 for x < 0 and
V(x) = −V0 for x > 0 with some V0 > 0.
Classically, a particle coming from the left has probability zero to be reflected back, but
according to the Schrödinger equation, wave packets will be partly reflected and partly
   11 For detailed discussion, see P. L. Garrido, S. Goldstein, J. Lukkarinen, and R. Tumulka: Paradoxical
Reflection in Quantum Mechanics. American Journal of Physics 79(12): 1218–1231 (2011),
http://arxiv.org/abs/0808.0610
transmitted. Remarkably, in the limit V0 → ∞, the reflection probability converges to
1. “A quantum ball can’t roll off a cliff!” On a potential plateau, surrounded by deep
downward steps, a particle can be confined for a long time, although finally, in the limit
t → ∞, all of the wave function will leave the plateau region and propagate to spatial
infinity.
8     Operators and Observables
8.1    Heisenberg’s Uncertainty Relation
As before, ⟨X⟩ denotes the expectation of the random variable X. The variance of the
momentum distribution for the initial wave function ψ ∈ L²(R) (in one dimension) is
    σP² := ⟨(p − ⟨p⟩)²⟩                                                                      (8.1)
         = ⟨p² − 2p⟨p⟩ + ⟨p⟩²⟩                                                               (8.2)
         = ⟨p²⟩ − 2⟨p⟩² + ⟨p⟩²                                                               (8.3)
         = ⟨p²⟩ − ⟨p⟩²                                                                       (8.4)
         = ⟨ψ|P²ψ⟩ − ⟨ψ|Pψ⟩²                                                                 (8.5)
         = ⟨ψ | (P − ⟨ψ|Pψ⟩)² ψ⟩ .                                                           (8.6)
Theorem 8.1. (Heisenberg uncertainty relation) For any ψ ∈ L2 (R) with kψk = 1,
    σX σP ≥ ℏ/2 .                                                                            (8.10)
   This means that any wave function that is very narrow must have a wide Fourier
transform.
Example 8.2. Consider the general Gaussian wave packet (7.27), for simplicity in 1
dimension. The standard deviation of the position distribution is σX = σ, and we
computed the width of the momentum distribution in (7.42). We thus obtain for this ψ
that
    σX σP = ℏ/2 ,                                                                            (8.11)
just the lowest value allowed by the Heisenberg uncertainty relation.
Example 8.3. Consider a wave packet passing through a slit. Let us ignore the part of
the wave packet that gets reflected because it did not arrive at the slit, and focus on just
the part that makes it through the slit. That is a narrow wave packet, and its standard
deviation in position, σX , is approximately the width of the slit. If that is very small
then, by the Heisenberg uncertainty relation, σP must be large, so the wave packet must
spread quickly after passing the slit. If the slit is wider, the spreading is weaker.
                                            ∗∗∗
    In Bohmian mechanics, the Heisenberg uncertainty relation means that whenever
the wave function is such that we can know the position of a particle with (small)
inaccuracy σX then we are unable to know its asymptotic velocity better than with
inaccuracy ~/(2mσX ); thus, we are unable to predict its future position after a large
time t (for V = 0) better than with inaccuracy ~t/(2mσX ). This is a limitation to
knowledge in Bohmian mechanics.
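This growth of the prediction inaccuracy can be illustrated numerically. The sketch below
(Python/NumPy) evolves a free Gaussian packet by multiplying each Fourier mode e^{ikx} by
the free phase e^{−iℏk²t/2m} and prints the position spread; units with ℏ = m = 1 and the
initial width are invented choices, not taken from the text.

    import numpy as np

    hbar, m, sigma = 1.0, 1.0, 0.5          # illustrative values (assumption)
    x = np.linspace(-200, 200, 2**15)
    dx = x[1] - x[0]
    k = 2*np.pi*np.fft.fftfreq(x.size, d=dx)

    psi0 = (2*np.pi*sigma**2)**(-0.25) * np.exp(-x**2/(4*sigma**2))

    def width(psi):
        w = abs(psi)**2
        mean = np.sum(x*w)/np.sum(w)
        return np.sqrt(np.sum((x-mean)**2*w)/np.sum(w))

    for t in [0.0, 20.0, 40.0]:
        # exact free evolution in Fourier space
        psit = np.fft.ifft(np.exp(-1j*hbar*k**2*t/(2*m)) * np.fft.fft(psi0))
        print(t, width(psit), hbar*t/(2*m*sigma))

For large t the measured spread approaches ℏt/(2mσX), in line with the limitation stated above.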
    The Heisenberg uncertainty relation is often understood as excluding the possibility
of particle trajectories. If the particle had a trajectory, the reasoning goes, then it would
have a precise position and a precise velocity (and thus a precise momentum) at any
time, so the position uncertainty would be zero and the momentum uncertainty would
be zero, so σX = 0 and σP = 0, in contradiction with (8.10). We know already from
Bohmian mechanics that this argument cannot be right. It goes wrong by assuming
that if the particle has a precise position and a precise velocity then they can also be
precisely known and precisely controlled. Rather, inhabitants of a Bohmian universe,
when they know a particle’s wave function to be ϕ(x), cannot know its position more
precisely than the |ϕ|2 distribution allows.
    In the traditional, orthodox view of quantum mechanics, it is assumed that electrons
do not have trajectories. It is assumed that the wave function is the complete description
of the electron, in contrast to Bohmian mechanics, where the complete description is
given by the pair (Q, ψ), and ψ alone would only be partial information and thus an
incomplete description. From these assumptions, it follows that the electron does not
have a position before we attempt to detect it. Likewise, it does not have a momentum
before we attempt to measure it. Thus, in orthodox quantum mechanics the Heisenberg
uncertainty relation does not amount to a limitation of knowledge because there is
no fact in the world that we do not know about when we do not know its position.
Unfortunately, the uncertainty relation is often expressed by saying that it is impossible
to measure position and momentum at the same time with arbitrary accuracy; while
this would be appropriate to say in Bohmian mechanics, it is not in orthodox quantum
mechanics because this formulation presumes that position and momentum have values
that we could discover by measuring them.
    The uncertainty relation is also involved in the double slit experiment as follows. If
it did not hold, we could make the electron move exactly orthogonal to the screen after
passing through the narrow slits–and arrive very near the center of the screen. Thus, the
distribution on the detection screen could not have a second- or third-order maximum.
Since in orthodox quantum mechanics the double-slit experiment is understood as in-
dicative of a paradoxical nature of reality, the uncertainty relation is then understood as
“protecting” the paradox from becoming a visible contradiction. A similar argument, as
pointed out by Feynman, applies to the photon colliding with the electron for detecting
which slit it went through, and its effect of destroying the interference.
On this list of bad words from good books, the worst of all is ‘measurement.’
But first let us get acquainted with the mathematics of self-adjoint operators.
For an unbounded operator A : D(A) → H with dense domain D(A) ⊂ H , the adjoint
operator A† is uniquely defined by the property (8.13) for all ψ ∈ D(A† ) and φ ∈ D(A)
on the domain
    D(A†) = { ψ ∈ H : ∃χ ∈ H  ∀φ ∈ D(A) : ⟨ψ|Aφ⟩ = ⟨χ|φ⟩ } .                                 (8.14)
Example 8.6.
   • Let H = Cⁿ, and let A be an n × n matrix with entries Aij. Then A† is the
     conjugate-transpose matrix B with Bij = A*ji. Indeed, for ψ = (ψ1, . . . , ψn)
     and φ = (φ1, . . . , φn),
         ⟨ψ|Aφ⟩ = Σ_{i=1}^{n} ψi* (Aφ)i                                                      (8.16)
                = Σ_i Σ_j ψi* Aij φj                                                         (8.17)
                = Σ_j Σ_i (A*ij ψi)* φj                                                      (8.18)
                = Σ_j ( Σ_i Bji ψi )* φj                                                     (8.19)
                = Σ_j (Bψ)j* φj                                                              (8.20)
                = ⟨Bψ|φ⟩ .                                                                   (8.21)
     As a consequence, an operator A is self-adjoint iff Aij = A*ji. (A small numerical
     check of this, and of the momentum operator below, is sketched after this list.)
• A unitary operator is usually not self-adjoint.
• Let H = L2 (Rd ), and let A be a multiplication operator,
                                   Aψ(x) = f (x) ψ(x) ,                           (8.22)
  such as the potential in the Hamiltonian or the position operators. Then A† is the
  multiplication operator that multiplies by f ∗ . Indeed,
         ⟨ψ|Aφ⟩ = ∫_{R^d} ψ(x)* f(x) φ(x) dx                                                 (8.23)
                = ∫_{R^d} ( f*(x) ψ(x) )* φ(x) dx                                            (8.24)
                = ⟨f*ψ|φ⟩ .                                                                  (8.25)
   (This calculation is rigorous if f is bounded. If it is not, then some discussion of
   the domains of A and A† is needed.) Thus, A is self-adjoint iff f is real-valued.
• On H = L²(R^d), the momentum operators Pj = −iℏ ∂/∂xj are self-adjoint with the
  domain given by the first Sobolev space, i.e., the space of functions ψ ∈ L² whose
  Fourier transform ψ̂ has the property that k ↦ |k| ψ̂(k) is still square-integrable. The
  relation (8.15) can easily be verified on nice functions using integration by parts:
         ⟨ψ|Pj φ⟩ = ∫ ψ*(x) (−iℏ) ∂φ/∂xj (x) dx                                              (8.26)
                  = − ∫ ∂ψ*/∂xj (x) (−iℏ) φ(x) dx                                            (8.27)
                  = ∫ ( −iℏ ∂ψ/∂xj (x) )* φ(x) dx                                            (8.28)
                  = ⟨Pj ψ|φ⟩ .                                                               (8.29)
   • In H = L2 (Rd ), the Hamiltonian is self-adjoint for suitable potentials V on a
     suitable domain. By formal calculation (leaving aside questions of domains), since
         H = Σ_{j=1}^{d} (1/2m) Pj² + V ,                                                    (8.30)
      we have that
         ⟨ψ|Hφ⟩ = ⟨ψ | ( Σ_j (1/2m) Pj² + V ) φ⟩                                             (8.31)
                = Σ_j (1/2m) ⟨ψ|Pj Pj φ⟩ + ⟨ψ|V φ⟩                                           (8.32)
                = Σ_j (1/2m) ⟨Pj ψ|Pj φ⟩ + ⟨V ψ|φ⟩                                           (8.33)
                = Σ_j (1/2m) ⟨Pj Pj ψ|φ⟩ + ⟨V ψ|φ⟩                                           (8.34)
                = ⟨( Σ_j Pj²/2m + V ) ψ | φ⟩                                                 (8.35)
                = ⟨Hψ|φ⟩ .                                                                   (8.36)
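As announced in the first bullet point, here is a quick numerical check of two of the
examples (a minimal Python/NumPy sketch; the dimension, grid, and random vectors are
arbitrary choices, and the periodic finite-difference grid is only a discrete stand-in for
the integration-by-parts argument, not anything from the text):

    import numpy as np

    rng = np.random.default_rng(0)

    # (i) adjoint of a matrix: B with B_ij = A*_ji satisfies <psi|A phi> = <B psi|phi>
    n = 4
    A = rng.normal(size=(n, n)) + 1j*rng.normal(size=(n, n))
    B = A.conj().T
    psi = rng.normal(size=n) + 1j*rng.normal(size=n)
    phi = rng.normal(size=n) + 1j*rng.normal(size=n)
    print(np.allclose(np.vdot(psi, A @ phi), np.vdot(B @ psi, phi)))   # True

    # (ii) discrete momentum operator: -i*hbar times a central difference with periodic
    # boundary conditions is a self-adjoint matrix (discrete integration by parts)
    hbar, m_grid, dx = 1.0, 200, 0.1
    D = (np.diag(np.ones(m_grid-1), 1) - np.diag(np.ones(m_grid-1), -1)) / (2*dx)
    D[0, -1], D[-1, 0] = -1/(2*dx), 1/(2*dx)
    P = -1j*hbar*D
    f = rng.normal(size=m_grid) + 1j*rng.normal(size=m_grid)
    g = rng.normal(size=m_grid) + 1j*rng.normal(size=m_grid)
    print(np.allclose(np.vdot(f, P @ g), np.vdot(P @ f, g)))           # True: <f|Pg> = <Pf|g>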
   An orthonormal basis (ONB) is a set {φn} of elements of the Hilbert space H such
that (a) ⟨φm|φn⟩ = δmn and (b) every ψ ∈ H can be written as a linear combination of
the φn,
    ψ = Σ_n cn φn .                                                                          (8.40)
then
    Aψ = Σ_{α,λ} α cα,λ φα,λ .                                                               (8.44)
9     Spin
The phenomenon known as spin does not mean that the particle is spinning around its
axis, though it is in some ways similar. The simplest description of the phenomenon
is to say that the wave function of an electron (at time t) is actually not of the form
ψ : R3 → C but instead ψ : R3 → C2 . The space C2 is called spin-space and its elements
spinors (short for spin-vectors). We will in the following write S for spin-space.
With φ*χ denoting the inner product in spin-space, Eq. (9.2) can be expressed more
succinctly as
    ω(φ) = φ* σ φ .                                                                          (9.5)
For example, the spinor φ = (1, 0) has ω(φ) = (0, 0, 1), which points in the +z-direction;
(1, 0) is therefore called a spin-up spinor. The spinor (0, 1) has ω(0, 1) = (0, 0, −1),
which points in the −z-direction; (0, 1) is therefore called a spin-down spinor. ω has
the properties
    ω(zφ) = |z|² ω(φ)                                                                        (9.6)
and (homework problem)
    |ω(φ)| = ‖φ‖²_S = φ*φ ,                                                                  (9.7)
so unit spinors are associated with unit vectors.
   Spinors have the curious property that if we rotate a spinor φ in spin-space through
an angle θ, with angles in Hilbert space defined by the relation
    cos θ = ⟨φ|χ⟩ / ( ‖φ‖ ‖χ‖ ) ,                                                            (9.8)
the corresponding direction ω(φ) in real space rotates through an angle 2θ. For example,
(0, 1) can be obtained from (1, 0) by rotating through 90◦ , while the corresponding vector
is rotated from the +z to the −z-direction, and thus through 180◦ . Expressed the other
way around, spinors rotate by half the angle of vectors. That is why one says that
electrons have spin one half. As a consequence, a rotation in real space by 360◦ will
correspond to one by 180◦ in spin space and carry φ to −φ, whereas a rotation in real
space by 720◦ will carry φ to itself.
     There are also other types of spinors, other than spin-1/2: spin-1, spin-3/2, spin-2,
spin-5/2, etc. The space of spin-s spinors has complex dimension 2s + 1, and the analogs of
the Pauli matrices are (2s + 1) × (2s + 1) matrices. In this context, wave functions
ψ : R3 → C are said to have spin 0. Electrons, quarks, and all known species of matter
particles have spin 12 ; the photon has spin 1; all known species of force particles have
integer spin; the only elementary particle species with spin 0 in the standard model of
particle physics is the Higgs particle or Higgs boson, which was experimentally confirmed
in 2012 at the Large Hadron Collider (LHC) of CERN in Geneva, Switzerland.
B = ∇ × A. (9.10)
(In words, B is the curl of A. The vector potential is, in fact, not uniquely defined by
this property, but different vector potentials satisfying (9.10) for the same magnetic field
can be translated into each other by gauge transformations, i.e., by different x-dependent
choices of the orthonormal basis in spin-space S.)
    The Hilbert space of wave functions with spin is denoted L2 (R3 , C2 ) and contains
the square-integrable functions R3 → C2 . The inner product is
    ⟨ψ|φ⟩ = ∫_{R³} d³x ψ*(x) φ(x) = ∫_{R³} d³x Σ_{s=1}^{2} ψs*(x) φs(x) .                    (9.11)
9.3    The Stern–Gerlach Experiment
Let us write
    ψ(x) = ( ψ1(x), ψ2(x) )ᵀ ,                                                               (9.12)
regarded as a column spinor.
    In the first half of a Stern–Gerlach experiment (first done in 1922 with silver atoms),
a wave packet moves through a magnetic field that is carefully designed so as to deflect
ψ1 (x) in a different direction than ψ2 (x), and thus to separate the two components
in space. Put differently, if the initial wave function ψ(t = 0) has support in the ball
B_r(y) of radius r around the center y, then the final wave function ψ(t = 1) (i.e., the
wave function after passing through the magnetic field) is such that ψ1(x, t = 1) has
support in B+ := B_r(y + (1, 0, d)) and ψ2(x, t = 1) in B− := B_r(y + (1, 0, −d)) with
deflection distance d > r (so that ψ1 and ψ2 do not overlap). The arrangement creating
this magnetic field is called a Stern–Gerlach magnet. In the second half of the Stern–
Gerlach experiment, one applies detectors to the regions B± . If the electron is found in
B+ then the outcome of the experiment is said to be up, if in B− then down.
     A case of particular interest is that the initial wave function satisfies
    ψ(x) = φ χ(x) ,                                                                          (9.13)
where φ ∈ S, ‖φ‖_S = 1, and χ : R³ → C, ‖χ‖ = 1. One says that for such a ψ, the spin
degree of freedom is disentangled from the spatial degrees of freedom. (Before, we have
considered many-particle wave functions for which some particles were disentangled from
others. We may also consider a single particle and say that the x variable is disentangled
from the y and z variables iff ψ(x, y, z) = f (x) g(y, z).)
    In the case (9.13), the wave function after passing the magnet is
    ( φ1 χ(x − (1, 0, d)) , φ2 χ(x − (1, 0, −d)) )ᵀ ,                                        (9.14)
and it follows from the Born rule for position that the probability of outcome “up” is
|φ1 |2 and that of “down” is |φ2 |2 .
     These probabilities agree with the general Born rule (8.45) for the observable A = σ3
on the Hilbert space H = S. The spinors φ+1 = (1, 0) and φ−1 = (0, 1) form an
orthonormal basis of S consisting of eigenvectors of σ3 (with eigenvalues +1 and −1,
respectively); φ plays the role of ψ in (8.45); its coefficients in the ONB referred to
in Eq. (8.45) are ⟨φ+1|φ⟩ = φ1 and ⟨φ−1|φ⟩ = φ2. That is why the Stern–Gerlach
experiment is often called a “measurement of σ3 ”, or a “measurement of the z component
of spin.”
     The Stern–Gerlach magnet can be rotated into any direction. For example, by
rotating by 90◦ around the x-axis (a rotation that will map the z-axis to the y-axis),
we obtain an arrangement that will deflect part of the initial wave packet ψ in the +y-
direction and another part in the −y-direction. However, these parts are not φ1 and φ2 .
Instead, they are the parts along a different ONB of S:
    φ(+) = (1/√2)(1, i)  and  φ(−) = (1/√2)(1, −i)  form an ONB of S with ω(φ(±)) = (0, ±1, 0).   (9.15)
That is, any ψ : R³ → S can be written as ψ(x) = c+(x)φ(+) + c−(x)φ(−), and these
two terms will get spatially separated (in the ±y direction, in fact). The probabilities
of outcomes “up” and “down” are then ∫ dx |c±(x)|². In the special case (9.13), the
probabilities are just |c±|², where φ = c+φ(+) + c−φ(−). Equivalently, the probabilities
are |⟨φ(±)|φ⟩|². These values are in agreement with the general Born rule for A = σ2
because φ(±) are eigenvectors of σ2 with eigenvalues ±1.
    Generally, if the Stern–Gerlach magnet is rotated from the z-direction to direction
n, where n is any unit vector in R3 , then the probabilities of its outcomes are governed
by the Born rule (8.45) for A = n · σ, which for any n is a self-adjoint 2 × 2 matrix with
eigenvalues ±1.
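This prescription is easy to implement. The following minimal sketch (Python/NumPy; the
particular spinor φ and direction n are arbitrary choices for illustration) computes the
eigenvectors of n · σ and the outcome probabilities |⟨φ(±)|φ⟩|²:

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    phi = np.array([1.0, 0.0], dtype=complex)     # spin-up spinor (example choice)
    n = np.array([0.0, 1.0, 0.0])                 # magnet rotated to the y-direction

    A = n[0]*sx + n[1]*sy + n[2]*sz               # observable n . sigma
    eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues -1, +1 with ONB of eigenvectors

    for val, vec in zip(eigvals, eigvecs.T):
        prob = abs(np.vdot(vec, phi))**2          # Born rule: |<phi^(±)|phi>|^2
        print(f"outcome {val:+.0f}: probability {prob:.3f}")

For φ = (1, 0) and n in the y-direction this prints 1/2 and 1/2, in agreement with the
discussion around (9.15).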
    For a particle with spin, Bohm’s equation of motion reads
    dQ/dt = (ℏ/m) Im[ ψ*∇ψ / ψ*ψ ] (t, Q(t)) .                                               (9.16)
Recall that ψ*ψ means the inner product in spin-space, so the denominator means
Σ_{s=1}^{2} ψs*(x) ψs(x), and correspondingly the numerator means Σ_s ψs* ∇ψs.
   It follows that Q(t) has probability density |ψt |2 at every t. This version of the
equivariance theorem can be obtained by a very similar computation as in the spinless
case, involving the following variant of the continuity equation:
    ∂|ψ(x, t)|²/∂t = −∇ · [ (ℏ/m) Im(ψ*∇ψ) ] .                                               (9.20)
   As a consequence of the equivariance theorem, Bohmian mechanics leads to the
correct probabilities for the Stern–Gerlach experiment.
9.5    Is an Electron a Spinning Ball?
If it were then the following paradox would arise. According to classical electrodynamics
(which of course is well confirmed for macroscopic objects), a spinning, electrically
charged object behaves like a magnet in two ways: it creates its own magnetic field, and
it reacts to an external magnetic field. Just as the strength of the electric charge can be
expressed by a number, the charge e, the strength of the magnet can be expressed by
a vector, the magnetic dipole moment or just magnetic moment µ. Its direction points
from the south pole to the north pole, and its magnitude is the strength of the magnet.
The magnetic moment of a charge e spinning at angular frequency ω around the axis
along the unit vector u is, according to classical electrodynamics,
                                          µ = γeωu ,                                (9.21)
where the factor γ depends on the size and shape of the object. Furthermore, if such an
object flies through a Stern–Gerlach magnet oriented in direction n then, still according
to classical electrodynamics, it gets deflected by an amount proportional to µ · n. Put
differently, the Stern–Gerlach experiment for a classical object measures µz , or the
component of µ in the direction of n. The vector ωu is called the spin vector.
    Where is the paradox? It is that different choices of n, when applied to objects
with the same µ, would lead to a continuous interval of deflections [−γ|e|ω, +γ|e|ω],
whereas the Stern–Gerlach experiment, for whichever choice of n, leads to a discrete set
{+d, −d} of two possible deflections.
    The latter fact was called by Wolfgang Pauli the “non-classical two-valuedness of
spin.” This makes it hard to come up with a theory in which the outcome of a Stern–
Gerlach experiment has anything to do with a spinning motion. While Feynman went
too far when claiming that the double-slit experiment does not permit any deeper ex-
planation, it seems safe to say that the Stern–Gerlach experiment does not permit an
explanation in terms of spinning balls. Note also that Bohmian mechanics does not
involve any spinning motion to account for (what has come to be called) spin.
where σ (k) means σ acting on the index sk of ψ. In Bohm’s equation of motion (9.16),
replace Q ∈ R3 by Q ∈ R3N and sum over all spin indices sj whenever taking the spin
inner product φ∗ ψ.
9.7    Representations of SO(3)
A deeper understanding of spinors comes from group representations.12 Let us start
easily. Consider the wave function of a single particle. Suppose it were, instead of
a complex scalar field, a vector field, so ψ : R3 → R3 . Well, it should be complex,
so we complexify the vector field, ψ : R3 → C3 . Now rotate your coordinate system
according to R ∈ SO(3). Then in the new coordinates, the same physical wave function
is represented by a different mathematical function, namely ψ̃(x) = R ψ(R⁻¹x).
Instead of real-valued potentials, the Schrödinger equation could then include matrix-
valued potentials, provided the matrices are always self-adjoint:
    iℏ ∂ψ/∂t = −(ℏ²/2m) ∆ψ + V ψ .                                                           (9.25)
Now consider another possibility: that the wave function is tensor-valued, ψab with
a, b = 1, 2, 3. Then in a rotated coordinate system,
    ψ̃ab(x) = Σ_{c,d=1}^{3} Rac Rbd ψcd(R⁻¹x) .                                              (9.26)
What the two examples have in common is that the components of the wave function
get transformed as well according to the scheme, for ψ : R3 → Cd ,
    ψ̃r(x) = Σ_{s=1}^{d} Mrs(R) ψs(R⁻¹x) .                                                   (9.27)
The matrices M(R) must fit together so that M(R1) M(R2) = M(R1 R2), which means that
they form a representation of the group SO(3) of rotations; in other words, they give a
homomorphism from SO(3) to GL(C^d), the “general linear group” comprising
all invertible operators on Cd . Further representations of SO(3) provide further possible
value spaces for wave functions ψ.
     Spin space S for spin- 21 is almost of this kind, but there is one more complication:
SO(3) is represented, not by linear mappings S → S, but by mappings P (S) → P (S)
consistent with linear mappings, where P (S) is the set of all 1-dimensional subspaces
of S (called the projective space of S). This seems fitting as two wave functions that
differ only by a phase factor, φ(x) = eiθ ψ(x), are usually regarded as representing the
same physical state (they yield the same Born distribution, at all times and for all
  12
    More details about the topic of this section can be found in R. U. Sexl and H. K. Urbantke:
Relativity, Groups, Particles, Springer-Verlag (2001).
observables, and the same Bohmian trajectories for all times). That is, one can say that
a wave function is really an element of P (H ) rather than H because every normalized
element of Cψ is as good as ψ.
    By a mapping F : P(S) → P(S) consistent with a linear mapping, I mean an F
such that there is a linear mapping M : S → S with F(Cψ) = CMψ. While M determines
F uniquely, F does not determine M, as zM with any z ∈ C \ {0} leads to the same F.
In particular, if we are given F(R) and have found an M(R), then −M(R) is always another
possible candidate. For spin-1/2, it turns out that while F(R1) F(R2) = F(R1 R2) as it
should, M(R) can at best be found in such a way that
    M(R1) M(R2) = ±M(R1 R2) .
This sign mismatch has something to do with the halved angles. The M are elements of
SU (2) (unitary with determinant 1), and with every element R of SO(3) are associated
two elements of SU (2) that differ by a sign.
   This association can actually be regarded as a mapping ϕ : SU(2) → SO(3).
This mapping ϕ is a group homomorphism (i.e., ϕ(M1 )ϕ(M2 ) = ϕ(M1 M2 ) and ϕ(I) =
I), is smooth, two-to-one [ϕ(−M ) = ϕ(M )], and locally a diffeomorphism. The situation
is similar to the group homomorphism χ : R → U(1), θ ↦ e^{iθ}, which is also smooth,
many-to-one, and locally a diffeomorphism; just like R is what you get from the circle
U (1) when you unfold it, SU (2) is what you get from SO(3) when you “unfold” it. (The
unfolding of a manifold Q is called the covering space Q̂; so the covering space of SO(3)
is SU(2).) For every
continuous curve γ in SO(3) starting in I, there is a unique continuous curve γ̂ in SU (2)
with ϕ ◦ γ̂ = γ, called the lift of γ. Thus, continuous rotations in R3 can be translated
uniquely into continuous rotations in S.
    The upshot of all this is that spinors are one of the various types of mathematical
objects (besides vectors and tensors) that react to rotations in a well-defined way, and
that is why they qualify as possible values of a wave function.
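The two-to-one map ϕ can also be made concrete numerically. The sketch below (Python/NumPy)
uses the standard extraction formula R_ij = (1/2) tr(σi M σj M†); the particular M, a
rotation about the z-axis by an invented angle, is only an example.

    import numpy as np

    sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
             np.array([[0, -1j], [1j, 0]]),
             np.array([[1, 0], [0, -1]], dtype=complex)]

    def phi_map(M):
        """Rotation R in SO(3) associated with M in SU(2): R_ij = (1/2) tr(sigma_i M sigma_j M^dagger)."""
        R = np.empty((3, 3))
        for i in range(3):
            for j in range(3):
                R[i, j] = 0.5 * np.trace(sigma[i] @ M @ sigma[j] @ M.conj().T).real
        return R

    theta = 0.7                                   # rotation angle about the z-axis (example)
    M = np.array([[np.exp(-1j*theta/2), 0],
                  [0, np.exp(1j*theta/2)]])       # element of SU(2); note the half angle

    print(np.round(phi_map(M), 3))                # the usual rotation matrix by theta about z
    print(np.allclose(phi_map(M), phi_map(-M)))   # True: M and -M give the same rotation

The half angle in M and the equality ϕ(−M) = ϕ(M) are exactly the features discussed above.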
10      The Projection Postulate
10.1     Notation
In the Dirac notation one writes |ψ⟩ for ψ. This may seem like a waste of symbols at
first, but often it is the opposite, as it allows us to replace a notation such as φ1, φ2, . . .
by |1⟩, |2⟩, . . .. Of course, a definition is needed for what |n⟩ means, just as one would
be needed for φn. It is also convenient when using long subscripts, such as replacing
ψ_left slit by |left slit⟩. In spin space S, one commonly writes
    |z-up⟩ = ↑ = (1, 0)ᵀ ,        |z-down⟩ = ↓ = (0, 1)ᵀ ,                                   (10.1)
    |y-up⟩ = (1/√2) (1, i)ᵀ ,     |y-down⟩ = (1/√2) (1, −i)ᵀ ,                               (10.2)
    |x-up⟩ = (1/√2) (1, 1)ᵀ ,     |x-down⟩ = (1/√2) (1, −1)ᵀ .                               (10.3)
(Compare to Eq. (9.15) and Exercise 11 in Assignment 4, and to Maudlin’s article.)
    Furthermore, in the Dirac notation one writes ⟨φ| for the mapping H → C given
by ψ ↦ ⟨φ|ψ⟩. Obviously, ⟨φ| applied to |ψ⟩ gives ⟨φ|ψ⟩, which suggested the notation.
Paul Dirac called ⟨φ| a bra and |ψ⟩ a ket. Obviously, ⟨φ|A|ψ⟩ means the same as
⟨φ|Aψ⟩. Dirac suggested that for self-adjoint A, the notation ⟨φ|A|ψ⟩ conveys better
that A can be applied equally well to either φ or ψ. |φ⟩⟨φ| is an operator that maps ψ
to |φ⟩⟨φ|ψ⟩ = ⟨φ|ψ⟩φ. If φ is a unit vector then this is the part of ψ parallel to φ, or the
projection of ψ to φ.
    Another common and useful notation is ⊗, called the tensor product. For
                                      Ψ(x, y) = ψ(x) φ(y)                                  (10.4)
one writes
                                            Ψ = ψ ⊗ φ.                                     (10.5)
Likewise, for Eq. (9.13) one writes ψ = φ ⊗ χ.
   The symbol ⊗ has another meaning when applied to Hilbert spaces.
                                  L2 (x, y) = L2 (x) ⊗ L2 (y) ,                            (10.6)
where L2 (x) means the square-integrable functions of x, etc. Likewise, when we replace
the continuous variable y by the discrete index s for spin, the tensor product of the
Hilbert space C2 of vectors φs and the Hilbert space L2 (R3 , C) of wave functions χ(x)
is the Hilbert space L2 (R3 , C2 ) of wave functions ψs (x):
                                C2 ⊗ L2 (R3 , C) = L2 (R3 , C2 ) .                         (10.7)
   Another notation we use is
    f(t−) = lim_{s↗t} f(s) ,      f(t+) = lim_{s↘t} f(s)                                     (10.8)
10.2      The Projection Postulate
Here is the last rule of the quantum formalism:
Now change ψ by replacing some of the coefficients cn by zero while retaining the others
unchanged:
    ψ̃ = Σ_{n∈J} cn φn ,                                                                     (10.14)
where J is the set of those indices retained. This procedure is called projection to the
subspace spanned by {φn : n ∈ J}, and the projection operator is
    P = Σ_{n∈J} |φn⟩⟨φn| .                                                                   (10.15)
    In Eq. (10.9), the index n numbers the index pairs (α, λ), and the subset J corre-
sponds to those pairs that have a given α and arbitrary λ. Except for the factor C,
the RHS of (10.9) is the corresponding projection of ψt− , which gives the projection
postulate its name. The subspace of Hilbert space spanned by the φα,λ with given α
is the eigenspace of A with eigenvalue α, which is the set of all eigenvectors of A with
eigenvalue α (together with the zero vector).
    For every closed subspace, there is a projection operator that projects to this sub-
space. For example, for any region B ⊆ R3N in configuration space, the functions whose
support lies in B (i.e., which vanish outside B) form an ∞-dimensional closed subspace
of L2 (R3N ). The projection to this subspace is
    (PB ψ)(q) = ψ(q) if q ∈ B,   and   (PB ψ)(q) = 0 if q ∉ B,                               (10.16)
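Projection operators of the type (10.15) are easy to experiment with numerically. The
following minimal sketch (Python/NumPy; the dimension, the randomly generated ONB, and
the retained index set J are arbitrary choices) verifies the characteristic properties
P² = P and P = P† and shows that the projected vector (10.14) has no coefficients outside J:

    import numpy as np

    rng = np.random.default_rng(1)
    d = 5
    # an ONB of C^d, obtained from a random complex matrix via QR decomposition
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d)))
    phi = [Q[:, n] for n in range(d)]

    J = [0, 2]                                                  # indices retained
    P = sum(np.outer(phi[n], phi[n].conj()) for n in J)         # P = sum_{n in J} |phi_n><phi_n|

    psi = rng.normal(size=d) + 1j*rng.normal(size=d)
    psi_tilde = P @ psi                                         # the projected vector, cf. (10.14)

    print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))    # True True: P^2 = P and P = P^dagger
    print(np.round([abs(np.vdot(phi[n], psi_tilde)) for n in range(d)], 3))  # zero outside J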
10.4     Remarks
According to the projection postulate (also known as the measurement postulate or
the collapse postulate), the wave function changes dramatically in a measurement. The
change is known as the reduction of the wave packet or the collapse of the wave function.
     For example, in a spin-z (or σ3 -) measurement, the wave function before the mea-
surement is an arbitrary spinor (φ1 , φ2 ) ∈ S with |φ1 |2 + |φ2 |2 = 1 (assuming Eq. (9.13)
and ignoring the space dependence). With probability |φ1 |2 , we obtain outcome “up”
and the collapsed spinor (φ1 /|φ1 |, 0) after the measurement. The term φ1 /|φ1 | is just
the phase of φ1 . With probability |φ2 |2 , we obtain “down” and the collapsed spinor
(0, φ2 /|φ2 |).
     With the projection postulate, the formalism provides a prediction of probabilities
for any sequence of measurements. If we prepare the initial wave function ψ0 and make
a measurement of A1 at time t1 then the Schrödinger equation determines what ψt1 −
is, the general Born rule (8.45) determines the probabilities of the outcome α1 , and the
projection postulate the wave function after the measurement. The latter is the initial
wave function for the Schrödinger equation, which governs the evolution of ψ until the
time t2 at which the second measurement, of observable A2 , occurs. The probability
distribution of the outcome α2 is given by the Born rule again and depends on α1 because
the initial wave function in the Schrödinger equation, ψt1 + , did. And so on. This scheme
is the quantum formalism. Note that the observer can choose t2 and A2 after the first
measurement and thus make this choice depend on the first outcome α1 .
     The projection postulate implies that if we make another measurement of A right
after the first one, we will with probability 1 obtain the same outcome α.
     For a position measurement, the projection postulate implies that the wave function
collapses to a delta function. This is not realistic; it is over-idealized. A delta function
is not a square-integrable function, and it contains in a sense an infinite amount of
energy. More realistically, a position measurement has a finite inaccuracy ε and could
be expected to collapse the wave function to one of width ε, such as
    ψt+(x) = C e^{−(x−α)²/(4ε²)} ψt−(x) .                                                    (10.17)
    You may feel a sense of paradox about the two different laws for how ψ changes with
time: the unitary Schrödinger evolution and the collapse rule. Already at first sight,
the two seem rather incompatible: the former is deterministic, the latter stochastic; the
former is continuous, the latter not; the former is linear, the latter not. It seems strange
that time evolution is governed not by a single law but by two. And even stranger that
the criterion for when the collapse rule takes over is something as vague as an observer
making a measurement. Upon scrutiny, the sense of paradox will persist and even deepen
in the form of what is known as the measurement problem of quantum mechanics.
11     The Measurement Problem
11.1     What the Problem Is
This is a problem about orthodox quantum mechanics. It is solved in Bohmian mechan-
ics and several other theories. Because of this problem, some regard the orthodox view
as incoherent when it comes to analyzing the process of measurement.
    Consider a “quantum measurement of the observable A.” Realistically, there are
only finitely many possible outcomes, so A should have finite spectrum. Consider the
system formed by the object together with the apparatus. Since the apparatus consists
of electrons and quarks, too, it should itself be governed by quantum mechanics. (That
is reductionism at work.) So I write Ψ for the wave function of the system (object
and apparatus). Suppose for simplicity that the system is isolated (i.e., there is no
interaction with the rest of the universe), so Ψ evolves according to the Schrödinger
equation during the experiment (recall Exercise 13 of Assignment 3), which begins (say)
at t1 and ends at t2. It is reasonable to assume that
    Ψ(t1) = ψ ⊗ φ ,                                                                          (11.1)
with ψ = ψ(t1) the wave function of the object before the experiment and φ a wave
function representing a “ready” state of the apparatus. By the spectral theorem, ψ can
be written as a linear combination (superposition) of eigenfunctions of A,
    ψ = Σ_α cα ψα   with   Aψα = αψα and ‖ψα‖ = 1 .                                          (11.2)
    If the object’s wave function is an eigenfunction ψα , then, by Born’s rule (8.45), the
outcome is certain to be α. Set Ψα (t1 ) = ψα ⊗ φ. Then Ψα (t2 ) must represent a state
in which the apparatus displays the outcome α.
    Now consider again a general ψ as in Eq. (11.2). Since the Schrödinger equation is
linear, the wave function of object and apparatus together at t2 is
    Ψ(t2) = Σ_α cα Ψα(t2) ,                                                                  (11.3)
which is a superposition of contributions corresponding to the different possible outcomes α.
This conflicts with the following three assumptions:
   • In each run of the experiment there is a unique outcome.
   • The wave function Ψ provides a complete description of object and apparatus.
   • The evolution of the wave function of an isolated system is always given by the
     Schrödinger equation.
Thus, we have to drop one of these assumptions. The first is dropped in the many-
worlds picture, in which all outcomes are realized, albeit in parallel worlds. If we drop
the second, we opt for additional variables as in Bohmian mechanics, where the state
at time t is described by the pair (Qt , ψt ). If we drop the third, we opt for replacing
the Schrödinger equation by a non-linear evolution (as in the GRW = Ghirardi–Rimini–
Weber approach). Of course, a theory might also drop several of these assumptions.
Orthodox quantum mechanics insists on all three assumptions, and that is why it has a
problem.
    We took for granted that the system was isolated and had a wave function. We may
wonder whether that was asking too much. However, we could just take the system to
consist of the entire universe, so it is disentangled and isolated for sure. More basically,
if we cannot solve the measurement problem for an isolated system with a wave function
then we have no chance of solving it for a system entangled with outside particles.
Since the Ψα have disjoint supports in the configuration space (of object and apparatus
together), and since the particle configuration Q has distribution |Ψ|2 , the probability
that Q lies in the support of Ψα is
    P(Q ∈ support(Ψα)) = ∫_{support(Ψα)} d^{3N}q |Ψ(q)|² = ∫_{R^{3N}} d^{3N}q |cα Ψα(q)|² = |cα|² ,   (11.5)
which agrees with the prediction of the quantum formalism for the probability of the
outcome α. And indeed, when Q ∈ support(Ψα ), then the particle positions (including
the particles of both the object and the apparatus!) are such that the pointer of the
apparatus points to the value α. Thus, the way out of the measurement problem is
that although the wave function is a superposition of terms corresponding to different
outcomes, the actual particle positions define the actual outcome.
    As a consequence of the above consideration, we also see that the predictions of
Bohmian mechanics for the probabilities of the outcomes of experiments agree with
those of standard quantum mechanics. In particular, there is no experiment that could
empirically distinguish between Bohmian mechanics and standard quantum mechanics,
while there are (in principle) experiments that distinguish the two from a GRW world.
    If Bohmian mechanics and standard quantum mechanics agree about all probabili-
ties, then where do we find the collapse of the wave function in Bohmian mechanics?
There are two answers, depending on which wave function we are talking about. The
first answer is, if the Ψα are macroscopically different then they will never overlap again
(until the time when the universe reaches thermal equilibrium, perhaps in 10^(10^10) years);
this fact is called decoherence. If Q lies in the support of one among several disjoint
packets then only the packet containing Q is relevant, by Bohm’s law of motion (6.1),
to determining dQ/dt. Thus, as long as the packets stay disjoint, only the packet con-
taining Q is relevant to the trajectories of the particles, and all other packets could be
replaced by zero without affecting the trajectories. That is why we can replace Ψ by
cα Ψα , with α the actual outcome. Furthermore, the factor cα cancels out in Bohm’s law
of motion (6.1) and thus can be dropped as well.
    The second answer is, the quantum formalism does not, in fact, talk about the wave
function Ψ of object and apparatus but about the wave function ψ of the object alone.
This leads us to the question of what is meant by the wave function of a subsystem. If
    Ψ(x, y) = ψ(x) φ(y) ,                                                                    (11.6)
then it is appropriate to call ψ the wave function of the x-system, but in general Ψ does
not factorize as in (11.6). In Bohmian mechanics, a natural general definition for the
wave function of a subsystem is the conditional wave function
    ψ(x) = N Ψ(x, Y) ,                                                                       (11.7)
where Y is the actual configuration of the y-system (while x is not the actual configu-
ration X but any configuration of the x-system) and
    N = ( ∫ |Ψ(x, Y)|² dx )^{−1/2}                                                           (11.8)
is the normalizing factor. The conditional wave function does not, in general, evolve
according to a Schrödinger equation, but in a complicated way depending on Ψ, Y ,
and X. There are special situations in which the conditional wave function does evolve
according to a Schrödinger equation, in particular when the x-system and the y-system
do not interact and the wave packet in Ψ containing Q = (X, Y ) is of a product form such
as (11.6). Indeed, this is the case for the object before, but not during the measurement;
as a consequence, the wave function of the object (i.e., its conditional wave function)
evolves according to the Schrödinger equation before, but not during the measurement—
in agreement with the quantum formalism. To determine the conditional wave function
after the quantum measurement, suppose that Ψα is of the form
Ψα = ψα ⊗ φα (11.9)
with φα a wave function of the apparatus with the pointer pointing to the value α.
Let α be the actual outcome, i.e., Q ∈ support(Ψα ). Then Y ∈ support(φα ) and the
conditional wave function is indeed
ψ = ψα . (11.10)
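The conditional wave function (11.7) can be illustrated with a toy discretized example.
In the sketch below (Python/NumPy), every ingredient — the grids, the particular Gaussian
packets, the coefficients, and the value of Y — is invented for the purpose of illustration:
Ψ = c1 ψ1 ⊗ φ1 + c2 ψ2 ⊗ φ2 with φ1, φ2 having disjoint supports, and the actual Y lies in
the support of φ1.

    import numpy as np

    x = np.linspace(-10, 10, 400)
    y = np.linspace(-10, 10, 400)

    def gauss(u, mean, width):
        g = np.exp(-(u - mean)**2 / (4*width**2))
        return g / np.sqrt(np.sum(abs(g)**2) * (u[1]-u[0]))

    psi1, psi2 = gauss(x, -2, 0.5), gauss(x, +2, 0.5)     # object wave functions
    phi1, phi2 = gauss(y, -5, 0.3), gauss(y, +5, 0.3)     # pointer wave functions, disjoint supports
    c1, c2 = 0.6, 0.8                                     # |c1|^2 + |c2|^2 = 1

    Psi = c1*np.outer(psi1, phi1) + c2*np.outer(psi2, phi2)   # Psi(x, y) on the grid

    Y = -5.0                                              # actual pointer configuration, in supp(phi1)
    iY = np.argmin(abs(y - Y))
    cond = Psi[:, iY]
    cond = cond / np.sqrt(np.sum(abs(cond)**2) * (x[1]-x[0]))  # conditional wave function psi(x)

    print(np.allclose(cond, psi1, atol=1e-8))             # True: psi = psi_1, as in (11.10)

The check at the end reproduces (11.10): when the pointer configuration lies in the support
of φ1, the conditional wave function of the object is ψ1.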
11.3      Schrödinger’s Cat
Often referred to in the literature, this is Schrödinger’s13 1935 formulation of the mea-
surement problem:
       One can even set up quite ridiculous cases. A cat is penned up in a steel
       chamber, along with the following diabolical device (which must be secured
       against direct interference by the cat): in a Geiger counter there is a tiny
       bit of radioactive substance, so small, that perhaps in the course of one hour
       one of the atoms decays, but also, with equal probability, perhaps none; if it
       happens, the counter tube discharges and through a relay releases a hammer
       which shatters a small flask of hydrocyanic acid. If one has left this entire
       system to itself for an hour, one would say that the cat still lives if meanwhile
       no atom has decayed. The first atomic decay would have poisoned it. The
       ψ-function of the entire system would express this by having in it the living
       and dead cat (pardon the expression) mixed or smeared out in equal parts.
       It is typical of these cases that an indeterminacy originally restricted to the
       atomic domain becomes transformed into macroscopic indeterminacy, which
       can then be resolved by direct observation. That prevents us from so naively
       accepting as valid a “blurred model” for representing reality. In itself it
       would not embody anything unclear or contradictory. There is a difference
       between a shaky or out-of-focus photograph and a snapshot of clouds and
       fog banks.
system, as it is the only information about the system that can be found experimentally
without disturbing ψ. They tend not to take the measurement problem seriously.
    Realism is the view that a fundamental physical theory is meaningless unless it
provides a coherent story of what happens. Bohmian mechanics, GRW theory, and
many-worlds are examples of realist theories. For a realist, the quantum formalism
by itself does not qualify as a fundamental physical theory. The story provided by
Bohmian mechanics, for example, is that particles have trajectories, that there is a
physical object that is mathematically represented by the wave function, and that the
two evolve according to certain equations. For a realist, the measurement problem is
serious and can only be solved by denying one of the 3 conflicting premises.
12      The GRW Theory
Bohmian mechanics is not the only possible explanation of quantum mechanics. Another
one is provided by the GRW theory, named after GianCarlo Ghirardi, Alberto Rimini,
and Tullio Weber, who proposed it in 1986. A similar theory, CSL (for continuous
spontaneous localization), was proposed by Philip Pearle in 1989. In both theories, Ψt
does not evolve according to the Schrödinger equation, but according to a modified
evolution law. This evolution law is stochastic, as opposed to deterministic. That is, for
any fixed Ψ0 , it is random what Ψt is, and the theory provides a probability distribution
over Hilbert space. A family of random variables Xt , with one variable for every time t,
is called a stochastic process. Thus, the family (Ψt )t>0 is a stochastic process in Hilbert
space. We leave CSL aside and focus on the GRW process. In it, periods governed by
the Schrödinger equation are interrupted by random jumps. Such a jump occurs, within
any infinitesimal time interval dt, with probability λ dt, where λ is a constant called
the jump rate. Let us call the random jump times T1 , T2 , . . .; the sequence T1 , T2 , . . . is
known as the Poisson process with rate λ; it has widespread applications in probability
theory. Let us have a closer look.
To compute this quantity, we reason as follows. If T1 has not occurred until t, then the
probability that it will occur within the next dt is λ dt. Thus, (12.2) differs from (12.1)
by a factor λ dt, or, as the factor dt cancels out,
where the expression 1C is 1 whenever the condition C is satisfied, and 0 otherwise. The
distribution (12.3) is known as the exponential distribution with parameter λ, Exp(λ).
We have thus found that the waiting time for the first event has distribution Exp(λ).
    After T1 , the next dt has again probability λ dt for the next event to occur. The
above reasoning can be repeated, with the upshot that the waiting time T2 − T1 for
the next event has distribution Exp(λ) and is independent of what happened up to time
T1 . The same applies to the other waiting times Tn+1 − Tn . In fact, at any time t0 the
waiting time until the next event has distribution Exp(λ).
    The exponential distribution has expectation value
    ∫_0^∞ t ρ(t) dt = 1/λ .                                                                  (12.4)
This fact is very plausible if you think of it this way: If in every second the probability of
an earthquake is, say, 10−8 , then you would guess that an earthquake occurs on average
every 108 seconds. The constant λ, whose dimension is 1/time, is thus the average
frequency of the earthquakes (or whichever events).
    Another way of representing the Poisson process is by means of the random variables
Theorem 12.1. If the earthquakes in Australia are governed by a Poisson process with
rate λ1 and the earthquakes in Africa are governed by a Poisson process with rate λ2 , and
the earthquakes in the two places are independent of each other, then the earthquakes in
Africa and Australia together are governed by a Poisson process with rate λ1 + λ2 .
Theorem 12.2. If we choose n points at random in the interval [0, n/λ], independently
with uniform distribution, then the joint distribution of these points converges, as n →
∞, to the Poisson process with parameter λ.
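These properties are easy to check numerically. Here is a small Python sketch (using numpy; the rates and the time horizon are arbitrary illustrative values): it generates a Poisson process from independent Exp(λ) waiting times, confirms that the mean waiting time is close to 1/λ as in (12.4), and merges two independent processes to illustrate Theorem 12.1.

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, T = 2.0, 3.0, 10_000.0   # rates [1/s] and time horizon (illustrative values)

def poisson_jump_times(lam, T):
    """Jump times of a Poisson process with rate lam up to time T,
    obtained by accumulating independent Exp(lam) waiting times."""
    times = []
    t = rng.exponential(1.0 / lam)
    while t < T:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

t1 = poisson_jump_times(lam1, T)
t2 = poisson_jump_times(lam2, T)

# The mean waiting time should be close to 1/lambda, cf. (12.4).
print(np.diff(t1).mean(), 1.0 / lam1)

# Theorem 12.1: the merged sequence of events behaves like a Poisson process
# with rate lam1 + lam2; in particular its mean waiting time is 1/(lam1 + lam2).
merged = np.sort(np.concatenate([t1, t2]))
print(np.diff(merged).mean(), 1.0 / (lam1 + lam2))
```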
Alternatively, Steven Adler suggested
with
    g_{X,σ}(x) = \frac{1}{(2πσ^2)^{3/2}} e^{−(X−x)^2/2σ^2} .    (12.10)
The point X_k ∈ R^3 is chosen at random with probability density

    ρ(X_k = y | T_1, . . . , T_k, X_1, . . . , X_{k−1}) = ‖C(y)Ψ‖^2 ,    (12.11)

where ρ(· · · | · · · ) means the probability density, given the values of T_1, . . . , T_k, X_1, . . . , X_{k−1}.
The right-hand side of (12.11) is indeed a probability density because it is nonnegative
and

    ∫ d^3y ρ(X_k = y | · · · ) = ∫ d^3y ‖C(y)Ψ‖^2 = ∫ d^3y ∫ d^3x |C(y)Ψ(x)|^2    (12.12)
    = ∫ d^3x ∫ d^3y g_{y,σ}(x) |Ψ(x)|^2 = ∫ d^3x |Ψ(x)|^2 = 1 .    (12.13)
  14
    Or rather, outside of the universe, as the idea is that the entire universe is governed by GRW
theory.
    For arbitrary N ∈ N and Ψ_t = Ψ_t(x_1, . . . , x_N),

    Ψ_{T_k+} = \frac{C_{I_k}(X_k) Ψ_{T_k−}}{‖C_{I_k}(X_k) Ψ_{T_k−}‖} ,    (12.14)

where the collapse operator C_I(X) is the following multiplication operator:

    C_I(X) Ψ(x_1, . . . , x_N) = \sqrt{g_{X,σ}(x_I)} \, Ψ(x_1, . . . , x_N) .    (12.15)
This completes the definition of the GRW process. But not yet the definition of the
GRW theory.
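To see the collapse prescription at work, here is a minimal numerical sketch in Python of a single GRW jump for one particle in one spatial dimension (so the particle label I plays no role and g below is the one-dimensional analog of (12.10)). The grid, the packet positions, and the value of σ are illustrative choices, not part of the theory.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretized 1d position grid (illustrative units).
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
sigma = 1.0                         # localization width of the collapse (illustrative)

# A superposition of two well-separated packets, normalized so sum |psi|^2 dx = 1.
psi = np.exp(-(x + 8.0)**2 / 2) + np.exp(-(x - 8.0)**2 / 2)
psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)

def g(X, x, sigma):
    """1d analog of (12.10): normalized Gaussian of width sigma centered at X."""
    return np.exp(-(x - X)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Probability density of the collapse center X, rho(X) = ||C(X) psi||^2,
# cf. (12.11)-(12.13); it integrates to (approximately) 1.
rho = np.array([np.sum(g(X, x, sigma) * np.abs(psi)**2) * dx for X in x])
print(np.sum(rho) * dx)             # ~ 1.0

# Sample X from rho and apply the collapse (12.14)-(12.15):
# multiply by sqrt(g) and renormalize.
X = rng.choice(x, p=rho * dx / np.sum(rho * dx))
psi_new = np.sqrt(g(X, x, sigma)) * psi
psi_new = psi_new / np.sqrt(np.sum(np.abs(psi_new)**2) * dx)

# After the jump, essentially all of |psi_new|^2 sits in the packet near X.
left = np.sum(np.abs(psi_new[x < 0])**2) * dx
print(X, left, 1 - left)
```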
In GRWm, matter is continuously distributed in physical 3-space with density

    m(x, t) = Σ_{i=1}^N m_i ∫_{R^{3N}} d^3x_1 · · · d^3x_N  δ^3(x − x_i) |Ψ_t(x_1, . . . , x_N)|^2 .

In words, one starts with the |ψ|^2–distribution in configuration space R^{3N}, then obtains
the marginal distribution of the i-th degree of freedom x_i ∈ R^3 by integrating out all
other variables x_j, j ≠ i, multiplies by the mass m_i associated with x_i, and sums over i.
F = {(X 1 , T1 , I1 ), . . . , (X k , Tk , Ik ), . . .} . (12.18)
    Note that if the number N of the degrees of freedom in the wave function is large,
as in the case of a macroscopic object, the number of flashes is also large (if λ = 10^{−16}
s^{−1} and N = 10^{23}, we obtain 10^7 flashes per second). Therefore, for a reasonable choice
of the parameters of the GRWf theory, a cubic centimeter of solid matter contains more
than 10^7 flashes per second.
macroscopic shapes, such as tables and chairs. “A piece of matter then is a galaxy of
[flashes].” (Bell, page 205) That is how we find an image of our world in GRWf.
   A few remarks. The m function of GRWm and the flashes of GRWf are called the
primitive ontology of the theory. Ontology means what exists according to a theory; for
example, in Bohmian mechanics ψ and Q, in GRWm ψ and m, in GRWf ψ and F . The
“primitive” ontology is the part of the ontology representing matter in 3-d space (or 4-d
space-time): Q in Bohmian mechanics, m in GRWm, and F in GRWf.
   It may seem that a continuous distribution of matter should conflict with the
evidence for the existence of atoms, electrons and quarks, and should thus make wrong
predictions. We will see below why that is not the case—why GRWm makes nearly the
same predictions as the quantum formalism.
For a given wave function Ψ, ρ(X = y) is essentially the marginal of |Ψ|^2 connected to the x_I-variable,
i.e., the distribution on 3-space obtained from the |Ψ|2 distribution on 3N -space by
integrating out 3N − 3 variables. (More precisely, smeared over width σ.) Thus, again,
on the macroscopic scale, the distribution of X is the same as the quantum mechanical
probability distribution for the position of the I-th particle.
     A wave function like the one we encountered in the measurement problem,

    Ψ = Σ_α c_α Ψ_α ,    (12.19)
where Ψ_α is a wave function corresponding to the pointer pointing to the value α, would
behave in the following way. Assuming the pointer contains 10^{23} particles, a collapse
connected to one of the pointer particles would occur every 10^{−7} sec. Since Ψ_α is
concentrated in a region in configuration space where all of the pointer particles are at
some location y_α, and assuming that the y_α are sufficiently distant for different values of
α (namely by much more than σ), a single collapse connected to any of the pointer particles
will suffice for essentially removing all contributions Ψ_α except one. Indeed, suppose
the collapse is connected to the particle x_i, which is one of the pointer particles. Then
the random center X of the collapse will be distributed according to a coarse-grained
version of the i-th marginal of |Ψ|^2; since the separation between the y_α is greater than
σ, we can neglect the coarse graining, and we can just take the i-th marginal of the
|Ψ|^2 distribution. Thus, X will be close to one of the y_α, and the probability that
X is close to y_{α_0} is |c_{α_0}|^2. Then, the multiplication by a Gaussian centered at X will
shrink all other packets Ψ_α by big factors, of the order exp(−(y_α − y_{α_0})^2/2σ^2), effectively
collapsing them away.
     Thus, within a fraction of a second, a superposition such as (12.19) would decay into
one of the packets Ψ_α (times a normalization factor), and indeed into Ψ_{α_0} with probability
|c_{α_0}|^2, the same probability as attributed by quantum mechanics to the outcome α_0.
     Let us make explicit how GRW succeeded in setting up the laws in such a way
that they are effectively different laws for microscopic and macroscopic objects: (i) We
realize that a few collapses (or even a single collapse) acting on a few (or one) of the
pointer particles will collapse the entire wave function Ψ of object and apparatus together
to essentially just one of the contributions Ψα . (ii) The frequency of the collapses
is proportional to the number of particles (which serves as a quantitative measure of
“being macroscopic”). (iii) We can’t ensure that microscopic systems experience no
collapses at all, but we can ensure the collapses are very infrequent. (iv) We can’t
ensure that macroscopic superpositions such as Ψ = Σ_α c_α Ψ_α collapse immediately, but
we can ensure they collapse within a fraction of a second.
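The orders of magnitude behind (i)–(iv) can be reproduced in a few lines of Python. The value λ = 10^{−16} s^{−1} is the one used above; σ ≈ 10^{−7} m is the commonly quoted GRW localization width (an assumption here, since its value is fixed elsewhere in the script); the pointer size (10^{23} particles) and the packet separation (1 cm) are illustrative.

```python
import numpy as np

lam = 1e-16          # GRW jump rate per particle [1/s]
sigma = 1e-7         # GRW localization width [m] (commonly quoted value; assumption here)
N = 1e23             # number of particles in a macroscopic pointer (illustrative)
d = 1e-2             # separation of the two pointer positions [m] (illustrative)

# (ii)-(iv): total collapse rate of the pointer and expected time until the first collapse.
total_rate = N * lam
print(total_rate, 1.0 / total_rate)   # ~ 1e7 collapses per second, first one after ~ 1e-7 s

# (i): suppression factor of the "wrong" packet after a single collapse,
# of the order exp(-d^2 / (2 sigma^2)) -- astronomically small for d >> sigma.
print(-d**2 / (2 * sigma**2))         # exponent ~ -5e9
print(np.exp(-d**2 / (2 * sigma**2))) # underflows to 0.0

# (iii): a single microscopic particle waits on average 1/lam seconds between collapses.
print(1.0 / lam)                      # ~ 1e16 s, i.e. of the order of 1e8 years
```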
[Figure: two log-log panels, vertical axis λ [s^{−1}] ranging from 10^{−36} to 10^{4}, with the Adler and GRW parameter choices and the region PUR marked in each panel.]
Figure 3: Parameter diagram (log-log-scale) of the GRW theory with the primitive on-
tology given by (a) flashes, (b) the matter density function. ERR = empirically refuted
region as of 2012 (equal in (a) and (b)), PUR = philosophically unsatisfactory region.
GRW’s and Adler’s choice of parameters are marked. Figure taken from W. Feldmann
and R. Tumulka: Parameter Diagrams of the GRW and CSL Theories of Wave Func-
tion Collapse. Journal of Physics A: Mathematical and Theoretical 45: 065304 (2012)
http://arxiv.org/abs/1109.6579
In principle, there are experiments for which GRW theory predicts different outcomes than the quantum formalism. Here
is an example. GRW theory predicts that if we keep a particle isolated it will sponta-
neously collapse after about 100 million years, and quantum mechanics predicts it will
not collapse. So let’s take 10^4 electrons, for each of them prepare its wave function to
be a superposition of a packet in Paris and a packet in Tokyo; let’s keep each electron
isolated for 100 million years; according to GRW, a fraction of

    ∫_0^{1/λ} λ e^{−λt} dt = ∫_0^1 e^{−s} ds = 1 − e^{−1} = 63.2%    (12.20)
of the 10^4 wave functions will have collapsed; according to quantum mechanics, none
will have collapsed; now let’s bring the packets from Paris and Tokyo together, let
them overlap and observe the interference pattern; according to quantum mechanics, we
should observe a clear interference pattern; if all of the wave functions had collapsed, we
should observe no interference pattern at all; according to GRW, we should observe only
a faint interference pattern, damped (relative to the quantum prediction) by a factor
of e. Ten thousand points should be enough to decide whether the damping factor is
there or not. This example illustrates two things: that in principle GRW makes different
predictions, and that in practice these differences may be difficult to observe (because
of the need to wait for 100 million years, and because of the difficulty with keeping the
electrons isolated for a long time, in particular avoiding decoherence).
    Another testable consequence of the GRW process is universal warming. Since the
GRW collapse usually makes wave packets narrower, their Fourier transforms (momen-
tum representation) become wider, by the Heisenberg uncertainty relation. As a ten-
dency, this leads to a long-run increase in energy. This effect amounts to a spontaneous
warming at a rate of the order of 10^{−15} K per year.
    No empirical test of GRW theory against the quantum formalism can presently be
carried out, but experimental techniques are progressing; see Figure 3. Adler’s pa-
rameters have in the meantime been empirically refuted as a byproduct of the LIGO
experiment that detects gravitational waves. A test of GRW’s parameters seems fea-
sible using a planned interferometer on a satellite in outer space. Interferometers are
disturbed by the presence of air, temperatures far from absolute zero, vibrations of the
apparatus, and the presence of gravity; that is why being in outer space is an advantage
for an interferometer and allows for heavier objects shot through the double slit and
longer flight times. Such an interferometer is being considered by the European Space
Agency ESA and may be up and running in 2025.
After the collapses, the wave function is concentrated on configurations that are macroscopically equivalent to each other. So we can read off from the post-
measurement wave function, e.g., what the actual outcome of a quantum measurement
was.
   On the other hand, there is a logical gap between saying

    “the wave function is that of a live cat”    (12.21)

and saying

    “there is a live cat.”    (12.22)
After all, in Bohmian mechanics, (12.22) follows from (12.21) by virtue of a law of the
theory, which asserts that the configuration Q(t) is |ψt |2 distributed at every time t.
Thus, Bohmian mechanics suggests that (12.22) would not follow from (12.21) if there
were not a law connecting the two by means of the primitive ontology. If that is so, then
it does not follow in GRW∅ either. Another indication in this direction is the fact that
the region “PUR” in Figure 3 depends on the primitive ontology we consider, GRWf or
GRWm.
    Other aspects of the question whether GRW∅ is a satisfactory theory have to do
with a number of paradoxes that arise in GRW∅ but evaporate in GRWf and GRWm.15
For the sake of simplicity, I will focus on GRWm and leave aside GRWf.
   Paradox: Here is a reason one might think that the GRW theory fails to solve
the measurement problem. Consider a quantum state like Schrödinger’s cat, namely a
superposition
                                ψ = c1 ψ1 + c2 ψ2                            (12.23)
of two macroscopically distinct states ψ_i with ‖ψ_1‖ = 1 = ‖ψ_2‖, such that both contri-
butions have nonzero coefficients ci . Given that there is a problem—the measurement
problem—in the case in which the coefficients are equal, one should also think that there
is a problem in the case in which the coefficients are not exactly equal, but roughly of
the same size. One might say that the reason there is a problem is that, according to
quantum mechanics, there is a superposition whereas according to our intuition there
should be a definite state. But then it is hard to see how this problem should go away
just because c2 is much smaller than c1 . How small would c2 have to be for the problem
to disappear? No matter if c_2 = c_1 or c_2 = c_1/100 or c_2 = 10^{−100} c_1, in each case both
contributions are there. But the only relevant effect of the GRW process replacing the
unitary evolution, as far as Schrödinger’s cat is concerned, is to randomly make one of
the coefficients much smaller than the other (although it also affects the shape of the
suppressed contribution).
    Answer: From the point of view of GRWm, the reasoning misses the primitive
ontology. Yes, the wave function is still a superposition, but the definite facts that our
intuition wants can be found in the primitive ontology. The cat is made of m, not of
  15
    The following discussion is adapted from R. Tumulka: Paradoxes and Primitive Ontology in Col-
lapse Theories of Quantum Mechanics. Pages 139–159 in S. Gao (editor), Collapse of the Wave Func-
tion, Cambridge University Press (2018) https://arxiv.org/abs/1102.5767.
ψ. If ψ is close to |dead⟩, then m equals m_{|dead⟩} up to a small perturbation, and that
can reasonably be accepted as the m function of a dead cat. While the wave function
is a superposition of two packets ψ1 , ψ2 that correspond to two very different kinds
of (particle) configurations in ordinary QM or Bohmian mechanics, there is only one
configuration of the matter density m—the definite fact that our intuition wants.
     Paradox: As a variant of the first paradox, one might say that even after the GRW
collapses have pushed |c1 |2 near 1 and |c2 |2 near 0 in the state vector (12.23), there is
still a positive probability |c2 |2 that if we make a quantum measurement of the macro-
state—of whether the cat is dead or alive—we will find the state ψ2 , even though the
GRW state vector has collapsed to a state vector near ψ1 , a state vector that might be
taken to indicate that the cat is really dead (assuming ψ_1 = |dead⟩). Thus, it seems not
justified to say that, when ψ is close to |dead⟩, the cat is really dead.
    Answer: In GRWm, what we mean when saying that the cat is dead is that the m
function looks and behaves like a dead cat. In orthodox QM, one might mean instead
that a quantum measurement of the macro-state would yield |dead⟩ with probability 1.
These two meanings are not exactly equivalent in GRWm: that is because, if m ≈ m_{|dead⟩}
(so we should say that the cat is dead) and if ψ is close but not exactly equal to |dead⟩,
then there is still a tiny but non-zero probability that within the next millisecond the
collapses occur in such a way that the cat is suddenly alive! But that does not contradict
the claim that a millisecond before the cat was dead; it only means that GRWm allows
resurrections to occur—with tiny probability! In particular, if we observe the cat after
that millisecond, there is a positive probability that we find it alive (simply because it
is alive) even though before the millisecond it actually was dead.
     Paradox: Let ψ1 be the state “the marble is inside the box” and ψ2 the state
“the marble is outside the box”; these wave functions have disjoint supports S1 , S2 in
configuration space (i.e., wherever one is nonzero the other is zero). Let ψ be given
by (12.23) with 0 < |c_2|^2 ≪ |c_1|^2 < 1; finally, consider a system of n (non-interacting)
marbles at time t_0, each with wave function ψ, so that the wave function of the system
is ψ^{⊗n}. Then for each of the marbles, we would feel entitled to say that it is inside the
box, but on the other hand, the probability that all marbles be found inside the box is
|c1 |2n , which can be made arbitrarily small by making n sufficiently large.
    Answer: According to the m function, each of the marbles is inside the box at the
initial time t0 . However, it is known that a superposition like (12.23) of macroscopically
distinct states ψi will approach under the GRW evolution either a wave function ψ1 (∞)
concentrated in S1 or another ψ2 (∞) in S2 with probabilities |c1 |2 and |c2 |2 , respectively.
(Here I am assuming H = 0 for simplicity. Although both coefficients will still be nonzero
after any finite number of collapses, one of them will tend to zero in the limit t → ∞.)
Thus, for large n the wave function will approach one consisting of approximately n|c1 |2
factors ψ1 (∞) and n|c2 |2 factors ψ2 (∞), so that ultimately about n|c1 |2 of the marbles
will be inside and about n|c2 |2 outside the box—independently of whether anybody
observes them or not. The occurrence of some factors ψ2 (∞) at a later time provides
another example of the resurrection-type events mentioned earlier; they are unlikely but
do occur, of course, if we make n large enough.
    The act of observation plays no role in the argument and can be taken to merely
record pre-existing macroscopic facts. To be sure, the physical interaction involved
in the act of observation may have an effect on the system, such as speeding up the
evolution from ψ towards either ψ1 (∞) or ψ2 (∞); but GRWm provides unambiguous
facts about the marbles also in the absence of observers.
13      The Copenhagen Interpretation
A very influential view, almost synonymous with the orthodox view of quantum me-
chanics, is the Copenhagen interpretation (CI), named after the research group headed
by Niels Bohr, who was the director of the Institute for Theoretical Physics at the Uni-
versity of Copenhagen, Denmark. Further famous defenders of this view and members
of Bohr’s group (temporarily also working in Copenhagen) include Werner Heisenberg,
Wolfgang Pauli, and Leon Rosenfeld. Bohr and Einstein were antagonists in a debate
about the foundations of quantum mechanics that began around 1925 and continued
until Einstein’s death in 1955. In Feynman’s text you have already seen an exposition
of (parts of) the orthodox view. Here is a description of the main elements of CI.
   • It is not precisely defined where the border between micro and macro lies. That
     lies in the nature of the word “macroscopic.” Clearly, an atom is micro and a
     table is macro, but what is the exact number of particles required for an object
     to be “macroscopic”? The vagueness inherent in the concept of “macroscopic” is
     unproblematical in Bohmian mechanics, GRW theory, or classical mechanics, but
     it is problematical here because it is involved in the formulation of the laws of
     nature. Laws of nature should not be vague.
  16
    This is a somewhat unfortunate terminology because the word classical suggests not only definite
positions but also particular laws (say, Newton’s equation of motion) which may actually not apply.
The word quantum is somewhat unfortunate as well because in a reductionist view, all laws (also those
governing macroscopic objects) should be consequences of the quantum laws applying to the individual
electrons, quarks, etc.
   • Likewise, what counts as a measurement and what does not? This ambiguity is
     unproblematical when we only want to compute the probabilities of outcomes of
     a given experiment because it will not affect the computed probabilities. But an
     ambiguity is problematical when it enters the laws of nature.
   • The special role played by measurements in the laws according to CI is also implau-
     sible and artificial. Even if a precise definition of what counts as a measurement
      were given, it would not seem believable that different laws are in place during a
      measurement than at other times.
   • The separation of the two realms, without the formulation of laws that apply to
     both, is against reductionism. If we think that macro objects are made out of
     micro objects, then the separation is problematical.
13.2      Positivism
CI leans towards positivism. In the words of Werner Heisenberg (1958):
       “We can no longer speak of the behavior of the particle independently of the
       process of observation.”
       “Does this mean that my observations become real only when I observe an
       observer observing something as it happens? This is a horrible viewpoint.
       Do you seriously entertain the thought that without observer there is no
       reality? Which observer? Any observer? Is a fly an observer? Is a star an
       observer? Was there no reality before 10^9 B.C., before life began? Or are
       you the observer? Then there is no reality to the world after you are dead?
       I know a number of otherwise respectable physicists who have bought life
       insurance.”
       “The idea of an objective real world whose smallest parts exist objectively
       in the same sense as stones or trees exist, independently of whether or not
       we observe them [...], is impossible.”
We know from Bohmian mechanics that this claim is, in fact, wrong.
13.4      Completeness of the Wave Function
In CI, a microscopic system is completely described by its wave function. That is, there
are no further variables (such as Bohm’s particle positions) whose values nature knows
and we do not. For this reason, the wave function is also called the quantum state or
the state vector.
13.6      Complementarity
Another idea of CI, called complementarity, is that in the micro realm, reality is para-
doxical (contradictory) but the contradictions can never be seen (and are therefore not
problematical) because of the Heisenberg uncertainty relation. (Recall Feynman’s dis-
cussion of how the uncertainty relation keeps some things invisible.) Here is Bohr’s
definition of complementarity:
   I would describe the idea as follows. In order to compute a quantity of interest (e.g.,
the wave length of light scattered off an electron), we use both Theory A (e.g., classical
theory of billiard balls) and Theory B (e.g., classical theory of waves) although A and
B contradict each other.17 It is impossible to find one Theory C that replaces both A
  17
    In fact, before 1926 many successful theoretical considerations for predicting the results of exper-
iments proceeded in this way. For example, people made a calculation about the collision between an
electron and a photon as if they were classical billiard balls, then converted the momenta into wave
lengths using de Broglie’s relation p = ℏk, then made another calculation about waves with wave
number k.
and B and explains the entire physical process. (Here we meet again the impossibility
claim mentioned in Section 13.3.) Instead, we should leave the conflict between A and
B unresolved and accept the idea that reality is paradoxical.
    Bell (Speakable and Unspeakable in Quantum Mechanics, page 190) wrote the fol-
lowing about complementarity:
     “It seems to me that Bohr used this word with the reverse of its usual
     meaning. Consider for example the elephant. From the front she is head,
     trunk and two legs. From the back she is bottom, tail, and two legs. From
     the sides she is otherwise, and from the top and bottom different again.
     These various views are complementary in the usual sense of the word. They
     supplement one another, they are consistent with one another, and they are
     all entailed by the unifying concept ‘elephant.’ It is my impression that to
     suppose Bohr used the word ‘complementary’ in this ordinary way would
     have been regarded by him as missing his point and trivializing his thought.
     He seems to insist rather that we must use in our analysis elements which
     contradict one another, which do not add up to, or derive from, a whole. By
     ‘complementarity’ he meant, it seems to me, the reverse: contradictoriness.”
   Einstein (1949):
     “Despite much effort which I have expended on it, I have been unable to
     achieve a sharp formulation of Bohr’s principle of complementarity.”
   Bell commented (1986):
     “What hope then for the rest of us?”
    Another version of complementarity concerns observables that cannot be simultane-
ously measured. We have encountered this situation in a homework exercise. Compare
two experiments, each consisting of two measurements: (a) first measure σ2 and then σ3 ,
(b) first measure σ3 and then σ2 . We have seen that the joint probability distribution
of the outcomes depends on the order. Some observables, though, can be measured
simultaneously, i.e., the joint distribution does not depend on the order. Examples: X2
and X3 , the y-component of position and the z-component; or σ2 of particle 1 and σ3 of
particle 2.
Theorem 13.1. The observables A and B can be simultaneously measured (i.e., for
every wave function the joint probability distribution of the outcomes is independent of
the order of the two measurements) iff the operators A and B commute, AB = BA.
Theorem 13.2. (An extension of the spectral theorem) The operators A and B commute
if and only if there exists an ONB {φ_n} whose elements are eigenvectors of both operators
A and B, Aφ_n = α_n φ_n and Bφ_n = β_n φ_n.
Example 13.3.

    σ_2 σ_3 = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} ,    σ_3 σ_2 = \begin{pmatrix} 0 & −i \\ −i & 0 \end{pmatrix} .    (13.1)
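Example 13.3 and the remark about simultaneously measurable observables are easy to check with numpy; a small sketch (the matrices below are the standard Pauli matrices):

```python
import numpy as np

s2 = np.array([[0, -1j], [1j, 0]])    # sigma_2
s3 = np.array([[1, 0], [0, -1]], dtype=complex)   # sigma_3

print(s2 @ s3)                        # [[0, 1j], [1j, 0]], as in (13.1)
print(s3 @ s2)                        # [[0, -1j], [-1j, 0]]: the products differ

# sigma_2 of particle 1 and sigma_3 of particle 2 act on different tensor factors
# and therefore commute (they can be measured simultaneously):
A = np.kron(s2, np.eye(2))
B = np.kron(np.eye(2), s3)
print(np.allclose(A @ B, B @ A))      # True
```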
Any two multiplication operators commute. In particular, the position operators X_i,
X_j commute with each other. The momentum operators P_j = −iℏ ∂/∂x_j commute with
each other. X_i commutes with P_j for i ≠ j, but

    [X_j, P_j] = iℏ I ,    (13.2)

with I the identity operator. Eq. (13.2) is called Heisenberg’s canonical commutation
relation. To verify it, it suffices to consider a function ψ of a 1-dimensional variable x.
Using the product rule,

    [X, P]ψ(x) = XPψ(x) − PXψ(x)    (13.3)
    = x(−iℏ) ∂ψ/∂x − (−iℏ) ∂/∂x (xψ(x))    (13.4)
    = −iℏ x ∂ψ/∂x + iℏ ψ(x) + iℏ x ∂ψ/∂x    (13.5)
    = iℏ ψ(x) .    (13.6)
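The same computation can be repeated symbolically, e.g. with sympy; this is merely a check of (13.3)–(13.6), with ψ an arbitrary function of x:

```python
import sympy as sp

x, hbar = sp.symbols('x hbar', real=True)
psi = sp.Function('psi')(x)

X_psi = x * psi                               # X acts by multiplication
P_psi = -sp.I * hbar * sp.diff(psi, x)        # P = -i hbar d/dx

# [X, P] psi = X(P psi) - P(X psi)
commutator = x * P_psi - (-sp.I * hbar * sp.diff(X_psi, x))
print(sp.simplify(commutator))                # I*hbar*psi(x), i.e. [X, P] psi = i hbar psi
```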
     So, for two commuting observables, the quantum formalism provides a joint proba-
bility distribution. For non-commuting observables, it does not. That is, it provides two
joint probability distributions, one for each order, but that means it does not provide
an unambiguous joint probability distribution. Moreover,
     two non-commuting observables typically do not both have sharp values
                                                                                      (13.7)
     at the same time.
This fact, too, is often called complementarity. For example, there is no quantum state
that is an eigenvector to both σ2 and σ3 . In CI, this fact is understood as a paradox-
ical trait of the micro-realm that we are forced to accept. That this paradoxical trait
is connected to non-commutativity fits nicely with the analogy between operators in
quantum mechanics and quantities in classical mechanics (as described in Section 13.5):
In classical mechanics, which is free of paradoxes, all physical quantities (e.g., positions,
momenta, spin vectors) are just numbers and therefore commute.
    As a further consequence of (13.7), a measurement of B must disturb the value of
A if AB ≠ BA. (Think of the exercise in which |z-up⟩ underwent a σ_2- and then a σ_3-
measurement: After the σ2 -measurement, the particle was not certain any more to yield
“up” in the σ3 -measurement.) Also the Heisenberg uncertainty relation is connected to
(13.7), as it expresses that position and momentum cannot both have sharp values (i.e.,
σX = 0 and σP = 0) at the same time. In fact, the following generalized version of
Heisenberg’s uncertainty relation applies to observables A and B instead of X and P :
Theorem 13.4. (Robertson–Schrödinger inequality)18 For any bounded self-adjoint op-
erators A, B and any ψ ∈ H with kψk = 1,
    σ_A σ_B ≥ (1/2) |⟨ψ|[A, B]|ψ⟩| .    (13.8)
  18
    H.P. Robertson: The Uncertainty Principle. Physical Review 34: 163–164 (1929)
  E. Schrödinger: Zum Heisenbergschen Unschärfeprinzip. Sitzungsberichte der Preussischen Akademie
der Wissenschaften, physikalisch-mathematische Klasse 14: 296–303 (1930)
   Note that the larger the commutator [A, B] is, the stronger the inequality; it becomes
vacuous when [A, B] = 0.
Proof. Recall that the distribution over the spectrum of A defined by ψ has expectation
value ⟨A⟩ := ⟨ψ|A|ψ⟩ and variance

    σ_A^2 = ⟨ψ|(A − ⟨A⟩)^2|ψ⟩ = ‖φ_A‖^2    (13.9)

with

    φ_A := (A − ⟨A⟩)ψ ,    (13.10)

where we simply wrote ⟨A⟩ for ⟨A⟩I. By the Cauchy-Schwarz inequality,

    σ_A^2 σ_B^2 = ‖φ_A‖^2 ‖φ_B‖^2 ≥ |⟨φ_A|φ_B⟩|^2 .    (13.11)

Since ⟨φ_A|φ_B⟩ = ⟨ψ|(A − ⟨A⟩)(B − ⟨B⟩)|ψ⟩ = ⟨AB⟩ − ⟨A⟩⟨B⟩ and likewise
⟨φ_B|φ_A⟩ = ⟨BA⟩ − ⟨B⟩⟨A⟩, we obtain that

    |⟨φ_A|φ_B⟩|^2 ≥ |Im⟨φ_A|φ_B⟩|^2    (13.15)
    = |(⟨φ_A|φ_B⟩ − ⟨φ_B|φ_A⟩)/2i|^2    (13.16)
    = |(⟨AB⟩ − ⟨A⟩⟨B⟩ − ⟨BA⟩ + ⟨B⟩⟨A⟩)/2|^2    (13.17)
    = (1/4) |⟨ψ|[A, B]|ψ⟩|^2 .    (13.18)

Taking square roots yields (13.8). □
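Here is a quick numerical test of the inequality (13.8) with numpy, for A = σ_2, B = σ_3 and a randomly chosen unit vector ψ (a sketch; the seed and the choice of operators are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

A = np.array([[0, -1j], [1j, 0]])                # sigma_2
B = np.array([[1, 0], [0, -1]], dtype=complex)   # sigma_3

psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)                  # random unit vector

def mean(O):
    return (psi.conj() @ O @ psi).real

def sigma(O):
    return np.sqrt(mean(O @ O) - mean(O)**2)     # standard deviation of O in state psi

lhs = sigma(A) * sigma(B)
rhs = 0.5 * abs(psi.conj() @ (A @ B - B @ A) @ psi)
print(lhs, rhs, lhs >= rhs - 1e-12)              # the inequality (13.8) holds
```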
   • Nobody can actually solve the Schrödinger equation for 10^{23} interacting particles.
     (Sure, and we do not need to. If Ψ_α looks like a state including a pointer pointing
     to α then we know by linearity that Ψ_{t_1} evolves to Ψ_{t_2} = Σ_α c_α Ψ_α, a superposition
     of macroscopically different states.)
• Systems are never isolated. (If we cannot solve the problem for an isolated system,
  what hope can we have to treat a non-isolated one? The way you usually treat a
  non-isolated system is by regarding it as a subsystem of a bigger, isolated system,
  maybe the entire universe.)
• Who knows whether the initial wave function is really a product as in Ψt1 = ψ ⊗ φ.
  (It is not so important that it is precisely a product, but it is important that we
  could perform a quantum measurement on any ψ.)
• The collapse of the wave function is like the collapse of a probability distribution:
  as soon as I have more information, such as X ∈ B, I have to update my probability
  distribution ρ_{t−} for X accordingly, namely to

      ρ_{t+}(x) = 1_{x∈B} ρ_{t−}(x) / ∫_B ρ_{t−}(x′) d x′ .

  (The parallel is indeed striking. However, if we insist that the wave function is
  complete, then there never is any new information, as there is nothing that we are
  ignorant of.)
• Decoherence makes sure that you can replace the superposition Ψ = Σ_α c_α Ψ_α by
  a mixture [i.e., a random one of the Ψ_α]. (A super-observer cannot distinguish
  between the superposition and the mixture, but we are asking whether in reality
  it is a superposition or a mixture.)
14      Many Worlds
Put very briefly, Everett’s many-worlds theory is GRW∅ with λ = 0, and Schrödinger’s
many-worlds theory is GRWm with λ = 0.
   The motivation for the many-worlds view comes from the wave function (11.3) of
object and apparatus together after a quantum measurement. It is a superposition of
macroscopically different terms. If we insist that the Schrödinger equation is correct
(and thus reject non-linear modifications such as GRW), and if we insist that the wave
function is complete, then we must conclude that there are different parts of reality,
each looking like our world but with a different measurement outcome, and without
any interaction between the different parts. They are parallel worlds. This view was
suggested by Hugh Everett III in 1957.19
   Everett’s is not the only many-worlds theory, though. It is less well known that also
Schrödinger had a many-worlds theory in 1926, and it is useful to compare the two.20
Schrödinger, however, did not realize that his proposal was a many-worlds theory. He
thought of it as a single-world theory. He came to the conclusion that it was empirically
inadequate and abandoned it. Let us first try to get a good understanding of this theory.
In this theory, matter is continuously distributed in 3-space with density

    m(x, t) = Σ_{i=1}^N m_i ∫ d^3x_1 · · · d^3x_N  δ^3(x − x_i) |ψ_t(x_1, . . . , x_N)|^2 ,    (14.1)

and ψ_t evolves according to the Schrödinger equation. The equation for m is exactly
the same as in GRWm, except that ψ is not the same wave function. (Actually, Schrö-
dinger replaced the mass factor mi by the electric charge ei , but this difference is not
crucial. It amounts to a different choice of weights in the weighted average over i. In
fact, Schrödinger’s choice has the disadvantage that the different signs of charges will
lead to partial cancellations and thus to an m function that looks less plausible as the
density of matter. Nevertheless, the two choices turn out to be empirically equivalent,
i.e., lead to the same predictions.)
  19
     H. Everett: The Theory of the Universal Wavefunction. Ph. D. thesis, Department of Physics,
Princeton University (1955). Reprinted on page 3–140 in B. DeWitt and R.N. Graham (editors): The
Many-Worlds Interpretation of Quantum Mechanics. Princeton: University Press (1973)
  H. Everett: Relative State Formulation of Quantum Mechanics. Reviews of Modern Physics 29:
454–462 (1957)
  20
     E. Schrödinger: Quantisierung als Eigenwertproblem (Vierte Mitteilung). Annalen der Physik 81:
109–139 (1926). English translation by J.F. Shearer in E. Schrödinger: Collected Papers on Wave
Mechanics. New York: Chelsea (1927).
  See also V. Allori, S. Goldstein, R. Tumulka, and N. Zanghı̀: Many-Worlds and Schrödinger’s First
Quantum Theory. British Journal for the Philosophy of Science 62(1): 1–27 (2011) http://arxiv.
org/abs/0903.2211
    In analogy to GRWm, we may call this theory Sm (where S is for the Schrödin-
ger equation). Consider a double-slit experiment in this theory. Before the arrival at
the detection screen, the contribution to the m function coming from the electron sent
through the double slit (which is the only contribution in the region of space between
the double-slit and the detection screen) is a lump of matter smeared out over rather
large distances (as large as the interference pattern). This lump is not homogeneous, it
has interference fringes. And the overall amount of matter in this lump is tiny: If you
integrate m(x, t) over x in the region between the double-slit and the detection screen,
the result is 10^{−30} kg, the mass of an electron. But focus now on the fact that the
matter is spread out. Schrödinger incorrectly thought that this fact must lead to the
wrong prediction that the entire detection screen should glow faintly instead of yielding
one bright spot, and that was why he thought Sm was empirically inadequate.
    To understand why this reasoning was incorrect, consider a post-measurement situation
(e.g., Schrödinger’s cat). The wave function is a superposition of macroscopically
different terms, Ψ = Σ_α c_α Ψ_α. The Ψ_α do not overlap; i.e., where one Ψ_α is significantly
nonzero, the others are near zero. Thus, when we compute |Ψ|^2 there are no (significant)
cross terms; that is, for each q there is at most one α contributing, so

    m(x, t) ≈ Σ_α |c_α|^2 m_α(x, t) ,

where m_α is the m function computed from Ψ_α alone.
Each mα (t) looks like the reasonable story of just one cat that Ψα (t) corresponds to.
Thus, the two cats do not interact with each other; they are causally disconnected. After
all, the two contributions mα come from Ψα that are normally thought of as alternative
outcomes of the experiment. So the two cats are like ghosts to each other: they can see
and walk through each other.
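The absence of cross terms is easy to see numerically for a single particle in one dimension. The following Python sketch builds a superposition of two well-separated packets (standing in for the "dead" and "alive" contributions) and compares m(x) ∝ |Ψ(x)|^2 with the weighted sum of the individual contributions; the grid, the packet positions, and the coefficients are illustrative.

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 6001)
dx = x[1] - x[0]

def packet(center):
    """Normalized Gaussian packet centered at `center`."""
    p = np.exp(-(x - center)**2 / 2)
    return p / np.sqrt(np.sum(np.abs(p)**2) * dx)

psi1, psi2 = packet(-10.0), packet(10.0)      # two well-separated packets
c1, c2 = np.sqrt(0.7), np.sqrt(0.3)           # illustrative coefficients
psi = c1 * psi1 + c2 * psi2

m = np.abs(psi)**2                             # m(x) up to the mass factor (N = 1)
m_sum = abs(c1)**2 * np.abs(psi1)**2 + abs(c2)**2 * np.abs(psi2)**2

# Because the packets do not overlap, the cross terms are negligible and
# m(x) is, to very high accuracy, the weighted sum of the two "worlds".
print(np.max(np.abs(m - m_sum)))               # ~ 0
```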
     And not just the cat has split in two. If a camera takes a photograph of the cat
then Ψ must be taken to be a wave function of the cat and the camera together (among
other things). Ψ1 may then correspond to a dead cat and a photo of a dead cat, Ψ2 to
a live cat and a photo of a live cat. If a human being interacts with the cat (say, looks
at it), then Ψ1 will correspond to a brain state of seeing a dead cat and Ψ2 to one of
seeing a live cat. That is, there are two copies of the cat, two copies of the photo, two
copies of the human being, two copies of the entire world. That is why I said that Sm
has a many-worlds character. In each world, though, things seem rather ordinary: Like
a single cat in an ordinary (though possibly pitiful) state, and all records and memories
are consistent with each other and in agreement with the state of the cat.
The vectors |±⟩ = (|dead⟩ ± |alive⟩)/√2 form another ONB of the subspace spanned by
|dead⟩ and |alive⟩. So how do we know that the two worlds correspond to |dead⟩ and
|alive⟩ rather than to |+⟩ and |−⟩? Obviously, in Sm there is no such problem because a preferred basis (the position basis) is
built into the law (14.1) for m.
14.3      Bell’s First Many-Worlds Theory
Bell also made a proposal (first formulated in 1971, published21 in 1981) adding a prim-
itive ontology to Everett’s S∅; Bell did not seriously propose or defend the resulting
theory, he just regarded it as an ontological clarification of Everett’s theory. According
to this theory, at every time t there exists an uncountably infinite collection of universes,
each of which consists of N material points in Euclidean 3-space. Thus, each world has
its own configuration Q, but some configurations are more frequent in the ensemble of
worlds than others, with |Ψt |2 distribution across the ensemble. At every other time t0 ,
there is again an infinite collection of worlds, but there is no fact about which world at
t0 is the same as which world at t.
In a many-worlds theory the evolution of Ψ (and of m) is deterministic, so there is nothing random; and in the situation of the measurement problem, there
is nothing that we are ignorant of. So what could talk of probability mean?
    Here is what it could mean in Sm: Suppose we have a way of counting worlds.
And suppose we repeat a quantum experiment (say, a Stern–Gerlach experiment with
|c_up|^2 = |c_down|^2 = 1/2) many times (say, a thousand times). Then we obtain in each
world a sequence of 1000 ups and downs such as
↑↓↑↑↓↑↓↓↓ . . . . (14.8)
Note that there are 2^{1000} ≈ 10^{300} such sequences. The statement that the fraction of
ups lies between 47% and 53% is true in some worlds and false in others. Now count
the worlds in which the statement is true. Suppose that the statement is true in the
overwhelming majority of worlds. Then that would explain why we find ourselves in
such a world. And that, in turn, would explain why we observe a relative frequency
of ups of about 50%. And that is what we needed to explain for justifying the use of
probabilities.
    Now consider |c_up|^2 = 1/3, |c_down|^2 = 2/3. Then the argument might seem to break
down, because it is then still true that in the overwhelming majority of sequences such
as (14.8) the frequency of ups is about 50%. But consider the following

Rule for counting worlds. The “fraction of worlds” f(P) with property P in the
splitting given by Ψ = Σ_α c_α Ψ_α and m(x) = Σ_α |c_α|^2 m_α(x) is

    f(P) = Σ_{α∈M} |c_α|^2 ,    (14.9)

where M is the set of those α whose world (i.e., m_α) has property P.
    Note that f(P) lies between 0 and 1 because Σ_α |c_α|^2 = 1. It is not so clear whether
this rule makes sense—whether there is room in physics for such a law. But let us
accept it for the moment and see what follows. Consider the property P that the
relative frequency of ups lies between 30% and 36%. Then f (P ) is actually the same
value as the probability of obtaining a frequency of ups between 30% and 36% in 1000
consecutive independent random tossings of a biased coin with P(up) = 1/3 and P(down)
= 2/3. And in fact, this value is very close to 1. Thus, the above rule for counting worlds
implies that the frequency of ups lies between 30% and 36% in the overwhelming majority
of worlds. This reasoning was essentially developed by Everett.
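The value of f(P) for this P can be computed directly; here is a short Python check using scipy (n = 1000 runs and the cutoffs 30% and 36% are as in the text):

```python
from scipy.stats import binom

n, p = 1000, 1/3                       # 1000 runs, |c_up|^2 = 1/3
# Fraction of worlds (weighted by |c_alpha|^2, i.e. binomial weights) in which the
# number of ups lies between 300 and 360, i.e. the frequency between 30% and 36%.
f = binom.cdf(360, n, p) - binom.cdf(299, n, p)
print(f)                               # about 0.95, i.e. close to 1
```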
    A comparison with Bohmian mechanics is useful. The initial configuration of the
lab determines the precise sequence such as (14.8). If the initial configuration is chosen
with |Ψ0 |2 distribution, then with overwhelming probability the sequence will have a
fraction of ups between 30% and 36%. That is, if we count initial conditions with
the |Ψ_0|^2 distribution, that is, if we say that the fraction of initial conditions lying
in a set B ⊆ R^{3N} is ∫_B |Ψ_0|^2, then we can say that for the overwhelming majority of
Bohmian worlds, the observed frequency is about 33%. Now to make the connection
with many-worlds, note that the reasoning does not depend, in fact, on whether all of
the worlds are realized or just one. That is, imagine many Bohmian worlds with the
same initial wave function Ψ0 but different initial configurations, distributed across the
ensemble according to |Ψ0 |2 . Then there is an explanation for why inhabitants should
see a frequency of about 33%.
    The problem that remains is whether there is room for a rule for counting worlds.
In terms of a creation myth, suppose God created the wave function Ψ and made it a
law that Ψ evolves according to the Schrödinger equation; then he created matter in
3-space distributed with density m(x, t) and made it a law that m is given by (14.1).
Now what would God need to do in order to make the rule for counting worlds a law?
He does not create anything further, so in which way would two universes with equal Ψ
and m but different rules for counting worlds differ? That is a reason for thinking that
ultimately, Sm fails to work (though in quite a subtle way).
    Various authors have proposed other reasonings for justifying probabilities in many-
worlds theories; they seem less relevant to me, but let me mention a few. David
Deutsch23 proposed that it is rational for inhabitants of a universe governed by a many-
worlds theory (a “multiverse,” as it is often called) to behave as if the events they
perceive were random with probabilities given by the Born rule; he proposed certain
principles of rational behavior from which he derived this. (Of course, this reasoning
does not provide an explanation of why we observe frequencies in agreement with Born’s
rule.) Lev Vaidman24 proposed that in a many-worlds scenario, I can be ignorant of
which world I am in: before the measurement, I know that there will be a copy of me
in each post-measurement world, and afterwards, I do not know which world I am in
until I look at the pointer position. And I could try to express my ignorance through
a probability distribution, although it is not clear why the Born distribution would be
correct and other distributions would not.
    For comparison, in Bell’s many-worlds theories it is not hard to make sense of prob-
abilities. In Bell’s first theory, there is an ensemble of worlds at every time t, and clearly
most of the worlds have configurations that look as if randomly chosen with |Ψ|2 distri-
bution, in particular with a frequency of ups near 33% in the example described earlier.
In Bell’s second theory, Qt is actually random with |Ψt |2 distribution, and although the
recorded sequence of outcomes fluctuates within every fraction of a second, the sequence
in our memories and records at time t has, with probability near 1, a frequency of ups
near 33%.
  23
     D. Deutsch: Quantum theory of probability and decisions. Proceedings of the Royal Society of
London A 455: 3129–3137 (1999) http://arxiv.org/abs/quant-ph/9906015
  24
     L. Vaidman: On Schizophrenic Experiences of the Neutron or Why We should Believe in the
Many-Worlds Interpretation of Quantum Theory. International Studies in the Philosophy of Science
12: 245–261 (1998) http://arxiv.org/abs/quant-ph/9609006
15     The Einstein–Podolsky–Rosen Argument
In the literature, the “EPR paradox” is often mentioned. It is clear from EPR’s article
that they did not intend to describe a paradox (as did, e.g., Wheeler when describing
the delayed-choice experiment), but rather to describe an argument. The argument
supports the conclusion that there are additional variables beyond the wave function.
I now explain their reasoning in my own words, partly in preparation for Bell’s 1964
argument, which builds on EPR’s argument.
   EPR draw further conclusions from their example by considering also momentum.
Note that the Fourier transform of Ψ is

    \hat{Ψ}(k_1, k_2) = e^{−i k_1 x_0} δ(k_1 + k_2) .    (15.4)
Alice could measure either the position or the momentum of particle 1, and Bob either
the position or the momentum of particle 2. If Alice measures position then, as seen
above, the outcome X1 is uniformly distributed and Bob, if he chooses to measure
position, finds X2 = X1 + x0 with certainty. If, alternatively, Alice measures momentum
then the outcome K1 will be uniformly distributed and the wave function in momentum
representation collapses from \hat{Ψ} to

    \hat{Ψ}″(k_1, k_2) = e^{−i K_1 x_0} δ(k_1 − K_1) δ(k_2 + K_1)    (15.5)
so that Bob, if he chooses to measure momentum, is certain to find K2 = −K1 . In
the same way as above, it follows that Bob’s particle had a position before any of the
experiments, and that it had a momentum!
    There even arises a way of simultaneously measuring the position and momentum of
particle 2: Alice measures position X1 and Bob momentum K2 . Since particle 2 has, as
just proved, a well-defined position and a well-defined momentum, and since, by (15.3),
Alice’s measurement did not influence particle 2, K2 must be the original momentum
of particle 2. Likewise, if Bob had chosen to measure position, his result would have
agreed with the original position, and since it would have obeyed X2 = X1 + x0 , we can
infer from Alice’s result what the original position must have been.
If Alice obtains Z_1 = −1, then Bob is certain to obtain Z_2 = +1. Thus, always Z_2 = −Z_1; one speaks of perfect
anti-correlation. As a consequence, particle 2 had a definite value of z-spin even before
Bob’s experiment. Now, from the assumption (15.3) it follows that it had that value
even before Alice’s experiment. Likewise, particle 1 had a definite value of z-spin before
any attempt to measure it.
   Again as in EPR’s reasoning, we can consider other observables, say σ1 and σ2 . In
homework Exercise 30 of Assignment 7, we checked that the singlet state has the same
form relative to the x-spin basis or the y-spin basis. It follows that if Alice and Bob both
measure x-spin then their outcomes are also perfectly anti-correlated, and likewise for
y-spin. It can be inferred that each spin component, for each particle, has a well-defined
value before any experiment.
   Moreover, Alice and Bob together can measure σ1 and σ3 of particle 2: Alice measures
σ1 of particle 1 and Bob σ3 of particle 2. By (15.3) and the perfect anti-correlation, the
negative of Alice’s outcome is what Bob would have obtained had he measured σ1 ; and
by (15.3), Bob’s outcome is not affected by Alice’s experiment.
15.3      Einstein’s Boxes Argument
We have seen that EPR’s argument yields more than just the incompleteness of the
wave function. It also yields that particles have well-defined positions and momenta.
If we only want to establish the incompleteness of the wave function, which seems like
a worthwhile goal for a proof, a simpler argument will do. Einstein developed such an
argument already in 1927 (before the EPR paper), presented it at a conference but never
published it.25
    Consider a single particle whose wave function ψ(x) is confined to a box B with
impermeable walls and (more or less) uniform in B. Now split B (e.g., by inserting a
partition) into two boxes B1 and B2 , move one box to Tokyo and the other to Paris.
There is some nonzero amount of the particle’s wave function in Paris and some in
Tokyo. Carry out a detection in Paris. Let us assume that
                  no real change can take place in Tokyo in consequence
                                                                                             (15.9)
                  of a measurement in Paris.
If we believed that the wave function was a complete description of reality, then there
would be no fact of the matter, before the detection experiment, about whether the
particle is in Paris or Tokyo, but afterwards there would be. This contradicts (15.9), so
the wave function cannot be complete.
    The assumption (15.9) is intended as allowing changes in Tokyo after a while, such
as the time it would take a signal to travel from Paris to Tokyo at the speed of light.
That is, (15.9) (and similarly (15.3)) is particularly motivated by the theory of relativity,
which strongly suggests that signals cannot propagate faster than at the speed of light.
On one occasion, Einstein wrote that the faster-than-light effect entailed by insisting
on completeness of the wave function was “spukhafte Fernwirkung” (spooky action-at-
a-distance).
  25
    It has been reported by, e.g., L. de Broglie: The Current Interpretation of Wave Mechanics: A
Critical Study. Elsevier (1964). A more detailed discussion is given by T. Norsen: Einstein’s Boxes,
American Journal of Physics 73(2): 164–176 (2005) http://arxiv.org/abs/quant-ph/0404016
16      Nonlocality
Two space-time points x = (s, x) and y = (t, y) are called spacelike separated iff no
signal propagating at the speed of light can reach x from y or y from x. This occurs iff

    |x − y| > c |s − t| ,    (16.1)

with c = 3 × 10^8 m/s the speed of light. Einstein’s theory of relativity strongly suggests
that signals cannot propagate faster than at the speed of light (superluminally). That
is, if x and y are spacelike separated then no signal can be sent from x to y or from y
to x. This in turn suggests that
                 If x and y are spacelike separated then events at x cannot
                                                                                            (16.2)
                 influence events at y.
This statement is called locality. It is true in relativistic versions of classical physics
(mechanics, electrodynamics, and also in Einstein’s relativistic theory of gravity, which he
called the general theory of relativity). Bell proved in 1964 that locality is false if certain
empirical predictions of the quantum formalism are correct; this analysis is often called
Bell’s theorem.26 The relevant predictions have since been experimentally confirmed;
the first convincing tests were carried out by Alain Aspect in 1982.27 Thus, locality is
false in our world; this fact is often called quantum nonlocality. Our main goal in this
chapter is to understand Bell’s proof.
    Some remarks.
   • Einstein believed in locality until his death in 1955. Locality is very closely related
     to (almost the same as) the EPR assumption (15.3): If Alice’s measurement takes
     place at x and Bob’s at y, and if x and y are spacelike separated, then locality
     implies that Alice’s measurement on particle 1 at x cannot affect particle 2 at y.
     Conversely, the only situation in which we can be certain that the two particles
     cannot interact occurs if Alice’s and Bob’s experiments are spacelike separated
     and locality holds true. Ironically, EPR were wrong even though their argument
     was correct: The premise (15.3) is false. They took locality for granted. Likewise
     in Einstein’s boxes argument, the assumption (15.9) is equivalent to locality: The
     point of talking about Tokyo and Paris is that these two places are distant, and
     since there clearly can be influences if we allow more time than distance/c, the
     assumption is that there cannot be an influence between spacelike separated events.
   • Despite nonlocality, it is not possible to send messages faster than light, according
     to the appropriate relativistic version of the quantum formalism; this fact is often
     called the no-signalling theorem. We will prove it in great generality in a later
  26
     J. S. Bell: On the Einstein-Podolsky-Rosen Paradox. Physics 1: 195–200 (1964) Reprinted as
chapter 2 of J. S. Bell: Speakable and unspeakable in quantum mechanics. Cambridge University Press
(1987)
  27
     A. Aspect, J. Dalibard, G. Roger: Experimental Test of Bell’s Inequalities using Time-Varying
Analyzers. Physical Review Letters 49: 1804–1807 (1982)
       chapter. Put differently, the superluminal influences cannot be used by agents for
       sending messages.
   • Does nonlocality prove relativity wrong? That statement would be too strong.
     Nonlocality proves a certain understanding of relativity wrong. Much of relativity
     theory, however, remains untouched by nonlocality.
   • Bell’s proof shows for a certain experiment that either events at x must have
     influenced events at y or vice versa, but does not tell us who influenced whom.
Fact 1. For any unit vector n ∈ R^3,

    φ ∝ |n-up⟩|n-down⟩ − |n-down⟩|n-up⟩ .    (16.4)
Sketch of proof: Consider first the case that n is infinitesimally close to the z-direction,
arising from (0, 0, 1) by a rotation around the axis along the unit vector m = (cos γ, sin γ, 0)
through an infinitesimal angle δϕ. Then
                                                                             
                           σm++ σm+−                  0         cos γ − i sin γ
        σm = m · σ =                       =                                          (16.5)
                           σm−+ σm−−           cos γ + i sin γ        0
and
                                                                 
                             1       δϕ      1   1              δϕ   0
                   |n-upi =     +       m·σ    =    +                                  (16.6)
                             0        2      0   0               2 σm−+
                                                                 
                             0       δϕ      0   0              δϕ σm+−
                |n-downi =      +       m·σ    =    +                                  (16.7)
                             1        2      1   1               2   0
because spinors rotate through half the angle δϕ. As a consequence, to first order in δϕ,
                                                               
                                        0 1      δϕ σm+−       0
                   |n-upi|n-downi =           +                                    (16.8)
                                        0 0       2     0    σm−+
and
                       |n-upi|n-downi − |n-downi|n-upi =
                                                         
                  0 1    δϕ σm+−       0      δϕ σm+−      0
                       +                    −                   .                      (16.9)
                  −1 0    2     0   σm−+       2     0   σm−+
This proves (16.4) for an infinitesimal change in n. Now think of a finite change in n as
partitioned into infinitely many infinitesimal changes. This proves (16.4) for arbitrary
n.                                                                                     
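    As a quick sanity check of Fact 1, the rotational symmetry of the singlet can also be
verified numerically. Here is a minimal sketch in Python with numpy (the random directions
and the seed are illustrative choices, not part of the text): it builds |n-up⟩ and |n-down⟩ as
eigenvectors of n·σ and confirms that the anti-symmetric combination agrees with the singlet
up to a global phase.

import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# singlet state in the z-basis, phi = (|up,down> - |down,up>)/sqrt(2)
up, down = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
phi = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

rng = np.random.default_rng(0)
for _ in range(5):
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)
    sigma_n = n[0]*sx + n[1]*sy + n[2]*sz
    vals, vecs = np.linalg.eigh(sigma_n)      # eigenvalues sorted ascending: -1, +1
    n_down, n_up = vecs[:, 0], vecs[:, 1]
    chi = np.kron(n_up, n_down) - np.kron(n_down, n_up)
    chi /= np.linalg.norm(chi)
    # |<phi|chi>| = 1 means chi equals phi up to a global phase, as in (16.4)
    print(abs(np.vdot(phi, chi)))             # prints 1.0 each time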
Fact 2. Independently of whether Alice’s or Bob’s experiment occurs first, the joint
distribution of Z 1 , Z 2 is

   µα,β := \begin{pmatrix} P(up,up) & P(up,down) \\ P(down,up) & P(down,down) \end{pmatrix}         (16.10)

         = \begin{pmatrix} 1/4 − (1/4) α·β & 1/4 + (1/4) α·β \\ 1/4 + (1/4) α·β & 1/4 − (1/4) α·β \end{pmatrix}   (16.11)

         = \begin{pmatrix} (1/2) sin²(θ/2) & (1/2) cos²(θ/2) \\ (1/2) cos²(θ/2) & (1/2) sin²(θ/2) \end{pmatrix} ,   (16.12)

where θ is the angle between α and β.
    Proof: Assume that Alice’s experiment occurs first and write the initial spinor as

                  φ = c|α-up⟩|α-down⟩ − c|α-down⟩|α-up⟩                                     (16.13)

with c a complex constant with |c| = 1/√2. According to Born’s rule, Alice obtains +1
or −1, each with probability 1/2. In case Z 1 = +1, φ collapses to

                  φ′_+ = |α-up⟩|α-down⟩ .                                                   (16.14)

According to Born’s rule, the probability that Bob obtains Z 2 = +1 is

      P(Z 2 = +1|Z 1 = +1) = |⟨β-up|α-down⟩|² = 1 − |⟨β-up|α-up⟩|² .                        (16.15)

Since the angle in Hilbert space between |β-up⟩ and |α-up⟩ is half the angle between β
and α, and since they are unit vectors in Hilbert space, we have that

                  |⟨β-up|α-up⟩| = cos(θ/2)                                                  (16.16)

and thus

      P(Z 2 = +1|Z 1 = +1) = 1 − cos²(θ/2) = sin²(θ/2)                                      (16.17)

and

      P(Z 1 = +1, Z 2 = +1) = (1/2) sin²(θ/2) .                                             (16.18)

Since cos² x = 1/2 + (1/2) cos(2x), this value can be rewritten as

      P(Z 1 = +1, Z 2 = +1) = 1/2 − (1/2) cos²(θ/2) = 1/2 − 1/4 − (1/4) cos θ = 1/4 − (1/4) α·β .   (16.19)

The other three matrix elements can be computed in the same way. Assuming that
Bob’s experiment occurs first leads to the same matrix.                                     □
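    Fact 2 can likewise be checked numerically. Since σα ⊗ I and I ⊗ σβ commute, the joint
distribution can be computed directly as ⟨φ|Pz1(α) ⊗ Pz2(β)|φ⟩, which is equivalent to the
sequential collapse calculation above. The following sketch (illustrative directions and seed
assumed) compares this with (16.11).

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

up, down = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
phi = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)     # singlet state

def proj(direction, sign):
    """Projection onto the eigenspace of direction.sigma with eigenvalue sign = +1 or -1."""
    sigma = direction[0]*sx + direction[1]*sy + direction[2]*sz
    return (I2 + sign * sigma) / 2

rng = np.random.default_rng(1)
alpha, beta = rng.normal(size=3), rng.normal(size=3)
alpha /= np.linalg.norm(alpha); beta /= np.linalg.norm(beta)

for s1 in (+1, -1):
    for s2 in (+1, -1):
        P = np.kron(proj(alpha, s1), proj(beta, s2))
        p = np.vdot(phi, P @ phi).real                          # quantum joint probability
        predicted = 0.25 - 0.25 * s1 * s2 * np.dot(alpha, beta) # Eq. (16.11)
        print(s1, s2, round(p, 6), round(predicted, 6))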
Remarks.
   • Note that the four entries in µα,β are nonnegative and add up to 1, as they should.
   • In the case α = β corresponding to Bohm’s version of the EPR example,

            µα,α = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} ,                        (16.20)

     implying the perfect anti-correlation Z 2 = −Z 1 .
   • The marginal distribution is the distribution of Z 1 alone, irrespective of Z 2 . It is
     1/2, 1/2. Likewise for Z 2 . Let us assume that Alice’s experiment occurs first. Then
     the fact that the marginal distribution for Z 2 is 1/2, 1/2 amounts to a no-signalling
     theorem for Bell’s experiment: Bob cannot infer from Z 2 any information about
     Alice’s choice α because the distribution of Z 2 does not depend on α. (The general
     no-signalling theorem that we will prove later covers all possible experiments.)
   • The fact that the joint distribution of the outcomes does not depend on the order
     of experiments means that the observables measured by Alice and Bob can be
     simultaneously measured. What are these observables, actually? Alice’s is the
      matrix σα ⊗ I with components (σα)_{s1 s1′} δ_{s2 s2′} , and Bob’s is I ⊗ σβ with components
      δ_{s1 s1′} (σβ)_{s2 s2′} .
16.2      Bell’s 1964 Proof of Nonlocality
Let us recapitulate what needs to be shown in Bell’s theorem. The claim is that the
joint distribution µα,β of Z 1 and Z 2 , as a function of α and β, is such that it cannot
be created in a local way (i.e., in the absence of influences) if no information about α
and β is available beforehand. We can also put it this way: it is impossible for two
computers A and B to be set up in such a way that, upon input of α into A and β into
B, A produces a random number Z 1 and B Z 2 with joint distribution µα,β if A and B
cannot communicate (while they can use prepared random bits that both have copies
of).28 To put this yet differently, two suspects interrogated separately by police cannot
provide answers Z 1 and Z 2 with distribution µα,β when asked the questions α and β,
no matter which prior agreement they made beforehand.
    Bell’s proof involves two parts. The first part is the EPR argument (in Bohm’s ver-
sion), applied to all directions α; it shows that if locality is true then the values of Z 1
and Z 2 must have been determined in advance. Thus, in every run of the experiment,
there exist well-defined values Zα1 for every α and Zα2 = −Zα1 even before any measure-
ment. Moreover, Alice’s outcome will be Zα1 for the α she chooses; also Bob’s outcome
will be Zβ2 = −Zβ1 for the β he chooses, even if β ≠ α and independently of whether
Alice’s or Bob’s experiment occurs first. (Put differently, the two suspects must have
agreed in advance on the answer to every possible question.)
    In other words, locality implies the existence of random variables Zαi , i = 1, 2 and
|α| = 1, such that Alice’s outcome is Zα1 and Bob’s is Zβ2 . In particular, focusing on
components in only 3 directions a, b and c, locality implies the existence of 6 random
variables Zαi , i = 1, 2, α = a, b, c such that
                                           Zαi = ±1                                           (16.21)
                                           Zα1 = −Zα2                                         (16.22)
and, more generally,
                              P(Zα1 ≠ Zβ2 ) = qαβ ,                                          (16.23)
where the qαβ = µα,β (+−) + µα,β (−+) = (1 + α·β)/2 = cos²(θ/2) are the corresponding
quantum mechanical probabilities.
   The second part of the proof involves only very elementary mathematics. Clearly,

            P({Za1 = Zb1 } ∪ {Zb1 = Zc1 } ∪ {Zc1 = Za1 }) = 1 ,                               (16.24)

since at least two of the three (2-valued) variables Zα1 must have the same value. Hence,
by elementary probability theory,

            P(Za1 = Zb1 ) + P(Zb1 = Zc1 ) + P(Zc1 = Za1 ) ≥ 1 ,                               (16.25)

and using the perfect anti-correlations (16.22) we have that

            P(Za1 = −Zb2 ) + P(Zb1 = −Zc2 ) + P(Zc1 = −Za2 ) ≥ 1 .                            (16.26)
  28
     This statement is perhaps a bit less general than Bell’s theorem because computers always work
in either a deterministic or a stochastic way, while Bell’s theorem would apply even to a theory, if it
exists, that is neither deterministic nor stochastic.
(16.26) is equivalent to the celebrated Bell inequality. It is incompatible with (16.23):
since all variables take values ±1, P(Za1 = −Zb2 ) = P(Za1 ≠ Zb2 ) = qab , so the left-hand
side of (16.26) equals qab + qbc + qca . For example, when the angles between a, b and c
are 120°, the 3 relevant qαβ are all equal to cos²(60°) = 1/4, implying a value of 3/4 < 1
for the left-hand side of (16.26).
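    The numbers quoted in the last paragraph are easy to reproduce; the following minimal
Python sketch evaluates qαβ = cos²(θ/2) for three coplanar unit vectors at 120° from each
other and sums them (the particular vectors are an illustrative choice).

import numpy as np

# three coplanar unit vectors a, b, c at 120 degrees to one another
angles = np.deg2rad([0.0, 120.0, 240.0])
dirs = [np.array([np.sin(t), 0.0, np.cos(t)]) for t in angles]

def q(u, v):
    """Quantum prediction P(Z^1_u != Z^2_v) = cos^2(theta/2), Eq. (16.23)."""
    theta = np.arccos(np.clip(np.dot(u, v), -1, 1))
    return np.cos(theta / 2) ** 2

pairs = [(0, 1), (1, 2), (2, 0)]
print([round(q(dirs[i], dirs[j]), 4) for i, j in pairs])  # [0.25, 0.25, 0.25]
print(sum(q(dirs[i], dirs[j]) for i, j in pairs))         # 0.75 < 1, contradicting (16.26)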
16.3      Bell’s 1976 Proof of Nonlocality
In this version of the argument, the outcomes are allowed to depend on shared information
λ (distributed with density ρ(λ)) that was available to both sides in advance, so that

      P(Z 1 = z1 , Z 2 = z2 |α, β) = ∫ dλ ρ(λ) P(Z 1 = z1 , Z 2 = z2 |α, β, λ) ,              (16.27)

where the last factor is the conditional distribution of the outcomes, given λ.
    What is the condition on P that characterizes the absence of communication? Sup-
pose computer 1 makes its decision about Z 1 first. In the absence of communication,
it has only λ and α as the basis of its decision (which may still be random); thus, the
(marginal) distribution of Z 1 does not depend on β:

      P(Z 1 = z1 |α, β, λ) = P(Z 1 = z1 |α, λ) .                                               (16.28)

Computer 2 has only λ and β as the basis of its decision; thus, the (conditional) distri-
bution of Z 2 does not depend on α or Z 1 :

      P(Z 2 = z2 |Z 1 = z1 , α, β, λ) = P(Z 2 = z2 |β, λ) .                                    (16.29)

Together, (16.28) and (16.29) amount to the factorization

      P(Z 1 = z1 , Z 2 = z2 |α, β, λ) = P(Z 1 = z1 |α, λ) P(Z 2 = z2 |β, λ) .                  (16.30)
    Now we want to know how the locality condition (16.30) restricts the possibility of
functions to occur as P(Z 1 , Z 2 |α, β). To this end, we introduce the correlation coefficient
defined by

      κ(α, β) = Σ_{z1=±1} Σ_{z2=±1} z1 z2 P(Z 1 = z1 , Z 2 = z2 |α, β) .                       (16.31)

Proposition 16.1. Locality implies the following version of Bell’s inequality known as
the CHSH inequality31 :

      |κ(α, β) + κ(α, β′) + κ(α′, β) − κ(α′, β′)| ≤ 2 .                                        (16.32)

Proof: By (16.30), κ(α, β) = ∫ dλ ρ(λ) E(Z 1 |α, λ) E(Z 2 |β, λ), where E(Z i | ·, λ) denotes
the conditional expectation of Z i given λ and the respective parameter choice; it always
lies in [−1, 1]. So,

      κ(α, β) ± κ(α, β′) = ∫ dλ ρ(λ) E(Z 1 |α, λ) [E(Z 2 |β, λ) ± E(Z 2 |β′, λ)]               (16.37)

and hence

      |κ(α, β) ± κ(α, β′)| ≤ ∫ dλ ρ(λ) |E(Z 2 |β, λ) ± E(Z 2 |β′, λ)| .                        (16.38)

  31
     J. F. Clauser, M. A. Horne, A. Shimony, R. A. Holt: Proposed Experiment to Test Local Hidden-
Variable Theories. Physical Review Letters 23: 880–884 (1969)
Hence, setting u = E(Z 2 |β, λ) and v = E(Z 2 |β′, λ) and using that |u + v| + |u − v| =
2 max(|u|, |v|) ≤ 2 for u, v ∈ [−1, 1], we obtain

      |κ(α, β) + κ(α, β′)| + |κ(α′, β) − κ(α′, β′)| ≤ ∫ dλ ρ(λ) (|u + v| + |u − v|) ≤ 2 ,

which yields (16.32).                                                                         □
    Since the quantum mechanical prediction µα,β for the Bell experiment has

      κ(α, β) = µα,β (++) − µα,β (+−) − µα,β (−+) + µα,β (−−) = −α · β = − cos θ ,             (16.45)

choosing coplanar directions with α′ ⊥ α and with β, β′ at angles ±45° from α leads to

                     κ(α, β) + κ(α, β′) + κ(α′, β) − κ(α′, β′) = −2√2 ,                        (16.47)

violating (16.32).
    Now if the values of P(Z 1 = z1 , Z 2 = z2 |α, β) are known only with some inaccuracy
(because they were obtained experimentally, not from the quantum formalism) then also
the κ(α, β) are subject to some inaccuracy. But if (16.32) is violated by more than the
inaccuracy, then locality is refuted.
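    For a concrete check of the violation, the following sketch evaluates the CHSH combination
for the quantum correlation (16.45), using the conventional choice of coplanar directions with
α′ ⊥ α and β, β′ at ±45° from α (this specific choice is assumed here only for illustration).

import numpy as np

def kappa(a, b):
    """Quantum correlation coefficient for the singlet, Eq. (16.45): kappa = -a.b"""
    return -np.dot(a, b)

def unit(angle_deg):
    t = np.deg2rad(angle_deg)
    return np.array([np.sin(t), 0.0, np.cos(t)])

a, a2, b, b2 = unit(0), unit(90), unit(45), unit(-45)
chsh = kappa(a, b) + kappa(a, b2) + kappa(a2, b) - kappa(a2, b2)
print(chsh, -2*np.sqrt(2))   # both approx -2.828, outside the interval [-2, 2] allowed by (16.32)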
16.4       Photons
Experimental tests of Bell’s inequality are usually done with photons instead of electrons.
For photons, spin is usually called polarization, and the Stern–Gerlach magnets are
replaced with polarization analyzers (also known as polarizers), i.e., crystals that are
transparent to the |z-upi part of the wave but reflect (or absorb) the |z-downi part.
Like the Stern–Gerlach magnets, the analyzers can be rotated into any direction. Since
photons have spin 1, θ/2 needs to be replaced by θ.
17       Further Discussion of Nonlocality
17.1      Nonlocality in Bohmian Mechanics, GRW, Copenhagen,
          Many-Worlds
Since we have considered only non-relativistic formulations of these theories, we cannot
directly analyze spacelike separated events, but instead we can analyze the case of two
systems (e.g., Alice’s lab and Bob’s lab) without interaction (i.e., without an interaction
term between them in the Hamiltonian).
   • In GRW theory, nonlocality comes in at the point when the wave function
     collapses, as then it does so instantaneously over arbitrary distances.
       At least, this trait of the theory suggests that GRW is nonlocal, and in fact that is
       the ultimate source of the nonlocality. Strictly speaking, however, the definition of
       nonlocality, i.e., the negation of (16.2), requires that events at x and at y influence
       each other, and the value of the wave function ψt (x1 , x2 ) is linked to several space-
       time points, (t, x1 ) and (t, x2 ), and thus is not an example of an “event at x.”
       So we need to formulate the proof that GRW theory is nonlocal more carefully;
       of course, Bell’s proof achieves this, but we can give a more direct proof. Since
       the “events at x” are not given by the wave function itself but by the primitive
       ontology, we need to consider GRWf and GRWm separately.
       In GRWf, consider Einstein’s boxes example. The wave function of a particle
       is half in a box in Paris and half in a box in Tokyo. Let us apply detectors to
       both boxes at time t, and consider the macroscopic superposition of the detectors
       arising from the Schrödinger equation. It is random whether the first flash (in
       any detector) after t occurs in Paris or in Tokyo. Suppose it occurs in Tokyo, and
       suppose it can occur in one of two places in Tokyo, corresponding to the outcomes
       0 or 1. If it was 1, then after the collapse the wave function of the particle is
       100% in Tokyo, and later flashes in Paris are certain to occur in a place where
       they indicate the outcome 0—that is a nonlocal influence of a flash in Tokyo on
       the flashes in Paris.
       Likewise in GRWm: If, after the first collapse, the pointer of the detector in Tokyo,
       according to the m function, points to 1 then the pointer in Paris immediately
       points to 0. (You might object that the Tokyo pointer position according to
       the m function was not the cause of the Paris pointer position, but rather both
       pointer positions were caused by the collapse of the wave function. However, this
       distinction is not relevant to whether the theory is nonlocal.)
       Note that while Bell’s proof shows that any version of quantum mechanics must
       be nonlocal, for proving that GRWf and GRWm are nonlocal it is sufficient to
       consider a simpler situation, that of Einstein’s boxes.
       Both GRWf and GRWm are already nonlocal when governing a universe containing
       only one particle; thus, their nonlocality does not depend on the existence of a
       macroscopic number of particles, and they are even nonlocal in a case (one particle)
       in which Bohmian mechanics is local. For example, consider a particle with wave
       function
                        ψ = (1/√2)( |here⟩ + |there⟩ )                                        (17.1)
       at time t, as in Einstein’s boxes example. Suppose that |herei and |therei are
       two narrow wave packets separated by a distance of 500 million light years. The
       distance is so large that the first collapse is likely to occur before a light signal can
       travel between the two places. For GRWf, a flash here precludes a flash there—
       that is a nonlocal influence. For GRWm, if the wave function collapses to |herei
       then m(here) doubles and m(there) instantaneously goes to zero—that is a nonlo-
       cal influence. (There is a relativistic version of GRWm32 in which m(there) goes to
       zero only after a delay of distance/c, or when a collapse centered “there” occurs.
       Nevertheless, also this theory is nonlocal even for one particle because when a col-
       lapse centered “there” occurs, which can happen any time, then m(there) cannot
       double (as it could in a local theory) but must go to zero.)
   • That orthodox quantum mechanics (OQM) is nonlocal can also be seen from
     Einstein’s boxes argument: OQM says the outcomes of the detectors are not pre-
     determined. (That is, there is no fact about where the particle really is before
     any detectors are applied.) Thus, the outcome of the Tokyo detector must have
     influenced the Paris detector, or vice versa.
  32
    D. Bedingham, D. Dürr, G.C. Ghirardi, S. Goldstein, R. Tumulka, and N. Zanghı̀: Matter Density
and Relativistic Models of Wave Function Collapse. Journal of Statistical Physics 154: 623–631 (2014)
http://arxiv.org/abs/1111.1425
       This, of course, was the point of Einstein’s boxes argument: He objected to OQM
       because it is nonlocal.
   • Many-worlds is nonlocal, too. This is not obvious from Bell’s argument because
     the latter is formulated in a single-world framework. Here is why Sm is nonlocal.33
     After Alice carries out her Stern–Gerlach experiment, there are two pointers in her
     lab, one pointing to +1 and the other to −1. Then Bob carries out his experiment,
     and there are two pointers in his lab. Suppose Bob chose the same direction as
     Alice. Then the world in which Alice’s pointer points to +1 is the same world as
     the one in which Bob’s pointer points to −1, and this nonlocal fact was created
     in a nonlocal way by Bob’s experiment. The same kind of nonlocality occurs in
     Sm already in Einstein’s boxes experiment: The world in which a particle was
     detected in Paris is the same as the one in which no particle was detected in
     Tokyo—a nonlocal fact that arises as soon as both experiments are completed,
     without the need to wait for the time it takes light to travel from Paris to Tokyo.
       How about Bell’s many-worlds theories? The second theory, involving a random
       configuration selected independently at every time, is very clearly nonlocal, for
       example in Einstein’s boxes experiment: At every time t, nature makes a random
       decision about whether the particle is in Paris, and if it is, nature ensures imme-
       diately that there is no particle in Tokyo. A local theory would require that the
       particle has a continuous history of traveling, at a speed less than that of light,
       to either Paris or Tokyo, and this history is missing in Bell’s second many-worlds
       theory. Bell’s first many-worlds theory is even more radical, in fact in such a way
       that the concept of locality is not even applicable. The concept of locality requires
       that at every point in space, there are local variables whose changes propagate at
       most at the speed of light. Since in Bell’s first many-worlds theory, no association
       is made between worlds at different times, one cannot even ask how any local
       variables would change with time. Thus, this theory is nonlocal as well.
    Another remark concerns the connection between Bell’s 1976 nonlocality proof and
the theories mentioned above. In physical theories, λ represents the information located
at all space-time points from which light signals can reach both x and y. In orthodox
quantum mechanics and GRW theory, λ is the wave function ψ; in Bohmian mechanics,
λ is ψ together with the initial configuration of the two particles.
    Bell’s nonlocality argument, as described in Section 16.2, has the following structure:

      part 1: quantum predictions + locality ⇒ P                                              (17.2)

      part 2: quantum predictions ⇒ non-P                                                     (17.3)

      conclusion: quantum predictions ⇒ nonlocality,                                          (17.4)

where P is the statement that the outcomes are pre-determined (i.e., that for every direction
α there exist values Zα1 = −Zα2 fixed in advance of any measurement).
For this argument what is relevant about “quantum mechanics” is merely the predictions
concerning experimental outcomes corresponding to (16.21)–(16.23) (with part 1 using
in fact only (16.22)).
    Certain popular myths about Bell’s proof arise from missing part 1 and noticing only
part 2 of the argument. (In Bell’s 1964 paper, part 1 is formulated in 3 lines, part 2 in
2.5 pages.) Bell, Speakable and unspeakable, p. 143:
       It is important to note that to the limited degree to which determinism plays
       a role in the EPR argument, it is not assumed but inferred. What is held
       sacred is the principle of ‘local causality’ – or ‘no action at a distance’. [. . . ]
       It is remarkably difficult to get this point across, that determinism is not a
       presupposition of the analysis.
Here, “determinism” means P. What Bell writes about the EPR argument is true in
spades about his own nonlocality argument: P plays a “limited role” because it is only
an auxiliary statement, and non-P is not the upshot of the argument.
   The mistake of missing part 1 leads to the impression that Bell proved that

               deterministic hidden variables are impossible,                                 (17.5)

or that
               hidden variables, while perhaps possible, must be nonlocal.                      (17.6)
These claims are still widespread, and were even more common in the 20th century.34
They are convenient for Copenhagenists, who tend to think that coherent theories of
the microscopic realm are impossible (see Section 13.3). Let me explain what is wrong
about (17.5) and (17.6).
    Statement (17.5) is plainly wrong, since a deterministic hidden-variables theory exists
and works, namely Bohmian mechanics. The hidden variables that Bohmian mechanics
provides35 for the Bell experiment are of the form Zα,β i
                                                           , as the outcome according to
Bohmian mechanics depends on both parameter choices (at least for one i, namely for the
second Stern–Gerlach experiment). Considering the three directions relevant to Bell’s
                  i
inequality, the Zα,β  are 18 random variables instead of 6 Zαi , and the dependence on
both α and β reflects the nonlocality of Bohmian mechanics. Bell did not establish the
impossibility of a deterministic reformulation of quantum theory, nor did he ever claim
to have done so.
  34
     For example, recall the title of Clauser et al.’s paper: Proposed Experiment to Test Local Hidden-
Variable Theories. Other authors claimed that Bell’s argument excludes “local realism.”
  35
     We assume a fixed temporal order of the two spin measurements, and that each is carried out as a
Stern–Gerlach experiment.
    Statement (17.6) is true and non-trivial but nonetheless rather misleading. It follows
from (17.2) and (17.3) that any (single-world) account of quantum phenomena must be
nonlocal, not just any hidden-variables account. Bell’s argument shows that nonlocality
is implied by the predictions of standard quantum theory itself. Thus, if nature is
governed by these predictions (as has been confirmed in experiment), then nature is
nonlocal.
18      POVMs: Generalized Observables
18.1      Definition
An observable is mathematically represented by a self-adjoint operator. A generalized ob-
servable is mathematically represented by a positive-operator-valued measure (POVM).
Definition 18.1. An operator is called positive iff it is self-adjoint and all (generalized)
eigenvalues are greater than or equal to zero. (In linear algebra, a positive operator is
commonly called “positive semi-definite.”) Equivalently, a bounded operator A : H →
H is positive iff
                           ⟨ψ|A|ψ⟩ ≥ 0 for every ψ ∈ H .                                     (18.1)
    The sum of two positive operators is again a positive operator, whereas the product
of two positive operators is in general not even self-adjoint. Note that every projection
is a positive operator.
    As a first, rough definition, we can say the following: A POVM is a family of positive
operators Ez such that

                                  Σ_z Ez = I .                                                (18.2)
     so η ≤ 1.
                                
   2. E1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} , E2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} . In the special case in which all operators Ez are
      projection operators, E is called a projection-valued measure (PVM). In this case,
      the subspaces to which Ez and Ez′ (z ≠ z′) project must be mutually orthogonal
      (homework problem).
   3. Every self-adjoint matrix defines a PVM: Let z = α run through the eigenvalues
      of A and let Eα be the projection to the eigenspace of A with eigenvalue α,

                                  Eα = Σ_λ |φα,λ⟩⟨φα,λ| .                                     (18.4)

      Then their sum is I, as easily seen from the point of view of an orthonormal basis
      of eigenvectors of A. So E is a PVM, the spectral PVM of A. Example 2 above is
      of this form for A = σ3 .
   4. A POVM E and a vector ψ ∈ H with ‖ψ‖ = 1 together define a probability
      distribution over z as follows:

                                  Pψ (z) = ⟨ψ|Ez |ψ⟩ .                                        (18.5)

      To see this, note that ⟨ψ|Ez |ψ⟩ is a nonnegative real number since Ez is a positive
      operator, and

            Σ_z Pψ (z) = Σ_z ⟨ψ|Ez |ψ⟩ = ⟨ψ|I|ψ⟩ = ‖ψ‖² = 1 .                                 (18.6)

   5. (The fuzzy position observable.) Here the value space Z = R is continuous, and Ez
      is the multiplication operator by the Gaussian function (2πσ²)^{−1/2} e^{−(x−z)²/(2σ²)},
      where σ > 0 plays the role of the resolution of the detector; correspondingly, the
      sum over z gets replaced by an integral. Indeed,

            ∫ dz Ez ψ(x) = (1/√(2πσ²)) ψ(x) ∫ dz e^{−(x−z)²/(2σ²)} = ψ(x) .                   (18.10)
    The case of a continuous variable z brings us to the general definition of a POVM,
which I will formulate rigorously although we do not aim at rigor in general. The defini-
tion is, in fact, quite analogous to the rigorous definition of a probability distribution in
measure theory: A measure associates a value (i.e., a number or an operator) not with
a point but with a set: E(B) instead of Ez , where B ⊆ Z and Z is the set of all z’s.
More precisely, let Z be a set and B a σ-algebra of subsets of Z ,36 the family of the
“measurable sets.” A probability measure is a mapping µ : B → [0, 1] such that for any
B1 , B2 , . . . ∈ B with Bi ∩ Bj = ∅ for i ≠ j,

            µ( ∪_{n=1}^∞ Bn ) = Σ_{n=1}^∞ µ(Bn ) .                                            (18.11)

  36
     A σ-algebra is a family B of subsets of Z such that ∅ ∈ B and, for every B1 , B2 , B3 , . . . in B, also
B1^c := Z \ B1 ∈ B and B1 ∪ B2 ∪ . . . ∈ B. It follows that Z ∈ B and B1 ∩ B2 ∩ . . . ∈ B. A set Z
equipped with a σ-algebra is also called a measurable space. The σ-algebra usually considered on Rn
consists of the “Borel sets” and is called the “Borel σ-algebra.”
Definition 18.3. A POVM on the measurable space (Z , B) acting on the Hilbert
space H is a mapping E from B to the set of bounded operators on H such that each
E(B) is positive, E(Z ) = I, and for any B1 , B2 , . . . ∈ B with Bi ∩ Bj = ∅ for i ≠ j,

            E( ∪_{n=1}^∞ Bn ) = Σ_{n=1}^∞ E(Bn ) ,                                            (18.12)

where the series on the right-hand side converges in the operator norm.37
    If Z is a countable set (and B the family of all its subsets), then a POVM is determined
by the operators Ez = E({z}) via E(B) = Σ_{z∈B} Ez , so in that case Definition 18.3 boils
down to the earlier definition around (18.2). The fuzzy position observable of Example 5
corresponds to Z = R, B the Borel sets, and E(B) the multiplication operator

            E(B)ψ(x) = ∫_B dz (1/√(2πσ²)) e^{−(x−z)²/(2σ²)} ψ(x) ,                            (18.15)

which multiplies by the function 1B ∗ g, where 1B is the characteristic function of B, g
is the Gaussian density function, and ∗ means convolution.
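    To make the fuzzy position observable concrete, here is a small numerical sketch: it
discretizes the line, takes an illustrative Gaussian wave packet ψ and resolution σ (both
assumptions, not taken from the text), evaluates ⟨ψ|E(B)|ψ⟩ with E(B) acting as multiplication
by 1B ∗ g, and checks that the probabilities of a partition of the line add up to (approximately) 1.

import numpy as np

x = np.linspace(-15, 15, 1501)
dx = x[1] - x[0]

# a normalized wave packet psi (illustrative choice)
psi = np.exp(-(x - 1.0)**2 / 2) * np.exp(2j * x)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

sigma = 0.7                                     # detector resolution in Eq. (18.15)
g = lambda u: np.exp(-u**2 / (2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)

def prob(B_lo, B_hi):
    """<psi|E(B)|psi> with E(B) = multiplication by (1_B * g), for B = [B_lo, B_hi]."""
    z = np.linspace(B_lo, B_hi, 1001)
    dz = z[1] - z[0]
    conv = np.array([np.sum(g(xi - z)) * dz for xi in x])   # (1_B * g)(x)
    return np.sum(np.abs(psi)**2 * conv) * dx

# probabilities of a partition of the (truncated) line add up to ~1
p1, p2, p3 = prob(-15, 0), prob(0, 2), prob(2, 15)
print(p1, p2, p3, p1 + p2 + p3)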
    It turns out that every observable is a generalized observable; that is, every self-
adjoint operator A defines a PVM E with E(B) the projection to the so-called spectral
subspace of B. If there is an ONB of eigenvectors of A, then the spectral subspace of B
is the closed span of all eigenspaces with eigenvalues in B; that is, in that case E({z}) is
the projection to the eigenspace of eigenvalue z (and 0 if z is not an eigenvalue). In the
case of a general self-adjoint operator A, the following is a reformulation of the spectral
theorem:
Theorem 18.4. For every self-adjoint operator A there is a uniquely defined PVM E
on the real line with the Borel σ-algebra (the “spectral PVM” of A) such that

                                  A = ∫_R α E(dα) .                                           (18.16)

  37
     It is equivalent to merely demand that the series on the right-hand side converges weakly, i.e., that
Σ_n ⟨ψ|E(Bn )|ψ⟩ converges for every ψ ∈ H .
    To explain the last equation: In the same way as one can define the integral ∫_Z f (z) µ(dz)
of a measurable function f : Z → R relative to a measure µ, one can define an operator-
valued integral ∫_Z f (z) E(dz) relative to a POVM E. Eq. (18.16) is a generalization of
the relation

                                  A = Σ_α α Eα                                                (18.17)
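    The relation (18.17) and the defining properties of the spectral PVM are easy to verify
numerically for a matrix; the following sketch does this for a randomly chosen self-adjoint
4 × 4 matrix (the matrix and the rounding tolerance are illustrative choices).

import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2                  # a self-adjoint matrix

vals, vecs = np.linalg.eigh(A)

# group eigenvectors by (numerically distinct) eigenvalue and build E_alpha as in (18.4)
projs = {}
for val, v in zip(vals, vecs.T):
    key = round(val, 10)
    projs[key] = projs.get(key, 0) + np.outer(v, v.conj())

print(np.allclose(sum(projs.values()), np.eye(4)))                 # the E_alpha sum to I
print(np.allclose(sum(a * E for a, E in projs.items()), A))        # A = sum_alpha alpha E_alpha
print(all(np.allclose(E @ E, E) for E in projs.values()))          # each E_alpha is a projection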
Example 18.6. It follows from the quantum formalism that if we make consecutive
ideal quantum measurements of observables A1 , . . . , An (which need not commute with
each other) at times 0 < t1 < . . . < tn respectively on a system with initial wave function
ψ0 ∈ H with kψ0 k = 1, then the joint distribution of the outcomes Z1 , . . . , Zn is of the
form
                  P((Z1 , . . . , Zn ) ∈ B) = ⟨ψ0 |E(B)|ψ0 ⟩                                  (18.20)
for all (Borel) subsets B ⊆ Rn , where E is a POVM on Rn . The precise version of this
statement requires that each Ak has purely discrete spectrum (or, equivalently, an ONB
of eigenvectors in H ).
    Derivation: In that case, the spectrum is at most countable, and the spectral de-
composition can be written in the form

                                  Ak = Σ_{αk} αk Pk,αk .                                      (18.21)
According to the quantum formalism, the joint distribution of the outcomes is

  P(Z1 = α1 , . . . , Zn = αn ) = ‖ Pn,αn e^{−iH(tn−tn−1)} · · · P1,α1 e^{−iH(t1−t0)} ψ0 ‖²    (18.22)

with t0 = 0 (and units of measurement chosen so that ~ = 1), so (18.20) holds with

  E({(α1 , . . . , αn )}) =
     e^{iH(t1−t0)} P1,α1 · · · e^{iH(tn−tn−1)} Pn,αn Pn,αn e^{−iH(tn−tn−1)} · · · P1,α1 e^{−iH(t1−t0)} .   (18.23)

It becomes clear that E(B) is, in general, not a projection but still a positive operator.
One easily verifies that E is a POVM.
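    As a minimal illustration of Example 18.6, the following sketch takes n = 2 consecutive
ideal measurements of A1 = σ3 and A2 = σ1 and, for simplicity, a vanishing Hamiltonian
between the measurements (an assumption made only to keep the example short). It builds
the operators (18.23), checks that they form a POVM but are not projections, and evaluates
the joint distribution for ψ0 = |z-up⟩.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# spectral projections of A1 = sigma_z and A2 = sigma_x
P1 = {+1: (I2 + sz)/2, -1: (I2 - sz)/2}
P2 = {+1: (I2 + sx)/2, -1: (I2 - sx)/2}

# Eq. (18.23) with H = 0 (simplifying assumption): E({(a1,a2)}) = P1 P2 P2 P1 = P1 P2 P1
E = {(a1, a2): P1[a1] @ P2[a2] @ P1[a1] for a1 in (1, -1) for a2 in (1, -1)}

print(np.allclose(sum(E.values()), I2))                                  # operators sum to I
print([bool(np.all(np.linalg.eigvalsh(Ez) >= -1e-12)) for Ez in E.values()])  # each is positive
print([bool(np.allclose(Ez @ Ez, Ez)) for Ez in E.values()])             # none is a projection here

# joint probabilities for psi0 = |z-up>: (+1,+1) and (+1,-1) get 1/2, (-1,.) get 0
psi0 = np.array([1, 0], dtype=complex)
print({z: round(np.vdot(psi0, Ez @ psi0).real, 6) for z, Ez in E.items()})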
Example 18.7. In GRWf, the joint distribution of all flashes is of the form

                       P(F ∈ B) = ⟨Ψ0 |G(B)|Ψ0 ⟩                                              (18.24)

for all sets B ⊆ Z , with Ψ0 the initial wave function and G a POVM on the history
space Z of flashes,

         Z = { ((t1 , x1 ), (t2 , x2 ), . . .) ∈ (R4 )∞ : 0 < t1 < t2 < . . . }^N .            (18.25)
    Derivation: Consider first the joint distribution of the first two flashes for N = 1
particle: The probability of T1 ∈ [t1 , t1 + dt1 ] is 1_{t1>0} e^{−λt1} λ dt1 ; given T1 , the probability
of X 1 ∈ d3 x1 is, according to (12.11), ‖C(x1 )ΨT1− ‖² with ΨT1− = e^{−iHT1} Ψ0 and C(x1 )
the collapse operator defined in (12.9). Given T1 and X 1 , the probability of T2 ∈
[t2 , t2 + dt2 ] is 1_{t2>t1} e^{−λ(t2−t1)} λ dt2 ; given T1 , X 1 , and T2 , the probability of X 2 ∈ d3 x2
is ‖C(x2 )e^{−iH(T2−T1)} ΨT1+ ‖² with ΨT1+ = C(X 1 )ΨT1− . Putting these formulas together,
the joint distribution of T1 , X 1 , T2 , and X 2 is given by

  P(T1 ∈ [t1 , t1 + dt1 ], X 1 ∈ d3 x1 , T2 ∈ [t2 , t2 + dt2 ], X 2 ∈ d3 x2 )
     = 1_{0<t1<t2} e^{−λt2} λ² ‖C(x2 )e^{−iH(t2−t1)} C(x1 )e^{−iHt1} Ψ0 ‖² dt1 d3 x1 dt2 d3 x2          (18.26)
     = ⟨Ψ0 | G(dt1 × d3 x1 × dt2 × d3 x2 ) |Ψ0 ⟩                                                        (18.27)
with
  G(dt1 × d3 x1 × dt2 × d3 x2 )
     = 1_{0<t1<t2} e^{−λt2} λ² e^{iHt1} C(x1 )e^{iH(t2−t1)} C(x2 )² e^{−iH(t2−t1)} C(x1 )e^{−iHt1} dt1 d3 x1 dt2 d3 x2 ,   (18.28)

which is self-adjoint and positive because (18.27) is always real and ≥ 0. It follows
that also G(B), obtained by summing (that is, integrating) over all infinitesimal vol-
ume elements in B, is self-adjoint and positive. Additivity holds by construction, and
G(Z ) = I because (18.27) is a probability distribution (so ⟨Ψ0 |G(Z )|Ψ0 ⟩ = 1 for ev-
ery Ψ0 with ‖Ψ0 ‖ = 1). Thus, G is a POVM. For the joint distribution of more than
two flashes or more than one particle, the reasoning proceeds in a similar way. For the
joint distribution of all (infinitely many) flashes, the rigorous proof requires some more
technical steps38 but bears no surprises.
  38
   carried out in R. Tumulka: A Kolmogorov Extension Theorem for POVMs. Letters in Mathematical
Physics 84: 41–46 (2008) http://arxiv.org/abs/0710.3605
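    To get a feeling for the POVM G of Example 18.7, here is a rough 1D toy simulation of
the first flash of GRWf for a single particle: the flash time is exponential with rate λ, the
wave function is evolved freely to that time, and the flash location is drawn from the density
‖C(x1)Ψ_{T1−}‖². The grid, the rate, the collapse width, and the 1D Gaussian collapse profile
(normalized so that ∫ dx1 C(x1)² = 1, as a stand-in for (12.9)) are all assumptions made for
illustration only.

import numpy as np

x = np.linspace(-15, 15, 400)
dx = x[1] - x[0]
lam, sigma = 1.0, 1.0            # flash rate lambda and collapse width sigma (assumed units)

# free Hamiltonian (hbar = m = 1) as a finite-difference matrix
n = len(x)
H = (np.diag(np.full(n, 1.0)) + np.diag(np.full(n-1, -0.5), 1)
     + np.diag(np.full(n-1, -0.5), -1)) / dx**2

# collapse profile: multiplication by a Gaussian centered at x1, with sum_x1 profile^2 dx1 = 1
def C_diag(x1):
    return (2*np.pi*sigma**2)**(-0.25) * np.exp(-(x - x1)**2 / (4*sigma**2))

psi0 = np.exp(-x**2)
psi0 = psi0 / np.sqrt(np.sum(np.abs(psi0)**2) * dx)

rng = np.random.default_rng(4)
T1 = rng.exponential(1.0 / lam)                  # time of the first flash
vals, vecs = np.linalg.eigh(H)
U = vecs @ np.diag(np.exp(-1j * vals * T1)) @ vecs.conj().T
psi = U @ psi0                                   # Psi_{T1-}

p = np.array([np.sum(np.abs(C_diag(x1) * psi)**2) * dx for x1 in x]) * dx
print(T1, p.sum())                               # p sums to ~1: a probability distribution for X1
X1 = rng.choice(x, p=p / p.sum())                # location of the first flash
print(X1)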
18.2     The Main Theorem about POVMs
It says: For every quantum physical experiment E on a quantum system S whose possible
outcomes lie in a space Z , there exists a POVM E on Z such that, whenever S has
wave function ψ at the beginning of E , the random outcome Z has probability distribution
given by
                               P(Z ∈ B) = ⟨ψ|E(B)|ψ⟩ .                                        (18.29)
   We will prove this statement in Bohmian mechanics and GRWf. It plays the role of
Born’s rule for POVMs. The experiment E consists of coupling S to an apparatus A at
some initial time ti , letting S ∪ A evolve up to some final time tf , and then reading off
the result Z from A. It is assumed that S and A are not entangled at the beginning of
E:
                                 ΨS∪A (ti ) = ψS (ti ) ⊗ φA (ti )                  (18.30)
with φA the ready state of A. (The main theorem about POVMs can also be proven for the
case in which tf is itself chosen by the experiment; e.g., the experiment might wait for a
detector to click, and the outcome Z may be the time of the click. I give the proof only
for the simpler case in which tf is fixed in advance.) I will further assume that E has
only finitely many possible outcomes Z; actually, this assumption is not needed for the
proof, but it simplifies the consideration a bit and is satisfied in every realistic scenario.
Proof from Bohmian mechanics. Since the outcome is read off from the pointer
position,
                            Z = ζ(Q(tf )) ,                                                   (18.31)
where Q is the Bohmian configuration and ζ is called the calibration function. (In
practice, the function ζ depends only on the configuration of the apparatus, in fact only
on its macroscopic features, not on microscopic details. However, the arguments that
follow apply to arbitrary calibration functions.) Let

                            U = e^{−iH(tf −ti )}                                              (18.32)

be the unitary time evolution operator of S ∪ A from ti to tf , and

                            Bz = {q ∈ R3N : ζ(q) = z} .                                       (18.33)

Then, using the projection operator PB defined in (10.16),

                  P(Z = z) = P(Q(tf ) ∈ Bz )                                                  (18.34)
                           = ∫_{Bz} |Ψ(q, tf )|² dq                                           (18.35)
                           = ⟨Ψ(tf )|PBz |Ψ(tf )⟩                                             (18.36)
                           = ⟨ψ ⊗ φ|U † PBz U |ψ ⊗ φ⟩                                         (18.37)
                           = ⟨ψ|Ez |ψ⟩S ,                                                     (18.38)
where ⟨·|·⟩S denotes the inner product in the Hilbert space of the system S alone (as
opposed to the Hilbert space of S ∪ A), and Ez is defined as follows: For given ψ, form
ψ ⊗ φ, then apply the operator U † PBz U , and finally take the partial inner product with
φ. The partial inner product of a function Ψ(x, y) with the function φ(y) is a function
of x defined as
                  ⟨φ|Ψ⟩y (x) = ∫ dy φ∗ (y) Ψ(x, y) .                                          (18.39)
Thus,
                  Ez ψ = ⟨φ|U † PBz U (ψ ⊗ φ)⟩y .                                             (18.40)
We now verify that E is a POVM. First, Ez is a positive operator because

            ⟨ψ|Ez |ψ⟩S = ⟨ψ ⊗ φ|U † PBz U |ψ ⊗ φ⟩ = ‖PBz U (ψ ⊗ φ)‖² ≥ 0 .

Second, Σ_z Ez = I. This follows from

            Σ_z PBz = I ,                                                                     (18.46)

together with the facts that U † U = I, and that the partial inner product of ψ ⊗ φ with φ
returns ψ. Eq. (18.46) follows from the fact that the sets Bz form a partition of con-
figuration space R3N (i.e., they are mutually disjoint and together cover the entire
configuration space, ∪z Bz = R3N ). This, in turn, follows from the assumption that the
calibration function ζ is defined everywhere in R3N .39 Thus, the proof is complete.        □
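    The construction (18.40) can be illustrated in a finite-dimensional toy model: take a
2-dimensional system, a 3-dimensional apparatus with ready state φ, a Haar-random unitary
standing in for the time evolution U of S ∪ A, and a calibration that simply reads off the
apparatus basis state. The following sketch (all dimensions and the random U are assumptions
made for illustration) builds the corresponding operators Ez and verifies that they form a
POVM reproducing the probabilities for the pointer position.

import numpy as np

rng = np.random.default_rng(5)

dS, dA = 2, 3                                    # toy system and apparatus dimensions (assumed)
phi = np.zeros(dA, dtype=complex); phi[0] = 1.0  # ready state of the apparatus

# a Haar-random unitary on H_S (x) H_A, standing in for the coupled time evolution U
M = rng.normal(size=(dS*dA, dS*dA)) + 1j*rng.normal(size=(dS*dA, dS*dA))
U, _ = np.linalg.qr(M)

V = np.kron(np.eye(dS), phi.reshape(dA, 1))      # V psi = psi (x) phi

Pz_list, E = [], []
for z in range(dA):
    ez = np.zeros(dA); ez[z] = 1.0
    Pz = np.kron(np.eye(dS), np.outer(ez, ez))   # projection onto pointer value z
    Pz_list.append(Pz)
    E.append(V.conj().T @ U.conj().T @ Pz @ U @ V)   # Eq. (18.40) in matrix form

print(np.allclose(sum(E), np.eye(dS)))                                   # sum_z E_z = I
print([bool(np.all(np.linalg.eigvalsh(Ez) >= -1e-12)) for Ez in E])      # each E_z positive

# probabilities for an initial system state psi agree with the Born rule for the pointer
psi = np.array([0.6, 0.8], dtype=complex)
direct = [np.linalg.norm(Pz @ U @ np.kron(psi, phi))**2 for Pz in Pz_list]
via_E  = [np.vdot(psi, Ez @ psi).real for Ez in E]
print(np.allclose(direct, via_E))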
Proof from GRWf. Let F = {(T1 , X 1 ), (T2 , X 2 ), . . .} be the set of flashes (of both S
and A) from ti onwards. We know from Example 18.7 that the distribution of F (i.e.,
the joint distribution of all flashes after ti ) is given by Ψ(ti ) and some POVM G:

                  P(F ∈ B) = ⟨Ψ(ti )|G(B)|Ψ(ti )⟩ .                                           (18.47)

Since the outcome Z of the experiment is read off from A after ti , it is a function of F ,
                                              Z = ζ(F ) .                                 (18.48)
  39
    The physical meaning of this asumption is that the experiment always has some outcome. You
may worry about the possibility that the experiment could not be completed as planned due to power
outage, meteorite impact, or whatever. This possibility can be taken into account by introducing a
further element f for “failed” into the set Z of possible outcomes.
(Z is a function of F because the flashes define where the pointers point, and what the
shape of the ink on a sheet of paper is. It would even be realistic to assume that Z
depends only on the flashes of the apparatus, but this restriction is not needed for the
further argument.)
    Let Bz = {f : ζ(f ) = z}, the set of flash patterns having outcome z. Then,
                  P(Z = z) = P(F ∈ Bz )                                                       (18.49)
                           = ⟨Ψ(ti )|G(Bz )|Ψ(ti )⟩                                           (18.50)
                           = ⟨ψ|EzGRW |ψ⟩                                                     (18.51)
with
                  EzGRW ψ = ⟨φ|G(Bz )|ψ ⊗ φ⟩y .                                               (18.52)
In fact, EzGRW may be different from Ez obtained from Bohmian mechanics as in (18.40),
in agreement with the fact that the same experiment (using the same initial wave func-
tion of the apparatus, etc.) may yield different outcomes in GRW than in Bohmian
mechanics. (However, since we know the two theories make very very similar predic-
tions, EzGRW will usually be very very close to Ez .) To see that EzGRW is a POVM, we
note that
                  ⟨ψ|EzGRW |ψ⟩ = ⟨Ψ(ti )|G(Bz )|Ψ(ti )⟩ ≥ 0                                   (18.53)
and
                  Σ_z EzGRW ψ = ⟨φ| Σ_z G(Bz ) |ψ ⊗ φ⟩y                                       (18.54)
                              = ⟨φ|G(∪z Bz )|ψ ⊗ φ⟩y                                          (18.55)
                              = ⟨φ|G(Z )|ψ ⊗ φ⟩y = ⟨φ|ψ ⊗ φ⟩y = ψ ,
so Σ_z EzGRW = I.
   The main theorem about POVMs is equally valid in orthodox quantum mechanics
(OQM). However, since OQM does not permit a coherent analysis of measurement
processes (as it suffers from the measurement problem), we cannot give a complete
proof of the main theorem from OQM, but the same reasoning as given in the proof
from Bohmian mechanics would be regarded as compelling in OQM. At the same time,
the main theorem undercuts the spirit of OQM, which is to leave the measurement
process unanalyzed and to introduce observables by postulate. Put differently, the main
theorem about POVMs makes it harder to ignore the measurement problem.
Corollary 18.8. There is no experiment whose outcome, for every initial wave function
ψ of the system, would be ψ itself (or Cψ for some constant C).

Proof. Suppose there were an experiment with Z = ψ. Then, for any given ψ, Z is
deterministic, i.e., its probability distribution is concentrated on a single point, P(Z =
φ) = δ(φ − ψ). The dependence of this distribution on ψ is not quadratic, and thus not
of the form ⟨ψ|Eφ |ψ⟩ for any POVM E. The argument remains valid when we replace
ψ by Cψ.                                                                                      □
    This fact amounts to a limitation of knowledge in any version of quantum mechanics
in which wave functions are part of the ontology, which includes all interpretations of
quantum mechanics that we have talked about: Suppose Alice chooses a direction in
space n, prepares a spin-1/2 particle in the state |n-up⟩, and hands that particle over to
Bob. Then, by Corollary 18.8, Bob has no way of discovering n if Alice does not give the
information away. The best thing Bob can do is, in fact, a Stern–Gerlach experiment in
any direction he likes, say in the z-direction; then he obtains one bit of information, up
or down; if the result was “up” then it is more likely that n lies on the upper hemisphere
than on the lower.
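    The last claim can be illustrated with a small Monte Carlo estimate: assuming (for the
sake of the example) that Alice chooses n uniformly on the sphere, the posterior probability
that n lies on the upper hemisphere, given that Bob’s z-measurement yielded “up,” comes
out close to 3/4.

import numpy as np

rng = np.random.default_rng(6)
N = 200_000

# n uniform on the unit sphere
v = rng.normal(size=(N, 3))
n = v / np.linalg.norm(v, axis=1, keepdims=True)

# Bob measures spin in the z-direction; P(up) = cos^2(theta/2) = (1 + n_z)/2
p_up = (1 + n[:, 2]) / 2
up = rng.random(N) < p_up

# posterior probability that n lies on the upper hemisphere, given Bob saw "up"
print(np.mean(n[up, 2] > 0))     # approx 0.75: "up" makes the upper hemisphere more likely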
Corollary 18.9. There is no experiment in Bohmian mechanics that can measure the
instantaneous velocity of a particle with unknown wave function.
Proof. Again, the distribution of the velocity (~/m) Im[∇ψ/ψ](Q) with Q ∼ |ψ|² is not
quadratic in ψ.                                                                               □
    In contrast, the asymptotic velocity u can be measured, and its probability distribution
is in fact quadratic in ψ: Recall from (7.39) that it is given by (m/~)³ |ψ̂(mu/~)|² .
    The impossibility of measuring instantaneous velocity goes along with the impossi-
bility to measure the entire trajectory without disturbing it. If we wanted to measure
the trajectory, for example by repeatedly measuring the positions every ∆t with inaccu-
racy ∆x, then the measurements will collapse the wave function, with the consequence
that the observed trajectory is very different from what the trajectory would have been
had we not intervened. Some authors regard this as an argument against Bohmian me-
chanics. Bell disagreed (Speakable and unspeakable in quantum mechanics, page 202):
       To admit things not visible to the gross creatures that we are is, in my
       opinion, to show a decent humility, and not just a lamentable addiction to
       metaphysics.
So, Bell criticized the positivistic idea that anything real can always be measured. In-
deed, this idea seems rather dubious in view of Corollary 18.8. We will sharpen this
consideration in Section 20.3.
Definition 18.10. Two experiments (that can be carried out on arbitrary wave func-
tions ψ ∈ H with norm 1) are equivalent in law iff for every ψ ∈ H with kψk = 1,
they have the same distribution of the outcome. (Thus, they are equivalent in law iff
they have the same POVM.) A corresponding equivalence class of experiments is called
an observable.
     If E1 and E2 are equivalent in law and a particular run of E1 has yielded the outcome
z1 , it cannot be concluded that E2 would have yielded z1 as well. The counterfactual
question, “what would z2 have been if we had run E2 ?” cannot be tested empirically, but
it can be analyzed in Bohmian mechanics; there, one sometimes finds z2 ≠ z1 (for the
same QS and ψ in both experiments, but different QA and φ). For example, let E1 be a
Stern–Gerlach experiment in the z direction and E2 a Stern–Gerlach experiment in the
−z direction with the outcome called +1 if the particle is detected in the down channel
and −1 if the particle is detected in the up channel. Then E1 and E2 are equivalent
in law, although in Bohmian mechanics, the two experiments will often yield different
results when applied to the same 1-particle wave function and position.
     This situation illustrates why the term “observable” can be rather misleading: It is
intended to suggest “observable quantity,” but an observable is not even a well-defined
quantity to begin with (as the outcome Z depends on QA and φ), it is a class of
experiments with equal probability distributions.
     This point is connected to Wheeler’s fallacy. Recall the delayed choice experiment,
but now consider detecting the particle either directly at the slits or far away, ignoring
the interference region. As E1 , we put detectors directly at the slits and say that the
outcome is Z1 = +1 if the particle was detected in the left slit and Z1 = −1 if in the
right. This is a kind of position measurement that can be represented in the 2d Hilbert
space formed by wave functions of the form

                  ψ = c1 |left slit⟩ + c2 |right slit⟩ ,

so P(Z1 = +1) = |c1 |². Relative to the basis {|left slit⟩, |right slit⟩}, the POVM is the
spectral PVM of σ3 . As E2 , we put the detectors far away and say that Z2 = +1 if the
particle was detected in the far right and Z2 = −1 if in the far left. ψ evolves to (up to
phase factors)

                  c1 |far right⟩ + c2 |far left⟩ ,

so P(Z2 = +1) = |c1 |². So, Z1 and Z2 have the same distribution, E1 and E2 have the
same POVM, and the two experiments are equivalent in law, although we know that the
Bohmian particle often passes through the right slit and still ends up on the far right.
    Now comes the point that has confused a number of authors40 : Since E1 measures the
“position observable,” and since E1 and E2 “measure” the same observable, it is clear
that E2 also measures the position observable. People concluded that E2 “measures
through which slit the particle went”—Wheeler’s fallacy! People concluded further that
since the Bohmian trajectory may pass through the left slit while Z2 = −1, Bohmian
  40
   For example (using a different but similar setup), B.-G. Englert, M.O. Scully, G. Süssmann, and
H. Walther: Surrealistic Bohm Trajectories. Zeitschrift für Naturforschung A 47: 1175–1186 (1992)
mechanics must somehow disagree with measured facts about which slit the particle
went through. Bad, bad Bohm!
19     Time of Detection
19.1     The Problem
Suppose we set up a detector, wait for the arrival of the particle at the detector, and
measure the time T at which the detector clicks. What is the probability distribution
of T ? This is a natural question not covered by the usual quantum formalism because
there is no self-adjoint operator for time. But from the main theorem about POVMs it
is clear that there must be a POVM E such that

                  P(T ∈ B) = ⟨ψ0 |E(B)|ψ0 ⟩                                                   (19.1)

for all sets B, where ψ0 is the initial wave function of the particle. That is, time of
detection is a generalized observable. In this section we take a look at
this POVM E.
[Figure 4: a surface Σ dividing physical space into a region Ω, containing the initial wave
packet ψ0 , and its complement.]
    Suppose that we form a surface Σ ⊂ R3 out of little detectors so we can measure the
time and the location at which the quantum particle first crosses Σ. Suppose further
that, as depicted in Figure 4, Σ divides physical space R3 into two regions, Ω and
its complement, and the particle’s initial wave function ψ0 is concentrated in Ω. The
outcome of the experiment is the pair Z = (T, X) of the time T ∈ [0, ∞) of detection
and the location X ∈ Σ of detection; should no detection ever occur, then we write
Z = ∞. So the value space of E is Z = [0, ∞) × Σ ∪ {∞}. We want to compute the
distribution of Z from ψ0 .
    Let us compare the problem to Born’s rule. In Born’s rule, we choose a time t0
and measure the three position coordinates at time t0 ; here, if we take Ω to be the half
space {(x, y, z) : x > x0 } and Σ its boundary plane {(x, y, z) : x = x0 }, then we choose
the value of one position coordinate (x0 ) and measure the time as well as the other
two position coordinates when the particle reaches that value. Put differently in terms
of space-time R4 = {(t, x, y, z)}, Born’s rule concerns measuring where the particle
intersects the spacelike hypersurface {t = t0 }, and our problem concerns measuring
where the particle intersects the timelike hypersurface {x = x0 }. We could say that we
need a Born rule for timelike hypersurfaces.
    I should make three caveats, though.
   • I have used language such as “particle arriving at a surface” that presupposes the
     existence of trajectories although we know that some theories of quantum me-
     chanics (GRWm and GRWf) claim that there are no trajectories, and still these
     theories are approximately empirically equivalent to Bohmian mechanics, so the
     time and location of the detector click would have approximately the same dis-
     tribution as in Bohmian mechanics. Our problem really concerns the distribution
     of the detection events, and we should keep in mind that in some theories the
     trajectory language cannot be taken seriously.
   • Even in Bohmian mechanics, there is a crucial difference between the case with
     the spacelike hypersurface and the one with the timelike hypersurface: The point
     where the particle arrives on the timelike hypersurface {x = x0 } may depend on
     whether or not detectors are present on that hypersurface. A detector that does
     not click may still affect ψ and thus the future particle trajectory. That is why
     I avoid the expression “time of arrival” (which is often used in the literature) in
     favor of “time of detection.” In contrast, the point where the particle arrives at
     the spacelike hypersurface {t = t0 } does not depend on whether or not detectors
     are placed along {t = t0 }.
   • The exact POVM E is given by (18.40) (with tf some late time at which we read
     off the values of T and X recorded by the apparatus) and will depend on the exact
     wave function of the detectors, so different detectors will lead to slightly different
     POVMs. Of course, we expect that these differences are negligible. What we want
     is a simple rule defining the POVM for an ideal detector, Eideal . That, of course,
     involves making a definition of what counts as an ideal detector. So the formula
     for Eideal is in part a matter of definition, as long as it fits well with the POVMs
     E of real detectors.
19.2        The Absorbing Boundary Rule
The question of what Eideal is is not fully settled; I will describe the most plausible
proposal, the absorbing boundary rule.41 Such a rule was for a long time believed to be
impossible because of the quantum Zeno effect and Allcock’s paradox (see homework
exercises). Henceforth I will write E instead of Eideal . Let Σ = ∂Ω, let ψ0 be concentrated
in Ω with ‖ψ0 ‖ = 1, and let κ > 0 be a constant of dimension 1/length (it will be a parameter
of the detector). Here is the rule: Let ψt evolve according to the Schrödinger equation

                  i~ ∂ψ/∂t = −(~²/2m) ∇²ψ + V ψ                                               (19.2)

in Ω with potential V : Ω → R and boundary condition

                  ∂ψ/∂n (x) = iκψ(x)                                                          (19.3)

at every x ∈ Σ, with ∂/∂n the outward normal derivative on the surface, ∂ψ/∂n :=
n(x) · ∇ψ(x) with n(x) the outward unit normal vector to Σ at x ∈ Σ. Then, the rule
asserts,

      Pψ0 (t1 ≤ T < t2 , X ∈ B) = ∫_{t1}^{t2} dt ∫_B d²x n(x) · j^{ψt}(x)                     (19.4)

for any 0 ≤ t1 < t2 and any set B ⊆ Σ, with d²x the surface area element and j^ψ the
probability current vector field (2.16). In other words, the joint probability density of T
and X relative to dt d²x is the normal component of the current across the boundary,
j_n^{ψt}(x) = n(x) · j^{ψt}(x). Furthermore,

      Pψ0 (Z = ∞) = 1 − ∫_0^∞ dt ∫_Σ d²x n(x) · j^{ψt}(x) .                                   (19.5)
    Let us study the properties of the rule. To begin with, the boundary condition (19.3)
implies that the current vector j at the boundary is always outward-pointing: For every
x ∈ Σ,

   n(x) · j(x) = (~/m) Im[ ψ*(x) ∂ψ/∂n (x) ] = (~/m) Im[ ψ*(x) iκψ(x) ] = (~κ/m) |ψ(x)|² ≥ 0 .   (19.6)
  41
     R. Werner: Arrival time observables in quantum mechanics. Annales de l’Institut Henri Poincaré,
section A 47: 429–449 (1987)
  R. Tumulka: Distribution of the Time at Which an Ideal Detector Clicks. (2016) http://arxiv.
org/abs/1601.03715
For this reason, (19.3) is called an absorbing boundary condition: It implies that there
is never any current coming out of the boundary. In particular, the right-hand side of
(19.4) is non-negative.
    So the rule invokes a new kind of time evolution for a 1-particle wave function as
an effective treatment of the whole system formed by the 1 particle and the detec-
tors together. It is useful to picture the Bohmian trajectories for this time evolution.
Eq. (19.6) implies that the Bohmian velocity field v(x) is always outward-pointing at
the boundary, n(x) · v(x) > 0 for all x ∈ Σ; in fact, the normal velocity is prescribed,
n(x)·v(x) = ~κ/m. In particular, Bohmian trajectories can cross Σ only in the outward
direction; when they do, they end on Σ, as ψ is not defined behind Σ. Put differently,
no Bohmian trajectories begin on Σ, they all begin at t = 0 in Ω with |ψ0 |2 distribu-
tion. In fact, the right-hand side of (19.4) is exactly the probability distribution of the
space-time point at which the Bohmian trajectory reaches the boundary. That is not
surprising, as in a Bohmian world we would expect the detector to click when and where
the particle reaches the detecting surface. As a further consequence, the right-hand side
of (19.5) is exactly the probability that the Bohmian trajectory never reaches Σ. In
particular, (19.4) and (19.5) together define a probability distribution on Z . Had we
evolved ψ0 with the Schrödinger equation on R3 without boundary condition on Σ, then
some Bohmian trajectories may cross Σ several times in both directions; this illustrates
that the trajectory in the presence of detectors can be different from what it would have
been in the absence of detectors.
    Since probability can only be lost at the boundary, never gained,
    ‖ψ_t‖² = ∫_Ω d³x |ψ_t(x)|²                (19.7)
can only decrease with t, never increase. So here we are dealing with a new kind
of Schrödinger equation whose time evolution is not unitary as the norm of ψ is not
conserved. The time evolution operators Wt , defined by the property Wt ψ0 = ψt , have
the following properties: First, they are not unitary but satisfy kWt ψk ≤ kψk; such
operators are called contractions. Second, Ws Wt = Ws+t and W0 = I; a family (Wt )t≥0
with this property is called a semigroup. Thus, the Wt form a contraction semigroup.
    In fact, kψt k2 is the probability that the Bohmian particle is still somewhere in Ω
at time t, that is, has not reached the boundary yet. In particular, as an alternative to
(19.5) we can write
    P(Z = ∞) = lim_{t→∞} ‖ψ_t‖² .                (19.8)
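To get a feel for this non-unitary evolution, here is a small numerical sketch (in Python with NumPy/SciPy; it is not part of the rule itself) for a particle on an interval in units ℏ = m = 1, with a hard wall at x = 0 and the absorbing boundary condition ∂ψ/∂x = iκψ at x = L. The grid, the ghost-point treatment of the boundary condition, and the initial wave packet are all illustrative choices.

import numpy as np
from scipy.linalg import expm

# Units hbar = m = 1. Region Omega = [0, L], hard wall at x = 0,
# absorbing boundary condition psi'(L) = i*kappa*psi(L) at x = L.
L, kappa = 20.0, 2.0
N = 400
dx = L / N
x = (np.arange(N) + 1) * dx                      # grid x_1, ..., x_N = L

# Kinetic energy -(1/2) d^2/dx^2 with psi(0) = 0 and a ghost point
# psi(L + dx) = (1 + i*kappa*dx) * psi(L) implementing the boundary condition.
# The resulting H is NOT self-adjoint; its anti-Hermitian part sits at x = L.
H = np.zeros((N, N), dtype=complex)
for j in range(N):
    H[j, j] = 1.0 / dx**2
    if j > 0:
        H[j, j - 1] = -0.5 / dx**2
    if j < N - 1:
        H[j, j + 1] = -0.5 / dx**2
H[N - 1, N - 1] = (1.0 - 1j * kappa * dx) / (2 * dx**2)

# initial wave packet moving to the right with mean momentum k0
x0, sigma, k0 = 8.0, 1.0, 2.0
psi = np.exp(-(x - x0)**2 / (4 * sigma**2) + 1j * k0 * x)
psi /= np.sqrt(dx * np.sum(np.abs(psi)**2))

dt = 0.01
W = expm(-1j * H * dt)                           # one step of the contraction semigroup
flux = 0.0
for step in range(1, 1201):
    flux += kappa * np.abs(psi[-1])**2 * dt      # outgoing flux (hbar*kappa/m)|psi(L)|^2, cf. (19.6)
    psi = W @ psi
    if step % 300 == 0:
        norm2 = dx * np.sum(np.abs(psi)**2)
        print(f"t = {step*dt:5.1f}   ||psi_t||^2 = {norm2:.4f}   absorbed so far = {flux:.4f}")
# ||psi_t||^2 decreases monotonically, and 1 - ||psi_t||^2 approximately matches the accumulated flux,
# in accordance with (19.7)-(19.8).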
   The conclusions from our considerations about Bohmian trajectories can also be
obtained from the Ostrogradskii–Gauss integral theorem (divergence theorem) in 4 di-
mensions: The 4-vector field j = (ρ, j) has vanishing 4-divergence, as that is exactly
what the continuity equation (2.16) expresses. Integrating the divergence over [0, t] × Ω
yields
    0 = ∫_0^t dt′ ∫_Ω d³x div j(t′, x)                (19.9)
      = ∫_Ω d³x ρ(t, x) − ∫_Ω d³x ρ(0, x) + ∫_0^t dt′ ∫_Σ d²x n(x) · j(t′, x) .                (19.10)
Since the last integrand is non-negative, ‖ψ_t‖² is decreasing with time; indeed,
    ‖ψ_t‖² = 1 − ∫_0^t dt′ ∫_Σ d²x n(x) · j(t′, x) ,                (19.11)
i.e., ‖ψ_t‖² equals 1 minus the flux of j into the boundary during [0, t]. In particular,
    lim_{t→∞} ‖ψ_t‖² = 1 − ∫_0^∞ dt′ ∫_Σ d²x n(x) · j(t′, x) ,                (19.12)
so the right-hand side of (19.5) is non-negative, and (19.4) and (19.5) together define a probability distribution.
    So what is the POVM E? It is given by
    E(dt × d²x) = (ℏκ/m) W_t† |x⟩⟨x| W_t dt d²x ,                (19.13)
    E({∞}) = lim_{t→∞} W_t† W_t .                (19.14)
Since the E(dt) are not projections, there are in general no eigenstates of detection time.
   Variants of the absorbing boundary rule have been developed for moving surfaces,
systems of several detectable particles, and particles with spin.42
 42
   R. Tumulka: Detection Time Distribution for Several Quantum Particles. http://arxiv.org/
abs/1601.03871
20      Density Matrix and Mixed State
In this chapter we prove a limitation to knowledge in quantum mechanics that follows
from the main theorem about POVMs. Let
    S(H ) = {ψ ∈ H : ‖ψ‖ = 1}                (20.1)
denote the unit sphere in Hilbert space. Suppose that we have a mechanism that gener-
ates random wave functions Ψ ∈ S(H ) with probability distribution µ on S(H ). Then
it is impossible to determine µ empirically. In fact, there exist different distributions
µ1 ≠ µ2 that are empirically indistinguishable, i.e., they lead to the same distribution of
outcomes Z for any experiment. We call such distributions empirically equivalent (which
is an equivalence relation) and show that the equivalence classes are in one-to-one cor-
respondence with certain operators known as density matrices or density operators.
    To describe these matters, we need the mathematical concept of trace.
20.1      Trace
Definition 20.1. The trace of a matrix A = (Amn ) is the sum of its diagonal elements.
The trace of an operator T is defined to be the sum of the diagonal elements of its
matrix representation Tnm = hn|T |mi relative to an arbitrary ONB {|ni},
    tr T = Σ_{n=1}^∞ ⟨n|T |n⟩ .                (20.2)
   Every positive operator either has finite trace or has trace +∞, and the value of the
trace does not depend on the choice of ONB. The trace class is the set of those operators
T for which the positive operator √(T†T) has finite trace. For every operator from the
trace class, the trace is finite and does not depend on the ONB.
   The trace has the following properties for all operators A, B, . . . from the trace class:
  (i) The trace is linear:
          tr(A + B) = tr(A) + tr(B) ,    tr(λA) = λ tr(A)
      for all λ ∈ C.
        The trace is also invariant under cyclic permutations of products: in particular
        tr(AB) = tr(BA) and tr(ABC) = tr(CAB), which is, however, not always the same as tr(CBA).
(iv) The trace of the adjoint operator T † is the complex-conjugate of the trace of T :
     tr(T † ) = tr(T )∗ .
    The main theorem about POVMs now implies the following: if Ψ is random with distribution µ, then for every experiment with POVM E, the distribution of the outcome Z is given by
    P(Z ∈ B) = tr(ρµ E(B)) ,                (20.5)
where the operator
    ρµ = E|Ψ⟩⟨Ψ| = ∫_{S(H )} |ψ⟩⟨ψ| µ(dψ)                (20.6)
is called the density operator or density matrix (rarely: statistical operator) of the
distribution µ. Eq. (20.5) is called the trace formula. It was discovered by John von
Neumann in 1927,43 except that von Neumann did not know POVMs and considered
only PVMs. In case the distribution µ is concentrated on discrete points of S(H ),
(20.6) becomes
    ρµ = E|Ψ⟩⟨Ψ| = Σ_ψ µ(ψ) |ψ⟩⟨ψ| .                (20.7)
To see why the trace formula holds, note first that for every ψ ∈ S(H ) and every operator E,
    ⟨ψ|E|ψ⟩ = tr(|ψ⟩⟨ψ| E)                (20.8)
because, if we choose the basis {|n⟩} in (20.2) such that |1⟩ = ψ, then the summands
in (20.2) are ⟨n|ψ⟩⟨ψ|E|n⟩, which for n = 1 is ⟨ψ|E|ψ⟩ and for n > 1 is zero because
⟨n|1⟩ = 0. By linearity, we also have that
    tr( Σ_j µ(ψ_j) |ψ_j⟩⟨ψ_j| E ) = Σ_j µ(ψ_j) ⟨ψ_j|E|ψ_j⟩ ,                (20.9)
which yields (20.5) for any µ that is concentrated on finitely many points ψj on S(H ).
One can prove (20.5) for arbitrary probability distribution µ by considering limits.
 43
   J. von Neumann: Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik. Göttinger
Nachrichten 1(10): 245–272 (1927). Reprinted in John von Neumann: Collected Works Vol. I,
A.H. Taub (editor), Oxford: Pergamon Press (1961)
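As a simple illustration (not part of the proof), the following Python/NumPy sketch checks the trace formula (20.5) for a randomly generated discrete ensemble and a randomly generated effect 0 ≤ E ≤ I; the dimension, the ensemble size, and the way E is constructed are arbitrary choices.

import numpy as np
rng = np.random.default_rng(0)
d, n = 4, 5                                   # Hilbert space dimension, ensemble size

# a discrete ensemble: unit vectors psi_j with weights mu_j (summing to 1)
psis = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)
mu = rng.random(n)
mu /= mu.sum()

# density matrix rho_mu = sum_j mu_j |psi_j><psi_j|, Eq. (20.7)
rho = sum(m * np.outer(p, p.conj()) for m, p in zip(mu, psis))

# a random effect 0 <= E <= I (one element of some POVM)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
E = A.conj().T @ A
E /= np.linalg.eigvalsh(E).max() + 1.0

lhs = np.trace(rho @ E).real                                       # tr(rho_mu E)
rhs = sum(m * (p.conj() @ E @ p).real for m, p in zip(mu, psis))   # sum_j mu_j <psi_j|E|psi_j>
print(lhs, rhs)   # equal up to rounding: the statistics depend on mu only through rho_mu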
    Now let us draw conclusions from the formula (20.5). It implies that the distribution
of the outcome Z depends on µ only through ρµ . Different distributions µa , µb can
have the same ρ = ρµa = ρµb ; for example, if H = C2 then the uniform distribution
over S(H ) has ρ = ½ I, and for every orthonormal basis |φ_1⟩, |φ_2⟩ of C² the probability
distribution
    ½ δ_{φ_1} + ½ δ_{φ_2}                (20.10)
also has ρ = ½ I. Two such distributions µ_a , µ_b lead to the same distribution of
outcomes for any experiment and are therefore empirically indistinguishable.
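For instance, the following short sketch (an illustration only; the sample size is an arbitrary choice) compares the density matrix of the uniform distribution over S(C²), estimated by averaging over random unit vectors, with the density matrix of the discrete distribution (20.10); both come out as ½ I.

import numpy as np
rng = np.random.default_rng(1)

# ensemble (a): Psi uniformly distributed over the unit sphere of C^2
n = 200000
samples = rng.normal(size=(n, 2)) + 1j * rng.normal(size=(n, 2))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
rho_a = np.einsum('ni,nj->ij', samples, samples.conj()) / n   # average of |Psi><Psi|

# ensemble (b): Psi = phi_1 or phi_2 with probability 1/2 each, Eq. (20.10)
phi1 = np.array([1, 1], dtype=complex) / np.sqrt(2)
phi2 = np.array([1, -1], dtype=complex) / np.sqrt(2)
rho_b = 0.5 * np.outer(phi1, phi1.conj()) + 0.5 * np.outer(phi2, phi2.conj())

print(np.round(rho_a, 3))   # approximately I/2 (up to sampling error)
print(np.round(rho_b, 3))   # exactly I/2: the two ensembles are empirically equivalent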
If every wave function in the ensemble evolves according to the Schrödinger equation,
ψ_t = e^{−iHt/ℏ} ψ_0 , then the density matrix evolves according to
    ρ_t = e^{−iHt/ℏ} ρ_0 e^{+iHt/ℏ} ,                (20.11)
and therefore obeys
    iℏ dρ_t/dt = [H, ρ_t] = Hρ_t − ρ_t H ,                (20.12)
known as the von Neumann equation. The step from (20.11) to (20.12) is based on the
fact that
    (d/dt) e^{At} = A e^{At} = e^{At} A .                (20.13)
    A density matrix is also often called a quantum state. If ρ = |ψihψ| with kψk = 1,
then ρ is usually called a pure quantum state, otherwise a mixed quantum state. A
probability distribution µ has ρµ = |ψihψ| if and only if µ is concentrated on Cψ, i.e.,
Ψ = eiΘ ψ with a random global phase factor.
    As we have seen, a density matrix ρ is always a positive operator with tr ρ = 1.
Conversely, every positive operator ρ with tr ρ = 1 is a density matrix, i.e., ρ = ρµ for
some probability distribution µ on S(H ). Here is one such µ: find an orthonormal basis
{|φn i : n ∈ N} of eigenvectors of ρ with eigenvalues pn ∈ [0, ∞). Then
    Σ_n p_n = tr ρ = 1 .                (20.14)
Now let µ be the distribution that gives probability pn to φn ; its density matrix is just
the ρ we started with.
21         Reduced Density Matrix and Partial Trace
There is another way in which density matrices arise, leading to what is called the
reduced density matrix. Suppose that the system under consideration consists of two
parts, system a and system b, so that its Hilbert space is H = Ha ⊗ Hb . Suppose that
a and b together have wave function ψ ∈ S(H ) and that an experiment is performed on
system a alone, say with POVM Ea acting on Ha . Then the distribution of its outcome
Z is again given by a trace formula, P(Z ∈ B) = tra(ρa Ea(B)), where ρa = trb |ψ⟩⟨ψ| is
called the reduced density matrix of system a, and where trb means the partial trace
over Hb . The reduced density matrix and the trace formula for it were discovered by
Lev Landau in 1927.44 The partial trace of an operator T on Ha ⊗ Hb is the operator
trb T on Ha defined, relative to orthonormal bases {φan} of Ha and {φbm} of Hb , by its
matrix elements
    ⟨φan| trb T |φak⟩ = Σ_{m=1}^∞ ⟨φan ⊗ φbm| T |φak ⊗ φbm⟩ ,                (21.4)
where the inner products on the right-hand side are inner products in Ha ⊗ Hb . We
will sometimes write
    trb T = Σ_{m=1}^∞ ⟨φbm| T |φbm⟩ ,                (21.5)
where ⟨φbm| · |φbm⟩ denotes the partial inner product, taken in Hb alone. The partial
trace has the following properties for all operators S, T on Ha ⊗ Hb :
  (i) It is linear:
                         trb (S + T ) = trb (S) + trb (T ) ,   trb (λT ) = λ trb (T )   (21.6)
  44
    L. Landau: Das Dämpfungsproblem in der Wellenmechanik. Zeitschrift für Physik 45: 430–441
(1927)
 (ii) tr(trb (T )) = tr(T ). Here, the first tr symbol means the trace in Ha , the second
      one the partial trace, and the last one the trace in Ha ⊗ Hb . This property follows
      from (21.4) by setting k = n and summing over n.
(iii) trb (T † ) = (trb T )† . The adjoint of the partial trace is the partial trace of the
      adjoint. In particular, if T is self-adjoint then so is trb T .
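In finite dimensions the partial trace is easy to compute. Here is a minimal NumPy sketch (the dimensions are arbitrary illustrative choices) that implements formula (21.4) by reshaping and checks properties (ii) and (iii).

import numpy as np
rng = np.random.default_rng(2)
da, db = 3, 4                                 # dimensions of H_a and H_b

def partial_trace_b(T, da, db):
    """tr_b T for an operator T on H_a tensor H_b, via Eq. (21.4)."""
    T4 = T.reshape(da, db, da, db)            # indices (n, m, k, m')
    return np.einsum('nmkm->nk', T4)          # set m' = m and sum over m

T = rng.normal(size=(da * db, da * db)) + 1j * rng.normal(size=(da * db, da * db))
S = partial_trace_b(T, da, db)

print(np.trace(S), np.trace(T))                                        # property (ii)
print(np.allclose(partial_trace_b(T.conj().T, da, db), S.conj().T))    # property (iii)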
    Consider now
    Ψ = Σ_α Ψα ,                (21.10)
the wave function of an object and an apparatus after a quantum measurement of
the observable A = Σ_α α Pα . Suppose that Ψα , the contribution corresponding to the
outcome α, is of the form
    Ψα = cα ψα ⊗ φα ,                (21.11)
where cα = ‖Pα ψ‖, ψ is the initial object wave function, ψα = Pα ψ/‖Pα ψ‖, and φα
with ‖φα‖ = 1 is a wave function of the apparatus after having measured α. Since the
φα have disjoint supports in configuration space, they are mutually orthogonal; thus,
they are a subset of some orthonormal basis {φn}. The reduced density matrix of the
object is
    ρ^Ψ = trb |Ψ⟩⟨Ψ| = Σ_n ⟨φn|Ψ⟩⟨Ψ|φn⟩ = Σ_α |cα|² |ψα⟩⟨ψα| .                (21.12)
This is the same density matrix as the statistical density matrix associated with the
probability distribution µ of the collapsed wave function ψ′,
    µ = Σ_α |cα|² δ_{ψα} ,                (21.13)
since
    ρµ = Σ_α |cα|² |ψα⟩⟨ψα| .                (21.14)
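The equality of (21.12) and (21.14) can also be checked numerically. The following sketch builds a Ψ of the form (21.10)–(21.11) with randomly chosen object states ψ_α and orthonormal pointer states φ_α (all dimensions and states are illustrative assumptions) and compares the two density matrices.

import numpy as np
rng = np.random.default_rng(3)
d_obj, n_out = 3, 3                           # object dimension, number of outcomes

# coefficients c_alpha with sum |c_alpha|^2 = 1, normalized object states psi_alpha,
# and mutually orthogonal apparatus ("pointer") states phi_alpha, as in (21.11)
c = rng.normal(size=n_out) + 1j * rng.normal(size=n_out)
c /= np.linalg.norm(c)
psi = rng.normal(size=(n_out, d_obj)) + 1j * rng.normal(size=(n_out, d_obj))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
phi = np.eye(n_out, dtype=complex)

# Psi = sum_alpha c_alpha psi_alpha (x) phi_alpha
Psi = sum(c[a] * np.kron(psi[a], phi[a]) for a in range(n_out))

def partial_trace_b(T, da, db):
    return np.einsum('nmkm->nk', T.reshape(da, db, da, db))

rho_red = partial_trace_b(np.outer(Psi, Psi.conj()), d_obj, n_out)                      # (21.12)
rho_stat = sum(abs(c[a])**2 * np.outer(psi[a], psi[a].conj()) for a in range(n_out))    # (21.14)
print(np.allclose(rho_red, rho_stat))         # True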
    It is sometimes claimed that this fact solves the measurement problem. The argu-
ment is this: From (21.10) we obtain (21.12), which is the same as (21.14), which means
that the system’s wave function has distribution (21.13), so we have a random outcome
α. This argument is incorrect, as the mere fact that two situations—one with Ψ as
in (21.10), the other with random ψ 0 —define the same density matrix for the object
does not mean the two situations are physically equivalent. And obviously from (21.10),
the situation after a quantum measurement involves neither a random outcome nor a
random wave function. As John Bell once put it, “and is not or.”
    It is sometimes taken as the definition of decoherence that the reduced density matrix
is (approximately) diagonal in the eigenbasis of the relevant operator A. In a previous
lecture I had defined decoherence as the situation that two or more wave packets Ψα are
macroscopically disjoint in configuration space (and thus remain disjoint for the relevant
future). The connection between the two definitions is that the latter implies the former
if Ψα is of the form (21.11).
21.5     The No-Signaling Theorem
The no-signaling theorem is a consequence of the quantum formalism: If system a is
located in Alice’s lab and system b in Bob’s, and if the two labs do not interact, then the
statistical reduced density matrix of system a is (i) not affected by any measurement
Bob performs, and (ii) does not depend on the Hamiltonian of system b.
    To verify (i), suppose that systems a and b together have wave function ψ ∈ Ha ⊗Hb ,
and that Bob measures the observable B, which is a self-adjoint operator on Hb . Let
β denote the eigenvalues of B and Pβ the projection to the eigenspace of eigenvalue β.
The probability that Bob obtains the outcome β is
    P(Z = β) = ⟨ψ| Ia ⊗ Pβ |ψ⟩ .                (21.15)
If Bob obtains β then ψ collapses to ψ′/Z, where ψ′ = (Ia ⊗ Pβ)ψ and the normalization
factor is given by Z = ‖ψ′‖ = ⟨ψ|Ia ⊗ Pβ|ψ⟩^{1/2}. Thus, the statistical reduced density
matrix of system a is
    ρ′ = trb [ Σ_β P(Z = β) |ψ′⟩⟨ψ′| / Z² ]                (21.16)
       = Σ_β trb [ (Ia ⊗ Pβ)|ψ⟩⟨ψ|(Ia ⊗ Pβ) ]                (21.17)
       = Σ_β trb [ |ψ⟩⟨ψ|(Ia ⊗ Pβ) ]                (21.18)
       = trb [ |ψ⟩⟨ψ|(Ia ⊗ Σ_β Pβ) ]                (21.19)
       = trb |ψ⟩⟨ψ| = ρ ,
where the step from (21.17) to (21.18) uses Pβ² = Pβ together with property (vii) of the
partial trace, and the last step uses Σ_β Pβ = Ib . So Bob's measurement does not change
the reduced density matrix of system a.
    To verify (ii), note that in the absence of interaction the unitary time evolution
operator is Ut = Ua,t ⊗ Ub,t . Thus, the reduced density matrix evolves according to
    ρa(t) = trb [ Ut |ψ⟩⟨ψ| Ut† ] = Ua,t ( trb |ψ⟩⟨ψ| ) Ua,t† = Ua,t ρa(0) Ua,t† ,
which does not depend on Ub,t . The argument extends without difficulty to statistical
reduced density matrices.
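Here is a small numerical check of statement (i); the state, the dimensions, and Bob's observable are random illustrative choices. Bob's unrecorded measurement leaves the reduced density matrix of system a unchanged.

import numpy as np
rng = np.random.default_rng(4)
da, db = 2, 3

def partial_trace_b(T, da, db):
    return np.einsum('nmkm->nk', T.reshape(da, db, da, db))

# a random entangled state of a and b
psi = rng.normal(size=da * db) + 1j * rng.normal(size=da * db)
psi /= np.linalg.norm(psi)
rho_a = partial_trace_b(np.outer(psi, psi.conj()), da, db)

# Bob measures a random observable B on H_b with eigenprojections P_beta
B = rng.normal(size=(db, db)) + 1j * rng.normal(size=(db, db))
B = B + B.conj().T
_, V = np.linalg.eigh(B)
projs = [np.outer(V[:, k], V[:, k].conj()) for k in range(db)]

# statistical density matrix of the pair after Bob's measurement (outcome unknown to Alice)
rho_pair = sum(np.kron(np.eye(da), P) @ np.outer(psi, psi.conj()) @ np.kron(np.eye(da), P)
               for P in projs)
print(np.allclose(rho_a, partial_trace_b(rho_pair, da, db)))   # True: no signaling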
21.6     Canonical Typicality
This is an application of reduced density matrices in quantum statistical mechanics.
The main goal of quantum statistical mechanics is to derive facts of thermodynamics
from a quantum mechanical analysis of systems with a macroscopic number of particles
(say, N > 10^{20}). One of the rules of quantum statistical mechanics asserts that if a
quantum system S is in thermal equilibrium at absolute temperature T ≥ 0, then it has
density matrix
    ρcan = (1/Z) e^{−βHS} ,                (21.27)
where HS is the system's Hamiltonian, β = 1/kT with k = 1.38 · 10^{−23} J/K the
Boltzmann constant, and Z = tr e^{−βHS} the normalizing factor; ρcan is called the canonical
density matrix with inverse temperature β.
    While this rule has long been used, its justification is rather recent (2006) and goes
as follows. Suppose that S is coupled to another system B (the “heat bath”), and
suppose that S and B together have wave function ψ ∈ HS ⊗ HB and Hamiltonian H
with pure point spectrum (as is the case for systems confined to a finite volume). Let
Imc = [E, E + ∆E] be an energy interval whose length ∆E is small on the macroscopic
scale but large enough for Imc to contain very many eigenvalues of H; Imc is called a
micro-canonical energy shell. Let Hmc be the corresponding spectral subspace, i.e., the
range of 1Imc (H), and umc the uniform probability distribution over S(Hmc ).
If the interaction between S and B is negligible, so that
    H ≈ HS ⊗ IB + IS ⊗ HB ,                (21.28)
then for most ψ relative to umc , the reduced density matrix of S is approximately canon-
ical for some value of β, i.e.,
    trB |ψ⟩⟨ψ| ≈ ρcan .                (21.29)
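The following toy computation illustrates the statement; it is not the general proof, and the bath of non-interacting two-level systems as well as all parameters are illustrative assumptions. For a random wave function in a micro-canonical shell, the weights of the two system levels come out close to canonical.

import numpy as np
from math import comb, log, exp
rng = np.random.default_rng(5)

# Toy model: S = one two-level system with energies 0 and 1; B = nB two-level
# systems with gap 1; H = H_S (x) I_B + I_S (x) H_B as in (21.28), no interaction.
nB, E0 = 14, 4                                 # bath size, micro-canonical total energy

# degeneracies of the bath energies paired with E_S = 0 and E_S = 1
N0, N1 = comb(nB, E0), comb(nB, E0 - 1)
dim_mc = N0 + N1                               # dimension of the shell H_mc

# psi uniform on S(H_mc): since the shell basis consists of product states and the two
# blocks correspond to different bath energies, the reduced density matrix of S is
# diagonal with entries given by the total weight of each block.
c = rng.normal(size=dim_mc) + 1j * rng.normal(size=dim_mc)
c /= np.linalg.norm(c)
p0 = np.sum(np.abs(c[:N0])**2)
p1 = np.sum(np.abs(c[N0:])**2)

beta = log(N0 / N1)                            # inverse temperature of the bath near E0
Z = 1 + exp(-beta)
print("reduced density matrix of S:", round(p0, 3), round(p1, 3))
print("canonical prediction:       ", round(1 / Z, 3), round(exp(-beta) / Z, 3))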
22      Quantum Logic
The expression “quantum logic” is used in the literature for (at least) three different
things: a certain piece of mathematics (the lattice of closed subspaces of a Hilbert space),
a certain analogy between this lattice and the rules of logic, and a certain philosophical
idea (that the rules of logic need to be revised in the light of quantum mechanics).
    Logic is the collection of those statements and rules that are valid in every conceivable
universe and every conceivable situation. Some people have suggested that logic simply
consists of the rules for the connectives “and”, “or,” and “not”, with “∀x ∈ M ” an
extension of “and” and “∃x ∈ M ” an extension of “or” to (possibly infinite) ranges M .
I would say that viewpoint is not completely right (because of Gödel’s theorem45 ) and
not completely wrong. Be that as it may, let us focus for a moment on the operations
“and” (conjunction A ∧ B), “or” (disjunction A ∨ B), and “not” (negation ¬A), and let
us ignore infinite conjunctions or disjunctions.
    A Boolean algebra is a set A of elements A, B, C, . . . of which we can form A ∧ B,
A ∨ B, and ¬A, such that the following rules hold:
• ∧ and ∨ are associative, commutative, and idempotent (A∧A = A and A∨A = A).
    • There are elements 0 ∈ A (“false”) and 1 ∈ A (“true”) such that for all A ∈ A ,
      A ∧ 0 = 0, A ∧ 1 = A, A ∨ 0 = A, A ∨ 1 = 1.
• Distributive laws: A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C) and A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C).

• Complementation laws: A ∧ ¬A = 0, A ∨ ¬A = 1.
It follows from these axioms that ¬(¬A) = A, and that de Morgan’s laws hold, ¬A ∨
¬B = ¬(A ∧ B) and ¬A ∧ ¬B = ¬(A ∨ B).
    The laws of logic for “and,” “or,” and “not” are exactly the laws that hold in
every Boolean algebra, with A, B, C, . . . playing the role of statements or propositions
or conditions. Another case in which these axioms are satisfied is that A, B, C, . . . are
sets, more precisely subsets of some set Ω, A ∧ B means the intersection A ∩ B, A ∨ B
means the union A ∪ B, ¬A means the complement Ac = Ω \ A, 0 means the empty set
∅, and 1 means the full set Ω. That is, every family A of subsets of Ω that contains
Ω and is closed under complement and intersection (in particular, every σ-algebra) is
a Boolean algebra. (It turns out that also, conversely, every Boolean algebra can be
realized as a family of subsets of some set Ω.)
  45
Gödel provided an example of a statement about the natural numbers that is true (and in that sense a
consequence of the Peano axioms) but cannot be derived from the Peano axioms using the standard rules
of logic, thus showing that these rules are incomplete.
    Now let A, B, C, . . . be subspaces of a Hilbert space H (more precisely, closed sub-
spaces, which makes no difference in finite dimension where every subspace is closed); let
A ∧ B := A ∩ B, A ∨ B := span(A ∪ B) (the smallest closed subspace containing both A
and B), and ¬A := A⊥ = {ψ ∈ H : hψ|φi = 0 ∀φ ∈ A} be the orthogonal complement
of A; let 0 = {0} be the 0-dimensional subspace and 1 = H the full subspace. Then
all axioms except distributivity are satisfied. So this structure is not a Boolean algebra;
it is called an orthomodular lattice, or simply a lattice. (A lattice of this kind that is
also distributive is the same thing as a Boolean algebra.) The closed subspaces thus
form a non-distributive lattice L(H ).
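A quick way to see the failure of distributivity is to compute dimensions for three one-dimensional subspaces of C²; the particular vectors below are just a convenient choice for this sketch.

import numpy as np

def dim_join(*mats):
    """Dimension of the span (join) of the columns of the given matrices."""
    return np.linalg.matrix_rank(np.hstack(mats))

def dim_meet(U, V):
    """Dimension of the intersection (meet), via dim U + dim V - dim(U join V)."""
    return np.linalg.matrix_rank(U) + np.linalg.matrix_rank(V) - dim_join(U, V)

A = np.array([[1], [0]], dtype=complex)                    # C|up>
B = np.array([[1], [1]], dtype=complex) / np.sqrt(2)       # C|x-up>
C = np.array([[1], [-1]], dtype=complex) / np.sqrt(2)      # C|x-down>

BC = np.hstack([B, C])                                     # B join C = C^2
print(dim_meet(A, BC))     # 1:  A meet (B join C) = A
print(dim_meet(A, B))      # 0:  A meet B = {0}
print(dim_meet(A, C))      # 0:  A meet C = {0}, so (A meet B) join (A meet C) = {0}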
    That is nice mathematics, and we will see more of that in a moment. The analogy I
mentioned holds between L(H ) and Boolean algebras, often understood as representing
the rules of logic. The analogy is that both are lattices. In order to emphasize the
analogy, some authors call the elements of L(H ) “propositions” and the operations
∧, ∨, and ¬ “and,” “or,” and “not.” They call L(H ) the “quantum logic” and say
things like, A ∈ L(H ) is a yes-no question that you can ask about a quantum system,
as you can carry out a quantum measurement of the projection to A and get result 0
(no) or 1 (yes).
    Here is why the analogy is rather limited. Let me give two examples.
   • First, consider a spin-½ particle with spinor ψ ∈ C², and consider the words “ψ
     lies in C|upi.” These words sound very much like a proposition, let me call it P,
     and indeed they naturally correspond to a subspace of H = C2 , viz., C|upi. Now
     the negation of P is, of course, “ψ lies in H \ C|upi,” whereas the orthogonal
     complement of C|upi is C|downi. Let me say that again in different words: The
     negation of “spin is up” is not “spin is down,” but “spin is in any direction but
     up.”
   • Second, consider the delayed-choice experiment in the form discussed at the end of
     Section 18.4: forget about the interference region and consider just the two options
     of either putting detectors in the two slits or putting detectors far away. The first
      option corresponds to the PVM Pleft + Pright = I, the second to the PVM U † Pfar right U +
     U † Pfar left U = I, where U is the unitary time evolution from the slits to the
     far regions where the detectors are placed. The two PVMs are identical, as
     U † Pfar right U = Pleft (and likewise for the other projection); that is, we have two
     experiments associated with the same observable. If we think of subspaces as
     propositions, then it is natural to think of the particle passes through the left slit
     as a proposition and identify it with the subspace A that is the range of Pleft . But
     if we carry out the second option, detect the particle on the far right, and say that
     we have confirmed the proposition A and thus that the particle passed through
     the left slit, then we have committed Wheeler’s fallacy.
    The philosophical idea that I mentioned is that logic as we know it is false, that
it applies in classical physics but not in quantum physics, and that a different kind of
logic with different rules applies in quantum physics—a quantum logic. Why did I call
that a rather silly idea? Because logic is, by definition, what is true in every conceivable
situation. So logic cannot depend on physical laws and cannot be revised by empirical
science. As Tim Maudlin once nicely said:
There is no point in arguing with somebody who does not believe in logic.
Bell wrote in Against “measurement” (1989, page 216 in the 2nd edition of Speakable
and unspeakable in quantum mechanics):
         When one forgets the role of the apparatus, as the word “measurement”
       makes all too likely, one despairs of ordinary logic—hence “quantum logic.”
       When one remembers the role of the apparatus, ordinary logic is just fine.
    A theorem of Gleason46 asserts that if dim H ≥ 3, then every probability measure
on L(H ) (i.e., every mapping P : L(H ) → [0, 1] with P(H ) = 1 that is additive over
countable families of mutually orthogonal subspaces) is of the form P(A) = tr(ρ PA)
with PA the projection to A, for some density matrix ρ. This amazing parallel between
probability measures and density matrices has led some authors to call elements of
L(H ) “events” (as one would call subsets of Ω). Again, this is a rather limited analogy,
for the same reasons as above.
  46
   A.M. Gleason: Measures on the closed subspaces of a Hilbert space. Indiana University Mathe-
matics Journal 6: 885–893 (1957)
23     No-Hidden-Variables Theorems
This name refers to a collection of theorems that aim at proving the impossibility of
hidden variables. This aim may seem strange in view of the fact that Bohmian mechan-
ics is a hidden-variable theory, is consistent and makes predictions in agreement with
quantum mechanics. So how could hidden variables be impossible? A first observa-
tion concerns what is meant by “hidden variables.” Most no-hidden-variable theorems
(NHVTs) address the idea that every observable A (a self-adjoint operator) has a true
value vA in nature (the “hidden variable”), and that a quantum measurement of A yields
vA as its outcome. This idea should sound dubious to you because we have discussed
already that observables are really equivalence classes of experiments, not all of which
yield the same value. Moreover, we know that in Bohmian mechanics, a true value
is associated with position but not with every observable, in particular not with spin
observables. Hence, in this sense of “hidden variables,” Bohmian mechanics is really a
no-hidden-variables theory.
    But this is not the central reason why the NHVTs do not exclude Bohmian mechan-
ics. Suppose we choose, in Bohmian mechanics, one experiment from every equivalence
class. (The experiment could be specified by specifying the wave function and configu-
ration of the apparatus together with the joint Hamiltonian of object and apparatus as
well as the calibration function.) For example, for every spin observable n · σ we could
say we will measure it by a Stern-Gerlach experiment in the direction n and subsequent
detection of the object particle. Then the outcome Zn of the experiment is a function
of the object wave function ψ and the object configuration Q, so we have associated
with every observable n · σ a “true value” which comes out if we choose to carry out the
experiment associated with n · σ. And it is this situation that NHVTs claim to exclude!
So we are back at an apparent conflict between Bohmian mechanics and NHVTs.
    It may occur to you that even a much simpler example than Bohmian mechanics will
prove the possibility of hidden-variable theories. Suppose we choose, as a trivial model,
for every self-adjoint operator A a random value vA independently of all other vA0 with
the Born distribution,
    P(v_A = α) = ‖P_α ψ‖² .                (23.1)
Then we have not provided a serious physical theory in the way Bohmian mechanics
does, but we have provided a manifestly consistent possibility for which values the
variables vA could have, one that agrees with the probabilities observed in experiment. Therefore,
all NHVTs must make some further assumptions about the hidden variables vA that are
violated in the trivial model as well as in Bohmian mechanics. We now take a look at
several NHVTs and their assumptions.
more importantly for us now, they may change whenever ψ collapses. That is, when
a quantum measurement of A is carried out, we should expect the vA0 (A0 6= A) to
change. However, there is an exception if we believe in locality. Then we should expect
that Alice’s measurement of α · σ a (on her particle a) will not alter the value of any
spin observable β · σ b acting on Bob’s particle. But Bell’s analysis shows that this is
impossible. To sum up:
Theorem 23.1. (Bell’s NHVT, 1964) Consider a joint distribution of random variables
vA , where A runs through the collection of observables
    A ∪ B = {α · σ^a : α ∈ S(R³)} ∪ {β · σ^b : β ∈ S(R³)} .                (23.2)
Suppose that a quantum measurement of A ∈ A yields vA and does not alter the value
of vB for any B ∈ B, and that a subsequent quantum measurement of B ∈ B yields
vB . Then the joint distribution of the outcomes satisfies Bell’s inequality (16.32). In
particular, it disagrees with the distribution predicted by quantum mechanics.
       In short, local hidden variables are impossible.
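For comparison with the quantum prediction, the following sketch evaluates the singlet correlations E(a, b) = ⟨(a·σ)⊗(b·σ)⟩ and checks one standard form of Bell's inequality, |E(a,b) − E(a,c)| ≤ 1 + E(b,c); the specific directions, and the use of this particular form rather than (16.32) verbatim, are illustrative choices.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)   # (|ud> - |du>)/sqrt(2)

def corr(a, b):
    """Quantum correlation <singlet| (a.sigma) (x) (b.sigma) |singlet> = -a.b"""
    A = a[0] * sx + a[1] * sy + a[2] * sz
    B = b[0] * sx + b[1] * sy + b[2] * sz
    return (singlet.conj() @ np.kron(A, B) @ singlet).real

a = np.array([0.0, 0.0, 1.0])                                   # 0 degrees
b = np.array([np.sin(np.pi / 4), 0.0, np.cos(np.pi / 4)])       # 45 degrees
c = np.array([1.0, 0.0, 0.0])                                   # 90 degrees

lhs = abs(corr(a, b) - corr(a, c))
rhs = 1 + corr(b, c)
print(lhs, rhs, lhs <= rhs)    # 0.707 vs 0.293: violated by the quantum predictions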
have values that are eigenvalues of A, and its marginal distribution must be the Born
distribution. Von Neumann assumed in addition that whenever an observable C is a
linear combination of observables A and B,
    C = αA + βB ,    α, β ∈ R ,                (23.3)
then the corresponding values satisfy the same linear relation,
    v_C = α v_A + β v_B .                (23.4)
Theorem 23.3. (von Neumann’s NHVT, 1932) Suppose 2 ≤ dim H < ∞ and ψ ∈
S(H ), let A be the set of all self-adjoint operators on H , and consider a joint dis-
tribution of random variables vA for all A ∈ A . Suppose that (23.3) implies (23.4).
Then for some A the marginal distribution of vA disagrees with the Born distribution
associated with A and ψ.
  49
    J.S. Bell: On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics
38: 447–452 (1966)
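The following lines make the additivity assumption concrete in the simplest case; the choice A = σ_x, B = σ_z is just an example. Values obeying (23.4) would have to be eigenvalues of C = A + B, but no sum of eigenvalues of A and B is an eigenvalue of C.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=float)
sz = np.array([[1, 0], [0, -1]], dtype=float)

print(np.linalg.eigvalsh(sx))          # eigenvalues of A = sigma_x: [-1, 1]
print(np.linalg.eigvalsh(sz))          # eigenvalues of B = sigma_z: [-1, 1]
print(np.linalg.eigvalsh(sx + sz))     # eigenvalues of C = A + B: [-sqrt(2), sqrt(2)]

# possible values of v_A + v_B if v_A, v_B are eigenvalues of A and B:
print(sorted({va + vb for va in (-1, 1) for vb in (-1, 1)}))   # [-2, 0, 2]
# none of these is an eigenvalue of C, so v_C = v_A + v_B cannot hold for actual outcomes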
Contents

1 Course Overview . . . 2

4 Classical Mechanics . . . 14
  4.1 Definition of Newtonian Mechanics . . . 14
  4.2 Properties of Newtonian Mechanics . . . 15
  4.3 Hamiltonian Systems . . . 16

6 Bohmian Mechanics . . . 20
  6.1 Definition of Bohmian Mechanics . . . 20
  6.2 Historical Overview . . . 22
  6.3 Equivariance . . . 22
  6.4 The Double-Slit Experiment in Bohmian Mechanics . . . 24
  6.5 Delayed Choice Experiments . . . 25

9 Spin . . . 43
  9.1 Spinors and Pauli Matrices . . . 43
  9.2 The Pauli Equation . . . 44
  9.3 The Stern–Gerlach Experiment . . . 45
  9.4 Bohmian Mechanics with Spin . . . 46
  9.5 Is an Electron a Spinning Ball? . . . 47
  9.6 Many-Particle Systems . . . 47
  9.7 Representations of SO(3) . . . 48

14 Many Worlds . . . 77
  14.1 Schrödinger's Many-Worlds Theory . . . 77
  14.2 Everett's Many-Worlds Theory . . . 79
  14.3 Bell's First Many-Worlds Theory . . . 80
  14.4 Bell's Second Many-Worlds Theory . . . 80
  14.5 Probabilities in Many-World Theories . . . 80

16 Nonlocality . . . 86
  16.1 Bell's Experiment . . . 87
  16.2 Bell's 1964 Proof of Nonlocality . . . 90
  16.3 Bell's 1976 Proof of Nonlocality . . . 91
  16.4 Photons . . . 93