Script 2017
   Department of Mathematics, Eberhard-Karls University, Auf der Morgenstelle 10, 72076 Tübingen,
Germany. Email: roderich.tumulka@uni-tuebingen.de
1     Course Overview
Learning goals of this course: To understand the rules of quantum mechanics; to
understand several important views of how the quantum world works; to understand
what is controversial about the orthodox interpretation and why; to be familiar with
the surprising phenomena and paradoxes of quantum mechanics.
    Quantum mechanics is the field of physics concerned with (or the post-1900 theory
of) the motion of electrons, photons, quarks, and other elementary particles, inside
atoms or otherwise. It is distinct from classical mechanics, the pre-1900 theory of the
motion of physical objects. Quantum mechanics forms the basis of modern physics and
covers most of the physics under the conditions on Earth (i.e., not-too-high temperatures
or speeds, not-too-strong gravitational fields). “Foundations of quantum mechanics” is
the topic concerned with what exactly quantum mechanics means and how to explain
the phenomena described by quantum mechanics. It is a controversial topic. Here are
some voices critical of the traditional, orthodox view:
         “With very few exceptions (such as Einstein and Laue) [...] I was the
      only sane person left [in theoretical physics].”
                                                   (E. Schrödinger in a 1959 letter)
    In this course we will be concerned with what kinds of reasons people have for
criticizing the orthodox understanding of quantum mechanics, what the alternatives are,
and which kinds of arguments have been put forward for or against important views.
We will also discuss the rules of quantum mechanics for making empirical predictions;
they are uncontroversial. The aspects of quantum mechanics that we discuss also apply
to other fields of quantum physics, in particular to quantum field theory.
    • Self-adjoint matrices, axioms of the quantum formalism, collapse of the wave func-
      tion, decoherence
   • Spin, the Stern-Gerlach experiment, the Pauli equation, representations of the
     rotation group
• No-hidden-variables theorems
   • Identical particles and the non-trivial topology of their configuration space, bosons
     and fermions
• Multivariable calculus
• Projection operators
   • Tensor product of vector spaces
   • Are there in principle limitations to what we can know about the world (its laws,
     its state)?
Physicists usually take math classes but not philosophy classes. That doesn’t mean,
though, that one doesn’t use philosophy in physics. It rather means that physicists
learn the philosophy they need in physics classes. Philosophy classes are not among the
prerequisites of this course, but we will sometimes make connections with the history of
philosophy.
2     The Schrödinger Equation
One of the fundamental laws of quantum mechanics is the Schrödinger equation
    i\hbar \frac{\partial \psi}{\partial t} = -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i} \nabla_i^2 \psi + V \psi .                          (2.1)

Here the wave function ψ is a function

    \psi : \mathbb{R}_t \times \mathbb{R}^{3N}_q \to \mathbb{C} .                          (2.2)

∇_i denotes the derivative operator (gradient) with respect to the variable x_i, and ∇_i^2 the corresponding Laplace operator,

    \nabla_i^2 \psi = \frac{\partial^2 \psi}{\partial x_i^2} + \frac{\partial^2 \psi}{\partial y_i^2} + \frac{\partial^2 \psi}{\partial z_i^2} .                          (2.4)
V is a given real-valued function on configuration space, called the potential energy or
just potential.
    Fundamentally, the potential in non-relativistic physics is
    V(x_1, \ldots, x_N) = \sum_{1 \le i < j \le N} \frac{e_i e_j}{|x_i - x_j|} - \sum_{1 \le i < j \le N} \frac{G m_i m_j}{|x_i - x_j|} ,                          (2.5)

where

    |x| = \sqrt{x^2 + y^2 + z^2} \quad \text{for } x = (x, y, z)                          (2.6)
denotes the Euclidean norm in R3 , ei are constants called the electric charges of the
particles (which can be positive, negative, or zero); the first term is called the Coulomb
potential, the second term is called the Newtonian gravity potential, G is a constant of
nature called Newton's constant of gravity, G = 6.67 × 10^{-11} kg^{-1} m^3 s^{-2}, and m_i are
again the masses. However, when the Schrödinger equation is regarded as an effective
equation rather than as a fundamental law of nature then the potential V may contain
terms arising from particles outside the system interacting with particles belonging to
the system. That is why the Schrödinger equation is often considered for rather arbitrary
functions V , also time-dependent ones. The operator

    H = -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i} \nabla_i^2 + V                          (2.7)

is called the Hamiltonian of the system.
Born’s rule. If we measure the system’s configuration at time t then the outcome is
random with probability density
    \rho(q) = |\psi_t(q)|^2 .                          (2.9)
   This rule refers to the concept of probability density, which means the following. The
probability that the random outcome X ∈ R3N is any particular point x ∈ R3N is zero.
However, the probability that X lies in a set B ⊆ R3N is given by
    \mathbb{P}(X \in B) = \int_B \rho(q)\, d^{3N}q                          (2.10)
(a 3N -dimensional volume integral). Instead of d3N q, we will often just write dq. A
density function ρ must be non-negative and normalized,
    \rho(x) \ge 0 , \qquad \int_{\mathbb{R}^{3N}} \rho(q)\, dq = 1 .                          (2.11)
A random variable with Gaussian density is also called a normal (or normally dis-
tributed ) random variable. It has mean µ ∈ R and standard deviation σ > 0. The mean
value or expectation value EX of a random variable X is its average value
    \mathbb{E}X = \int_{\mathbb{R}} x\, \rho(x)\, dx .                          (2.13)
The standard deviation of X is defined to be \sqrt{\mathbb{E}(X - \mathbb{E}X)^2}.
And indeed, the Schrödinger equation guarantees this relation: If it holds for t = 0 then
it holds for any t ∈ R. More generally, the Schrödinger equation implies that
    \int dq\, |\psi_t|^2 = \int dq\, |\psi_0|^2                          (2.15)

for any ψ_0. One says that \int dq\, |\psi_t|^2 satisfies a conservation law. Indeed, the Schrödinger equation implies a local conservation law for |ψ|^2; that is, it implies the continuity equation¹

    \frac{\partial |\psi(t,q)|^2}{\partial t} = -\sum_{i=1}^{N} \nabla_i \cdot \boldsymbol{j}_i(t,q)
    \quad \text{with} \quad
    \boldsymbol{j}_i(t,q) = \frac{\hbar}{m_i}\, \mathrm{Im}\bigl[ \psi^*(t,q)\, \nabla_i \psi(t,q) \bigr] .                          (2.16)
Indeed, using the Schrödinger equation,

    \frac{\partial |\psi|^2}{\partial t} = 2\,\mathrm{Re}\bigl[ \psi^*\, \partial_t \psi \bigr]
    = \frac{2}{\hbar}\,\mathrm{Im}\Bigl[ -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i}\, \psi^* \nabla_i^2 \psi + \underbrace{V(q)\, |\psi|^2}_{\text{real}} \Bigr]                          (2.18)

    = -\sum_{i=1}^{N} \frac{\hbar}{m_i}\, \mathrm{Im}\Bigl[ \psi^* \nabla_i^2 \psi + \underbrace{(\nabla_i \psi^*) \cdot (\nabla_i \psi)}_{\text{real}} \Bigr] = -\sum_{i=1}^{N} \nabla_i \cdot \boldsymbol{j}_i .                          (2.19)
The continuity equation expresses that the amount of |ψ|2 cannot be created or de-
stroyed, only moved around, and in fact flows with the current (j 1 , . . . , j N ). To see
this, note that it asserts that the (3N + 1)-dimensional (configuration-space-time) vec-
tor field j = (|ψ|2 , j 1 , . . . , j N ) has vanishing divergence. By the Ostrogradski–Gauss
integral theorem (divergence theorem), the surface integral of a vector field equals the
volume integral of its divergence, so the surface integral of a divergence-less vector field
   ¹ I don't know where this name comes from. It has nothing to do with being continuous. It should
rather be called a conservation equation.
vanishes. Let the surface be the boundary of a (3N + 1)-dimensional cylinder [0, T ] × S,
where S ⊆ R3N is a ball or any set with smooth boundary ∂S. Then the surface integral
of j is

    0 = -\int_S |\psi_0|^2 + \int_S |\psi_T|^2 + \int_0^T dt \int_{\partial S} dA\; \boldsymbol{n}_{\partial S} \cdot j                          (2.20)
with n∂S the unit normal vector field in R3N on the boundary of S. That is, the amount
of |ψ|2 in S at time T differs from the initial amount of |ψ|2 in S by the flux of j across
the boundary of S during [0, T ]—a conservation law. If (and this is indeed the case)
there is no flux to infinity, i.e., if the last integral becomes arbitrarily small by taking S
to be a sufficiently big ball, then the total amount of |ψ|^2 remains constant, see (2.15).
    Since the quantity \int dq\, |\psi|^2 occurs frequently, it is useful to abbreviate it: The L^2
norm is defined to be

    \|\psi\| = \Bigl( \int_{\mathbb{R}^{3N}} dq\, |\psi(q)|^2 \Bigr)^{1/2} .                          (2.21)
Thus, \|\psi_t\| = \|\psi_0\|, and the Born rule is consistent with the Schrödinger equation,
provided the initial datum ψ0 has norm 1, which we will henceforth assume. The wave
function ψt will in particular be square-integrable, and this makes the space L2 (R3N )
of square-integrable functions a natural arena. It is also called the Hilbert space, and is
the space of all wave functions.
3       Unitary Operators in Hilbert Space
In the following, we will often simply write L2 for L2 (R3N ). We will leave out many
mathematical details (which will be discussed in the course Mathematical Quantum
Theory).
Theorem 3.1.² For a large class of potentials V (including Coulomb, Newton's gravity,
every bounded measurable function, and linear combinations thereof ) and for every ψ0 ∈
L2 , there is a unique weak solution ψ(t, q) of the Schrödinger equation with potential V
and initial datum ψ0 . Moreover, at every time t, ψt lies again in L2 .
Thus, for every t ∈ ℝ we can define a map U_t : L^2 → L^2 by

    U_t \psi_0 = \psi_t .                          (3.1)
Ut is called the time evolution operator or propagator. Often, it is not possible to write
down an explicit closed formula for Ut , but it is nevertheless useful to consider Ut . It
has the following properties.
    First, U_t is a linear operator, i.e.,

    U_t(\psi + \phi) = U_t \psi + U_t \phi , \qquad U_t(z\psi) = z\, U_t \psi

for any ψ, φ ∈ L^2, z ∈ ℂ. This follows from the fact that the Schrödinger equation
is a linear equation, or, equivalently, that H is a linear operator. It is common to say
operator for linear operator.
    Second, U_t preserves norms:

    \|U_t \psi\| = \|\psi\| .                          (3.4)
   ² This follows from Stone's theorem and Kato's theorem together. See, e.g., Theorem VIII.8 in
M. Reed and B. Simon: Methods of Modern Mathematical Physics, Vol. 1 (revised edition), Academic
Press (1980), and Theorem X.16 in M. Reed and B. Simon: Methods of Modern Mathematical Physics,
Vol. 2, Academic Press (1975).
This is just another way of expressing Eq. (2.15). Operators with this property are
called isometric.
    Third, they obey a composition law:

    U_s U_t = U_{t+s} , \qquad U_0 = I ,                          (3.5)
where I denotes the identity operator
                                           Iψ = ψ .                                  (3.6)
It follows from (3.5) that Ut−1 = U−t . In particular, Ut is a bijection. An isometric
bijection is also called a unitary operator ; so Ut is unitary. A family of operators
satisfying (3.5) is called a one-parameter group of operators. Thus, the propagators
form a unitary 1-parameter group.
    Fourth,

    U_t = e^{-iHt/\hbar} .                          (3.7)
The exponential of an operator A can be defined by the exponential series
    e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!}                          (3.8)
if A is a so-called bounded operator ; in this case, the series converges. Unfortunately,
the Hamiltonian of the Schrödinger equation (19.2) is unbounded. But mathematicians
agree about how to define eA for unbounded operators (of the type that H is); we will
not worry about the details of this definition.
    Eq. (3.7) is easy to understand: after defining
    \phi_t := e^{-iHt/\hbar}\, \psi_0 ,                          (3.9)
one would naively compute as follows:
    i\hbar \frac{d}{dt} \phi_t = i\hbar \frac{d}{dt} e^{-iHt/\hbar} \psi_0                          (3.10)

                               = i\hbar \Bigl( -\frac{iH}{\hbar} \Bigr) e^{-iHt/\hbar} \psi_0                          (3.11)

                               = H \phi_t ,                          (3.12)
so φ_t is a solution of the Schrödinger equation with φ_0 = e^0 ψ_0 = ψ_0, and thus φ_t = ψ_t.
The calculation (3.10)–(3.12) can actually be justified for all ψ0 in the domain of H, a
dense set in L2 ; we will not go into details here.
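To make (3.7) and its properties tangible (a sketch of my own, assuming a finite-dimensional toy model rather than the full L²): replace L² by ℂⁿ, take for H a self-adjoint matrix obtained by discretizing a 1-d Schrödinger Hamiltonian on a grid, and compute U_t = e^{-iHt/ħ} as a matrix exponential; unitarity (3.4) and the group law (3.5) can then be checked numerically. The grid, the potential, and all parameters below are arbitrary choices.

    import numpy as np
    from scipy.linalg import expm

    # Toy finite-dimensional "Hamiltonian": discretize -(hbar^2/2m) d^2/dx^2 + V(x)
    # on a grid, so that H becomes a self-adjoint matrix (units with hbar = m = 1).
    hbar = 1.0
    n, L = 200, 20.0
    x = np.linspace(-L/2, L/2, n)
    dx = x[1] - x[0]
    lap = (np.diag(np.full(n-1, 1.0), -1) - 2*np.eye(n) + np.diag(np.full(n-1, 1.0), 1)) / dx**2
    H = -0.5 * hbar**2 * lap + np.diag(0.5 * x**2)      # harmonic potential as an example

    def U(t):
        """Propagator U_t = exp(-iHt/hbar), Eq. (3.7), as a matrix exponential."""
        return expm(-1j * H * t / hbar)

    U1, U2 = U(1.0), U(2.0)
    print("unitary:   ", np.allclose(U1.conj().T @ U1, np.eye(n), atol=1e-8))
    print("group law: ", np.allclose(U1 @ U2, U(3.0), atol=1e-8))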
instead of the L^2 norm

    \|\psi\| = \Bigl( \int |\psi(q)|^2\, dq \Bigr)^{1/2} .                          (3.14)

For comparison, the Euclidean norm of a vector u ∈ ℝ^3 is

    |u| = \Bigl( \sum_{i=1}^{3} u_i^2 \Bigr)^{1/2} .                          (3.18)
The norm-preserving operators in R3 are exactly the orthogonal matrices, i.e., those
matrices A with

    A^t = A^{-1} ,                          (3.19)

where A^t denotes the transposed matrix, (A^t)_{ij} = A_{ji}. They have a geometric meaning:
Each orthogonal matrix is either a rotation around some axis passing through the origin,
or a reflection across some plane through the origin, followed by a rotation. The set of
orthogonal 3 × 3 matrices is denoted O(3). The set of those orthogonal matrices which
do not involve a reflection is denoted SO(3) for “special orthogonal matrices”; they
correspond to rotations and can be characterized by the condition det A > 0 in addition
to (3.19).
    In dimension d > 3, one can show that the special orthogonal matrices are still
      compositions (i.e., products) of 2-dimensional rotation matrices such as (for d = 4)

    \begin{pmatrix} \cos\alpha & \sin\alpha & & \\ -\sin\alpha & \cos\alpha & & \\ & & 1 & \\ & & & 1 \end{pmatrix} .                          (3.20)
This rotation does not rotate around an axis, it rotates around a (d − 2)-dimensional
subspace (spanned by the 3rd and 4th axes). However, in d ≥ 4 dimensions, not every
   ³ iff = if and only if
special orthogonal matrix is a rotation around a (d − 2)-dim. subspace through a certain
angle, but several such rotations can occur together, as the following example shows:

    \begin{pmatrix} \cos\alpha & \sin\alpha & & \\ -\sin\alpha & \cos\alpha & & \\ & & \cos\beta & \sin\beta \\ & & -\sin\beta & \cos\beta \end{pmatrix} .                          (3.21)
   3. It is conjugate-symmetric,

    \langle \phi | \psi \rangle = \langle \psi | \phi \rangle^*                          (3.26)

      for all ψ, φ ∈ L^2.
   4. It is positive definite,⁴

    \langle \psi | \psi \rangle > 0 \quad \text{for } \psi \neq 0 .                          (3.27)
Note that the dot product in R3 has the same properties, the properties of an inner
product, except that the scalars involved lie in R, not C. Another inner product with
these properties is defined on ℂ^n by

    \langle \psi | \phi \rangle = \sum_{i=1}^{n} \psi(i)^* \phi(i) .                          (3.28)
Note that the radicand is ≥ 0. Conversely, the inner product can be expressed in terms
of the norm according to the polarization identity

    \langle \psi | \phi \rangle = \tfrac{1}{4}\Bigl( \|\psi + \phi\|^2 - \|\psi - \phi\|^2 - i\,\|\psi + i\phi\|^2 + i\,\|\psi - i\phi\|^2 \Bigr) .                          (3.30)
Its proof is a homework exercise. It follows from the polarization identity that every
unitary operator U preserves inner products,

    \langle U\psi | U\phi \rangle = \langle \psi | \phi \rangle .
(Likewise, every A ∈ SO(3) preserves dot products, which has the geometrical meaning
that a rotation preserves the angle between any two vectors.)
   In analogy to the dot product, two functions ψ, φ with hψ|φi = 0 are said to be
orthogonal.
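The polarization identity (3.30) and the preservation of inner products are easy to check numerically (a small sketch of my own, using the finite-dimensional inner product (3.28) on ℂⁿ; the random vectors and the QR-generated unitary are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    # random complex vectors in C^n (standing in for wave functions)
    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    phi = rng.normal(size=n) + 1j * rng.normal(size=n)

    inner = lambda a, b: np.vdot(a, b)          # <a|b>, conjugate-linear in the first slot
    norm  = lambda a: np.sqrt(inner(a, a).real)

    # polarization identity (3.30)
    pol = 0.25 * (norm(psi + phi)**2 - norm(psi - phi)**2
                  - 1j * norm(psi + 1j*phi)**2 + 1j * norm(psi - 1j*phi)**2)
    print("polarization identity:", np.isclose(pol, inner(psi, phi)))

    # a random unitary U (from a QR decomposition) preserves inner products
    U, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    print("inner product preserved:", np.isclose(inner(U @ psi, U @ phi), inner(psi, phi)))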
   ⁴ Another math subtlety: This will be true only if we identify two functions ψ, φ whenever the set
{q ∈ R3N : ψ(q) 6= φ(q)} has volume 0. It is part of the standard definition of L2 to make these
identifications.
4     Classical Mechanics
Classical physics means pre-quantum (pre-1900) physics. I describe one particular ver-
sion that could be called Newtonian mechanics (even though certain features were not
discovered until after Isaac Newton’s death). This version is over-simplified in that
it leaves out magnetism, electromagnetic fields (which play a role for electromagnetic
waves and thus the classical theory of light), and relativity theory.
    m_i \frac{d^2 Q_i}{dt^2} = -\nabla_i V(Q_1, \ldots, Q_N)                          (4.1)
with V the fundamental potential function of the universe as given in Eq. (2.5). This
completes the definition of Newtonian mechanics.
    The equation of motion (4.1) is an ordinary differential equation of second order
(i.e., involving second time derivatives). Once we specify, as initial conditions, the initial
positions Qi (0) and velocities (dQi /dt)(0) of every particle, the equation of motion (4.1)
determines Qi (t) for every i and every t.
    Written explicitly, (4.1) reads
    m_i \frac{d^2 Q_i}{dt^2} = -\sum_{j \neq i} e_i e_j \frac{Q_j - Q_i}{|Q_j - Q_i|^3} + \sum_{j \neq i} G m_i m_j \frac{Q_j - Q_i}{|Q_j - Q_i|^3} .                          (4.2)
The right hand side is called the force acting on particle i; the j-th term in the first
sum (with the minus sign in front) is called the Coulomb force exerted by particle j on
particle i; the j-th term in the second sum is called the gravitational force exerted by
particle j on particle i.
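For illustration (a minimal numerical sketch, not from the script; gravity only, arbitrary units with G = 1, and hypothetical initial data), Newton's equation of motion (4.2) can be integrated with a standard scheme such as velocity Verlet:

    import numpy as np

    # Minimal sketch: integrate Newton's equation (4.2) numerically for two bodies,
    # gravity only (all charges zero), in arbitrary units where G = 1.
    G = 1.0
    m = np.array([1.0, 1e-3])                    # masses (arbitrary)

    def acceleration(Q):
        """Right-hand side of (4.2) divided by m_i, for positions Q of shape (N, 3)."""
        a = np.zeros_like(Q)
        for i in range(len(m)):
            for j in range(len(m)):
                if j != i:
                    d = Q[j] - Q[i]
                    a[i] += G * m[j] * d / np.linalg.norm(d)**3
        return a

    def verlet_step(Q, V, dt):
        """One velocity-Verlet step for d^2 Q / dt^2 = acceleration(Q)."""
        a = acceleration(Q)
        Q_new = Q + V * dt + 0.5 * a * dt**2
        V_new = V + 0.5 * (a + acceleration(Q_new)) * dt
        return Q_new, V_new

    # initial conditions: light body on a roughly circular orbit around the heavy one
    Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    V = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    for _ in range(10000):
        Q, V = verlet_step(Q, V, dt=1e-3)
    print("position of the light body after t = 10:", Q[1])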
    Newtonian mechanics is empirically wrong. For example, it entails the absence of
interference fringes in the double-slit experiment (and entails wrong predictions about
everything that is considered a quantum effect). Nevertheless, it is a coherent theory, a
“theory of everything,” and often useful to consider as a hypothetical world to compare
ours to.
    Newtonian mechanics is to be understood in the following way: Physical objects such
as tables, baseballs, or dogs consist of huge numbers (such as 10^{24}) of particles, and they
must be regarded as just such an agglomerate of particles. Since Newtonian mechanics
governs unambiguously the behavior of each particle, it also completely dictates the
behavior of tables, baseballs, and dogs. Put differently, after (4.1) has been given, there
is no need to specify any further laws for tables, baseballs, or dogs. Any physical law
concerning tables, baseballs, or dogs, is a consequence of (4.1). This scheme is called
reductionism. It makes chemistry and biology sub-fields of physics. (This does not
mean, though, that it would be of practical use to try to solve (4.1) for 10^{24} or 10^{80}
particles in order to study the behavior of dogs.) Can everything be reduced to (4.1)?
It seems that conscious experiences are an exception—presumably the only one.
    When we consider a baseball, we are often particularly interested in the motion of
its center Q(t) because we are interested in the motion of the whole ball. It is often
possible to give an effective equation for the behavior of a variable like Q(t), for example
                                                        
                                  2                      0
                                 dQ          dQ
                               M 2 = −γ          − M g 0 ,                            (4.3)
                                  dt          dt
                                                         1
where M is the mass of the baseball, the first term on the right hand side is called the
friction force, the second the gravitional force of Earth, γ is the friction coefficient of
the baseball and g the gravitational field strength of Earth. The effective equation (4.3)
looks quite similar to the fundamental equation (4.1) but (i) it has a different status (it
is not a fundamental law), (ii) it is only approximately valid, (iii) it contains a term that
is not of the form −∇V (the friction term), (iv) forces that do obey the form −∇V (Q)
(such as the second force) can have other functions for V (such as V(x) = M g x_3) instead
of (2.5).
    The theory I call Newtonian mechanics was never actually proposed to give the
correct and complete laws of physics (although we can imagine a hypothetical world
where it does); for example, it leaves out magnetism. An extension of this theory, which
we will not consider further but which is also considered “classical physics,” includes
electromagnetic fields (governed by Maxwell’s field equations) and gravitational fields
(governed by Einstein’s field equations, also known as the theory of general relativity).
    The greatest contributions from a single person to the development of Eq. (4.1) came
from Isaac Newton (1643–1727), who suggested (in his Philosophiae Naturalis Principia
Mathematica 1687) considering ODEs, in fact of second order, suggested “forces” and
the form m\, d^2Q/dt^2 = force, and introduced the form of the gravitational force, now known
as “Newton’s law of universal gravity.” Eq. (4.2) was first written down, without the
Coulomb term, by Leonhard Euler (1707–1783). The first term was proposed in 1784
by Charles Augustin de Coulomb (1736–1806). Nevertheless, we will call (4.1) and (4.2)
“Newton’s equation of motion.”
the microscopic laws and irreversibility of macroscopic phenomena can be compatible,5
time reversal invariance has been widely accepted. This was also because time reversal
invariance holds as well in other, more refined theories that came after Newtonian mechanics, such as
Maxwell’s equations of classical electromagnetism, general relativity, and the Schrödin-
ger equation.
Definition 4.1. Let v i (t) = dQi /dt denote the velocity of particle i. The energy, the
momentum, and the angular momentum of the universe are defined to be, respectively,
    E = \sum_{k=1}^{N} \frac{m_k}{2}\, v_k^2 - \sum_{\substack{j,k=1 \\ j<k}}^{N} \Bigl( G m_j m_k - \frac{e_j e_k}{4\pi\varepsilon_0} \Bigr) \frac{1}{|Q_j - Q_k|}                          (4.4)

    p = \sum_{k=1}^{N} m_k v_k                          (4.5)

    L = \sum_{k=1}^{N} m_k\, Q_k \times v_k ,                          (4.6)
where v 2 = v · v = |v|2 , and × denotes the cross product in R3 . The first term in (4.4)
is called kinetic energy, the second one potential energy.
Proposition 4.2. E, p, and L are conserved quantities, i.e., they are time independent.
Proof: exercise.
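Proposition 4.2 can also be checked numerically along the trajectory computed in the sketch above (same arbitrary units and assumptions; m, G, and verlet_step are reused from that sketch):

    # Continuing the two-body sketch from above (arbitrary units, gravity only):
    # check numerically that E, p, and L of Definition 4.1 stay (nearly) constant in time.
    import numpy as np

    def energy(Q, V):
        kin = 0.5 * np.sum(m[:, None] * V**2)
        pot = 0.0
        for j in range(len(m)):
            for k in range(j + 1, len(m)):
                pot -= G * m[j] * m[k] / np.linalg.norm(Q[j] - Q[k])
        return kin + pot

    momentum         = lambda V: np.sum(m[:, None] * V, axis=0)
    angular_momentum = lambda Q, V: np.sum(np.cross(Q, m[:, None] * V), axis=0)

    Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    V = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    E0, p0, L0 = energy(Q, V), momentum(V), angular_momentum(Q, V)
    for _ in range(10000):
        Q, V = verlet_step(Q, V, dt=1e-3)        # verlet_step from the previous sketch
    print("E drift:", energy(Q, V) - E0)
    print("p drift:", momentum(V) - p0)
    print("L drift:", angular_momentum(Q, V) - L0)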
function H called the Hamiltonian function or simply the Hamiltonian. Namely, n is as-
sumed to be even, n = 2r, and denoting the n components of x by (q1 , . . . , qr , p1 , . . . , pr ),
the ODE is of the form
    \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}                          (4.8)

    \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} .                          (4.9)
Newtonian mechanics fits this definition with r = 3N , q1 , . . . , qr the 3N components
of q = (q 1 , . . . , q N ), p1 , . . . , pr the 3N components of p = (p1 , . . . , pN ) (the momenta
pk = mk v k ), and H = H(q, p) the energy (4.4) expressed as a function of q and p, that
is,
    H(q, p) = \sum_{k=1}^{N} \frac{p_k^2}{2m_k} - \sum_{\substack{j,k=1 \\ j<k}}^{N} \Bigl( G m_j m_k - \frac{e_j e_k}{4\pi\varepsilon_0} \Bigr) \frac{1}{|q_j - q_k|} .                          (4.10)
    For readers familiar with manifolds I mention that the natural definition of a Hamil-
tonian system on a manifold M is as follows. M plays the role of phase space. Let
the dimension n of M be even, n = 2r, and suppose we are given a symplectic form
ω on M , i.e., a non-degenerate differential 2-form whose exterior derivative vanishes.
(Non-degenerate means that it has full rank n at every point.) The equation of motion
for t 7→ x(t) ∈ M reads
    \omega\Bigl( \frac{dx}{dt}, \cdot \Bigr) = dH ,                          (4.11)
where dH means the exterior derivative of H. To make the connection with the case
M = ℝ^n just described, dH is then the gradient of H and ω the n × n matrix

    \omega = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}                          (4.12)
with I the r × r unit matrix and 0 the r × r zero matrix; ω(dx/dt, ·) becomes the
transpose of ω applied to the n-vector dx/dt, and (4.11) reduces to (4.8) and (4.9).
5     The Double-Slit Experiment
A few remarks about Feynman’s text:
      is a bit too strong. Other mysteries can claim to be on equal footing with this
      one. Feynman weakened his statement later.
• Feynman’s statements
      are too strong. We will see in Chapters 6, 13, and 15 that Bohmian mechanics
      and other theories provide some explanation of the double slit experiment.
       Some illustrations I’m showing you, related to the double-slit experiment:
    Note that the observations in the double-slit experiment are in agreement with, and
in fact follow from, the Born rule and the Schrödinger equation: The relevant system
here consists of one electron, so ψt is a function in just 3 dimensions. The potential V
can be taken to be +∞ (or very large) at every point of the plate containing the two
slits—except in the slits themselves, where V = 0. Away from the plate, also V = 0.
The Schrödinger equation governs the behavior of ψt , with the initial wave function ψ0
being a wave packet, e.g., a Gaussian wave packet as in Exercise 4 of Assignment 1,
    \psi_0(x) = (2\pi\sigma^2)^{-3/4}\, e^{-i k \cdot x}\, e^{-x^2/4\sigma^2} ,                          (5.1)
moving toward the double slit. According to the Schrödinger equation, part of ψ will
be reflected from the wall, part of it will pass through the two slits. The two parts
of the wave emanating from the two slits, ψ1 and ψ2 will overlap and thus interfere,
ψ = ψ_1 + ψ_2. When we detect the electron, its probability density is given, according
to the Born rule, by

    \rho = |\psi|^2 = |\psi_1 + \psi_2|^2 = |\psi_1|^2 + |\psi_2|^2 + 2\,\mathrm{Re}\bigl( \psi_1^* \psi_2 \bigr) ,

whose last term produces the interference fringes.
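As a toy illustration of this interference (my own simplification, not the actual Schrödinger evolution of the packet (5.1)): treat the two slits as two point sources and superpose their waves on a distant screen; the geometry and the wave number below are arbitrary.

    import numpy as np

    # Toy model: two point sources a distance d apart, waves psi_j ~ exp(i k r_j) / r_j
    # superposed on a screen at distance D. Units are arbitrary.
    k, d, D = 50.0, 1.0, 20.0
    x_screen = np.linspace(-10, 10, 1001)                 # positions on the detection screen

    r1 = np.sqrt(D**2 + (x_screen - d/2)**2)              # distance from slit 1
    r2 = np.sqrt(D**2 + (x_screen + d/2)**2)              # distance from slit 2
    psi1 = np.exp(1j * k * r1) / r1
    psi2 = np.exp(1j * k * r2) / r2

    both   = np.abs(psi1 + psi2)**2                       # |psi1 + psi2|^2: fringes
    no_int = np.abs(psi1)**2 + np.abs(psi2)**2            # what one would get without the cross term
    print("max/min with interference:   ", both.max(), both.min())   # deep minima
    print("max/min without cross term:  ", no_int.max(), no_int.min())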
    What if we include a device (such as Feynman’s lamp) that will detect the electron
at one of the slits? Then we detect the electron twice: once at a slit and once at the
backdrop screen. Thus, we either have to regard it as a many-particle problem (involving
at least two particles, the electron and the photon), or we need a version of the Born
rule suitable for repeated detection. We will study both approaches in later lectures.
   ⁶ More precisely, electrons could pass right or left of a positively charged wire of diameter 1 µm.
Those passing on the right get deflected to the left, and vice versa. Thus, the arrangement leads to the
superposition of waves travelling in slightly different directions—just what is needed for interference.
6     Bohmian Mechanics
      “[Bohmian mechanics] exercises the mind in a very salutary way.”
      J. Bell, Speakable and Unspeakable in Quantum Mechanics, page 171
    The situation in quantum mechanics is that we have a set of rules, known as the
quantum formalism, for computing the possible outcomes and their probabilities for
(more or less) any conceivable experiment, and everybody agrees (more or less) about
the formalism. What the formalism doesn’t tell us, and what is controversial, is what
exactly happens during these experiments, and how nature arrives at the outcomes
whose probabilities the formalism predicts. There are different theories answering these
questions, and Bohmian mechanics is one of them.
    Let me elucidate my statements a bit. We have already learned part of the quantum
formalism: the Schrödinger equation and the Born rule. These rules have allowed us to
predict the possible outcomes of the double-slit experiment with a single electron (easy
here: a spot anywhere on the screen) and their probability distribution (here: a prob-
ability distribution corresponding to |ψ|2 featuring a sequence of maxima and minima
corresponding to interference fringes). What the rules didn’t tell us was what exactly
happens during this experiment (e.g., how the electron moves). Bohmian mechanics fills
this gap.
    We have not seen all the rules of the quantum formalism yet. We will see them later, in Lec-
tures 6 and 8. So far, we have formulated the Born rule only for position measurements,
and we have not considered repeated detections.
In Bohmian mechanics, the particles have actual positions Q_i(t), which move according to Bohm's equation of motion

    \frac{dQ_i}{dt} = \frac{\hbar}{m_i}\, \mathrm{Im}\, \frac{\nabla_i \Psi}{\Psi}(t, Q(t)) .                          (6.1)
Here, Q(t) = (Q1 (t), . . . , QN (t)) is the configuration at time t, and Ψ is a wave function
that is called the wave function of the universe and evolves according to the Schrödinger
equation
    i\hbar \frac{\partial \Psi}{\partial t} = -\sum_{i=1}^{N} \frac{\hbar^2}{2m_i} \nabla_i^2 \Psi + V \Psi                          (6.2)
with V given by (2.5). The configuration Q(0) at the initial time of the universe (say,
right after the big bang) is chosen randomly by nature with probability density

    \rho(q) = |\Psi_0(q)|^2 .                          (6.3)
(We write capital Q for the configuration of particles and little q for the configuration
variable in either ρ or Ψ.) This completes the definition of Bohmian mechanics.
    The central fact about Bohmian mechanics is that its predictions agree exactly with
those of the quantum formalism (which so far have always been confirmed in experi-
ment). We will understand later why this is so.
    Eq. (6.1) is an ordinary differential equation of first order (specifying the velocity
rather than the acceleration). Thus, the initial configuration Q(0) determines Q(t) for
all t, so Bohmian mechanics is a deterministic theory. On the other hand, Q(t) is
random because Q(0) is. Note that this randomness does not conflict with determinism.
It is a theorem, the equivariance theorem, that the probability distribution of Q(t) is
given by |Ψt (q)|2 . We will prove the equivariance theorem later in this Lecture. As a
consequence, it is consistent to assume the Born distribution for every t. Note that due
to the determinism, the Born distribution can be assumed only for one time (say t = 0);
for any other time t, then, the distribution of Q(t) is fixed by (6.1). The state of the
universe at any time t is given by the pair (Q(t), Ψt ).
    Let us have a closer look at Bohm’s equation of motion (6.1). If we recall the formula
(2.16) for the probability current then we can rewrite Eq. (6.1) in the form
    \frac{dQ_i}{dt} = \frac{\boldsymbol{j}_i}{|\Psi|^2} = \frac{\text{probability current}}{\text{probability density}} .                          (6.4)
This is a very plausible relation because it is a mathematical fact about any particle
system with deterministic velocities that
                 probability current = velocity × probability density .              (6.5)
We will come back to this relation when we prove equivariance.
    Here is another way of re-writing (6.1). A complex number z can be characterized by
its modulus R ≥ 0 and its phase S ∈ ℝ, z = R e^{iS}. It will be convenient in the following
to replace S by S/~ (but we will still call S the phase of z). Then a complex-valued
function Ψ(t, q) can be written in terms of the two real-valued functions R(t, q) and
S(t, q) according to
    \Psi(t, q) = R(t, q)\, e^{iS(t,q)/\hbar} .                          (6.6)
Let us plug this into (6.1): Since
    \nabla_i \Psi = \nabla_i \bigl( R\, e^{iS/\hbar} \bigr)                          (6.7)

                  = (\nabla_i R)\, e^{iS/\hbar} + R\, \nabla_i e^{iS/\hbar}                          (6.8)

                  = (\nabla_i R)\, e^{iS/\hbar} + R\, \frac{i \nabla_i S}{\hbar}\, e^{iS/\hbar} ,                          (6.9)
we have that
    \frac{\hbar}{m_i}\, \mathrm{Im}\, \frac{\nabla_i \Psi}{\Psi} = \frac{\hbar}{m_i}\, \mathrm{Im}\Bigl( \underbrace{\frac{\nabla_i R}{R}}_{\text{real}} + i\, \frac{\nabla_i S}{\hbar} \Bigr)                          (6.10)

    = \frac{\hbar}{m_i}\, \frac{\nabla_i S}{\hbar} = \frac{1}{m_i}\, \nabla_i S .                          (6.11)
Thus, (6.1) can be rewritten as
    \frac{dQ_i}{dt} = \frac{1}{m_i}\, \nabla_i S(t, Q(t)) .                          (6.12)
In words, the velocity is given (up to a constant factor involving the mass) by the
gradient of the phase of the wave function.
   A historical note. A few years before the development of the Schrödinger equation,
Louis de Broglie had suggested a quantitative rule-of-thumb for wave–particle duality:
A particle with momentum p = mv should “correspond” to a wave with wave vector k
according to the de Broglie relation
    p = \hbar k .                          (6.13)
The wave vector is defined by the relation ψ = eik·x (so it is defined only for plane waves);
it is orthogonal to the wave fronts (surfaces of constant phase), and its magnitude is
|k| = 2π/(wave length). Now, if the wave is not a plane wave then we can still define
a local wave vector k(x) that is orthogonal to the surfaces of constant phase and whose
magnitude is the local rate of phase change per unit length. Some thought shows that k(x) = ∇S(x)/ħ. If
we use this expression on the right hand side of (6.13) and interpret p as mass times
the velocity of the particle, we obtain exactly Eq. (6.12), that is, Bohm’s equation of
motion.
6.3    Equivariance
The term “equivariance” comes from the fact that the two relevant quantities, ρt and
|ψt |2 , vary equally with t. (Here, ρt is the distribution arising from ρ0 by transport along
the Bohmian trajectories.) The equivariance theorem can be expressed by means of the
following diagram:
    \begin{array}{ccc}
    \Psi_0 & \longrightarrow & \rho_0 \\
    U_t \big\downarrow & & \big\downarrow \\
    \Psi_t & \longrightarrow & \rho_t
    \end{array}                          (6.14)
The horizontal arrows mean taking | · |2 , the left vertical arrow means the Schrödinger
evolution from time 0 to time t, and the right vertical arrow means the transport of
probability along the Bohmian trajectories. The statement about this diagram is that
both paths along the arrows lead to the same result.
    As a preparation for the proof, we note that the equation of motion can be written
in the form
    \frac{dQ}{dt} = v_t(Q(t)) ,                          (6.15)
where vt : R3N → R3N is the vector field on configuration space vt = v = (v 1 , . . . , v N )
whose i-th component is
    v_i = \frac{\hbar}{m_i}\, \mathrm{Im}\, \frac{\nabla_i \Psi}{\Psi} .                          (6.16)
We now address the following question: If vt is known for all t, and the initial probability
distribution ρ0 is known, how can we compute the probability distribution ρt at other
times? The answer is the continuity equation
    \frac{\partial \rho_t}{\partial t} = -\mathrm{div}\bigl( \rho_t v_t \bigr) .                          (6.17)
This follows from the fact that the probability current is given by ρt vt . In fact, in any
dimension d (d = 3N or otherwise) and for any density (probability density or energy
density or nitrogen density or . . . ) it is true that
As mentioned in (6.4), v i = j i /|ψt |2 . Thus, if ρt = |ψt |2 then Eq. (6.20) is true, which
completes the proof.
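The equivariance theorem can also be illustrated numerically (a sketch of my own, assuming a freely spreading 1-d Gaussian packet at rest, for which the velocity field (6.16) has a simple closed form; units and all parameters are arbitrary): sample Q(0) from |ψ_0|², transport the ensemble along the Bohmian trajectories, and compare with |ψ_t|².

    import numpy as np

    # Free 1-d Gaussian packet at rest (hbar = m = 1, sigma0 = 1); |psi_t|^2 is Gaussian
    # with standard deviation sigma(t), and the Bohmian velocity field is known analytically.
    hbar, m, sigma0 = 1.0, 1.0, 1.0
    sigma = lambda t: sigma0 * np.sqrt(1 + (hbar * t / (2 * m * sigma0**2))**2)

    def v(t, x):
        """Bohmian velocity field (6.16) for this wave function, in closed form."""
        return x * t * (hbar / (2 * m * sigma0))**2 / sigma(t)**2

    # sample Q(0) from |psi_0|^2, a Gaussian with standard deviation sigma0
    rng = np.random.default_rng(0)
    Q = rng.normal(0.0, sigma0, size=100_000)

    # transport the ensemble along the trajectories, dQ/dt = v(t, Q)
    T, dt = 3.0, 1e-3
    for step in range(int(T / dt)):
        Q = Q + v(step * dt, Q) * dt

    # equivariance: at time T the ensemble should be |psi_T|^2-distributed,
    # i.e. Gaussian with standard deviation sigma(T)
    print("empirical std of Q(T):  ", Q.std())
    print("sigma(T) from |psi_T|^2:", sigma(T))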
6.4    The Double-Slit Experiment in Bohmian Mechanics
Let us apply what we know about Bohmian mechanics to N = 1 and the wave function
of the double-slit experiment. We assume that the particle in the experiment moves
as if it was alone in the universe, with the potential V representing the wall with two
slits. We will justify that assumption in a later Lecture. We know already what the
wave function ψ(t, x) looks like (remember the movie). Here is a picture of the possible
trajectories of the particle.
    We know from the equivariance theorem that the position will always have proba-
bility distribution |ψt |2 . Thus, if we detect the particle at time t we find its distribution
in agreement with the Born rule.
    Note that the particle does not move along straight lines, as it would according to classical
mechanics. Note that the wave passes through both slits, while the particle passes
through one only. Think about how that answers the paradoxes pointed out by Feynman.
Note that the particle trajectories would be different if one slit were closed. Note that
we can find out which slit the particle went through without disturbing the interference
pattern: check whether the particle arrived in the upper or lower half of the detection
screen.
      “Is it not clear from the smallness of the scintillation on the screen that we
      have to do with a particle? And is it not clear, from the diffraction and
      interference patterns, that the motion of the particle is directed by a wave?
      De Broglie showed in detail how the motion of a particle, passing through
      just one of two holes in screen, could be influenced by waves propagating
      through both holes. And so influenced that the particle does not go where
      the waves cancel out, but is attracted to where they cooperate. This idea
      seems to me so natural and simple, to resolve the wave–particle dilemma in
      such a clear and ordinary way, that it is a great mystery to me that it was
      so generally ignored.”       J. Bell, Speakable and Unspeakable in Quantum
      Mechanics, page 191
    Coming back to Feynman’s description of the double-slit experiment, we see that his
statement that its outcome “cannot be explained” is not quite accurate. It is true that
it cannot be explained in Newtonian mechanics, but it can in Bohmian mechanics.
    Bohmian mechanics illustrates that these conclusions don’t actually follow. Bell
described that in his article; here are some key points again. To begin with, there is
no retrocausation in Bohmian mechanics, as any intervention of observers will change
ψ only in the future, not in the past, of the intervention, and the particle trajectory
will correspondingly be affected also only in the future. Another basic observation is
that with the literal wave-particle dualism of Bohmian mechanics (there is a wave and
there is a particle), there is nothing left of the idea that the electron is sometimes a
wave and sometimes a particle, and hence even less of the idea that observers could
force an electron to become a wave or to become a particle. In detail: the wave passes
through both slits, the particle through one; in the overlap region, the two wave packets
interfere, and the particle’s |ψ|2 distribution features an interference pattern; if there
is no screen in the overlap region, then the particle moves on in such a way that the
interference pattern disappears and two separate spots form.
    After understanding the Bohmian picture of this experiment, some steps in Wheeler’s
reasoning appear strange. If one assumes that there are no particle trajectories in the
quantum world, as one usually does in orthodox quantum mechanics (recall Feynman’s
chapter), then it would seem natural to say that there is no fact about which slit the
electron went through, given that there was no attempt to detect the electron while
passing a slit. It is surprising, then, that Wheeler claims that the detection on the far-
away screen reveals which slit it took! How can anything reveal which slit the electron
took if the electron didn’t take a slit?
    There is another interesting aspect to the story that I will call Wheeler’s fallacy.
When you analyze the Bohmian picture in the case of a far-away screen, it turns out that
the trajectories passing through the left (right) slit end up in the left (right) region.
(We will discuss why in the exercises.) So Wheeler makes the wrong retrodiction of
which slit the electron passed through! How could this happen? Wheeler noticed that
if the right (left) slit is closed, so only one packet comes out, and it comes out of the
left (right) slit, then only detection events in the right (left) region occur. This is also
true in Bohmian mechanics. Now Wheeler concludes that when wave packets come out
of both slits, and a detection occurs in the right region, then the particle must have
passed through the left slit. This is wrong in Bohmian mechanics, and once you realize
this, it is obvious that Wheeler’s conclusion is a non sequitur —a fallacy.
    Shahriar Afshar proposed and carried out a further variant of the experiment, known
as Afshar’s experiment.8 In this variant, one puts the screen in the far position, but one
adds obstacles (that would absorb or reflect electrons) in the overlap region, in fact in
those places where the interference is destructive. If an interference pattern occurs in the
overlap region, even if it is not observed, then almost no electrons arrive at the obstacles,
and almost no electrons get absorbed or reflected. Thus, if all electrons arrive on the far
screen in either the left or the right region, as in fact observed in the experiment, then
this indicates that there was an interference pattern in the overlap region even if it
was not observed. Afshar argued that this shows that wave and particle must both have
   ⁸ S. S. Afshar: Violation of the principle of complementarity, and its implications. Proceedings of
SPIE 5866: 229–244 (2005) https://arxiv.org/abs/quant-ph/0701027
existed. Again, Bohmian mechanics easily explains the outcome of this experiment.
7     Fourier Transform and Momentum
7.1    Fourier Transform
We know from Exercise 2 of Homework 1 that the plane wave eik·x evolves according to
the free Schrödinger equation to
    e^{ik \cdot x}\, e^{-i\hbar k^2 t/2m} .                          (7.1)
Since the Schrödinger equation is linear, any linear combination of plane waves with
different wave vectors k,

    \sum_{k} c_k\, e^{ik \cdot x}                          (7.2)
with complex coefficients ck , will evolve to
    \sum_{k} c_k\, e^{ik \cdot x}\, e^{-i\hbar k^2 t/2m} .                          (7.3)
The Schwartz space S consists of the smooth functions ψ : ℝ^d → ℂ such that for every n ∈ ℕ and every α ∈ ℕ_0^d there
is C_{n,α} > 0 such that |∂^α ψ(x)| < C_{n,α}\, |x|^{-n} for all x ∈ ℝ^d, where ∂^α := ∂_1^{α_1} \cdots ∂_d^{α_d}.
For example, every Gaussian wave packet lies in S ; note that S ⊂ L1 ∩ L∞ . It
turns out that Fourier transformation maps S bijectively to itself. Moreover, S is a
dense subspace in L2 , and F can be extended in a unique way to a bounded operator
F : L2 → L2 , even though the integral (7.6) exists only for ψ ∈ L1 ∩ L2 .
    Going back to Eq. (7.5) and taking c(k) = (2π)^{-3/2}\, \widehat{\psi}_0(k), we can express the solution
of the free Schrödinger equation as

    \psi_t(x) = \frac{1}{(2\pi)^{3/2}} \int_{\mathbb{R}^3} d^3k \, \Bigl( e^{-i\hbar k^2 t/2m}\, \widehat{\psi}_0(k) \Bigr) e^{ik \cdot x} .                          (7.8)
In words, we can find ψ_t from ψ_0 by taking its Fourier transform \widehat{\psi}_0, multiplying by a
suitable function of k, viz., e^{-i\hbar k^2 t/2m}, and taking the inverse Fourier transform.
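This recipe is easy to carry out numerically (a 1-d sketch of my own, using a discrete Fourier transform on a periodic grid as a stand-in for (7.8); the units ħ = m = 1 and all grid parameters are arbitrary choices):

    import numpy as np

    # 1-d version of the recipe around Eq. (7.8), on a periodic grid (hbar = m = 1):
    # FFT, multiply by exp(-i hbar k^2 t / 2m), inverse FFT.
    hbar, m = 1.0, 1.0
    n, L = 4096, 200.0
    x = np.linspace(-L/2, L/2, n, endpoint=False)
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)               # wave numbers of the grid

    sigma, k0 = 1.0, 2.0
    psi0 = (2*np.pi*sigma**2)**(-0.25) * np.exp(1j*k0*x) * np.exp(-x**2/(4*sigma**2))

    def evolve(psi, t):
        """Free Schroedinger evolution: Fourier transform, multiply, transform back."""
        return np.fft.ifft(np.exp(-1j * hbar * k**2 * t / (2*m)) * np.fft.fft(psi))

    norm = lambda psi: np.sqrt(np.sum(np.abs(psi)**2) * dx)
    psi_t = evolve(psi0, t=10.0)
    print("norm at t=0 :", norm(psi0))                    # ~ 1, cf. Eq. (2.15)
    print("norm at t=10:", norm(psi_t))                   # unchanged
    print("mean position at t=10:", np.sum(x * np.abs(psi_t)**2) * dx)  # ~ (hbar*k0/m)*t = 20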
    The same trick can be done for N particles. Then d = 3N , ψ = ψ(x1 , . . . , xN ),
\widehat{\psi} = \widehat{\psi}(k_1, \ldots, k_N), and the factor to multiply by is

    \exp\Bigl( -i \sum_{j=1}^{N} \frac{\hbar}{2m_j}\, k_j^2\, t \Bigr) \quad \text{instead of} \quad \exp\Bigl( -i\, \frac{\hbar}{2m}\, k^2 t \Bigr) .                          (7.9)
    Note that we take the Fourier transform only in the space variables, not in the time
variable. There are also applications in which it is useful to consider a Fourier transform
in t, but not here.
Example 7.3. The Fourier transform of a Gauss function. Let σ > 0 and
    \psi(x) = C\, e^{-x^2/4\sigma^2} .                          (7.10)
The evaluation of the last integral involves the Cauchy integral theorem, varying the
path of integration and estimating errors. Here, I just report that the outcome is the
constant π 3/2 , independently of σ and k. Thus,
    \widehat{\psi}(k) = C_3\, e^{-\sigma^2 k^2}                          (7.15)
with C_3 = C_2\, \pi^{3/2}. In words, the Fourier transform of a Gaussian function is another
Gaussian function, but with width 1/(2σ) instead of σ. (We see here shadows of the
Heisenberg uncertainty relation, which we will discuss in the next chapter.)
Rule 7.4. (a)

    \widehat{\frac{\partial \psi}{\partial x_j}}(k) = i k_j\, \widehat{\psi}(k) .                          (7.16)

      That is, differentiation of ψ corresponds to multiplication of \widehat{\psi} by ik.
 (b) Conversely,

    \widehat{-i x_j \psi}(k) = \frac{\partial \widehat{\psi}}{\partial k_j}(k) .                          (7.17)
 (c) Indeed,

    \widehat{g}(k - k_0) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} g(x)\, e^{-i(k-k_0)\cdot x}\, d^dx                          (7.25)

                         = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \bigl( e^{ik_0 \cdot x} g(x) \bigr)\, e^{-ik\cdot x}\, d^dx .                          (7.26)
 (d) This follows in much the same way.
which is again a general Gaussian packet with center k0 and width 1/(2σ).
                                                ∗∗∗
   Fourier transformation defines a unitary operator F : L^2(ℝ^d) → L^2(ℝ^d), F\psi = \widehat{\psi}.
We verify that \|F\psi\|_{L^2} = \|\psi\|_{L^2}, at least for nice ψ. Note first that, for f, g ∈ L^1 ∩ L^2,
    \int \Bigl( \int e^{-ik\cdot x} f(k)\, d^dk \Bigr) g(x)\, d^dx = \int \Bigl( \int e^{-ik\cdot x} g(x)\, d^dx \Bigr) f(k)\, d^dk                          (7.29)
by changing the order of integration (which integral is done first). The theorem saying
that we are allowed to change the order of integration (for an integrable integrand f g)
is called Fubini's theorem. From Eq. (7.29) we can conclude \langle g^* | \widehat{f}\, \rangle = \langle \widehat{g}^* | f \rangle. Since
    (F f)^*(k) = (2\pi)^{-d/2} \Bigl( \int e^{-ik\cdot x} f(x)\, d^dx \Bigr)^{*} = F^{-1}(f^*)(k) ,                          (7.30)
7.2     Momentum
“Position measurements” usually consist of detecting the particle. “Momentum mea-
surements” usually consist of letting the particle move freely for a while and then mea-
suring its position.9
    We now analyze this experiment using Bohmian mechanics. We define the asymptotic
velocity u to be
    u = \lim_{t \to \infty} \frac{dQ}{dt}(t) ,                          (7.31)

which can equivalently be computed as

    u = \lim_{t \to \infty} \frac{Q(t)}{t} .                          (7.32)
   ⁹ Alternatively, one lets the particle collide with another particle, makes a “momentum measurement”
on the latter, and makes theoretical reasoning about what the momentum of the former must have been.
To understand this, note that (Q(t) − Q(0))/t is the average velocity during the time
interval [0, t]; if an asymptotic velocity exists (i.e., if the velocity approaches a constant
vector u) then the average velocity over a long time t will be close to u because for
most of the time the velocity will be close to u. The term Q(0)/t converges to zero as
t → ∞, so we obtain (7.32).
    We want the momentum measurement to measure p := mu for a free particle (V =
0). So we measure Q(t) for large t, divide by t, and multiply by m. We can and will
also take this recipe as the definition of a momentum measurement, independently of
whether we want to use Bohmian mechanics.
    How large do we need t to be? In practice, often not very. When thinking of a particle
emitted by a radioactive atom, or coming from a particle collision in an accelerator
experiment (such as the Large Hadron Collider LHC in Geneva), a millisecond is usually
enough for dQ/dt to become approximately constant.
    According to the Born rule, the outcome p is random, and its distribution can be
characterized by saying that, for any set B ⊂ R3 ,
    \mathbb{P}(u \in B) = \lim_{t\to\infty} \mathbb{P}\bigl( Q(t)/t \in B \bigr)                          (7.33)

                        = \lim_{t\to\infty} \mathbb{P}\bigl( Q(t) \in tB \bigr)                          (7.34)

                        = \lim_{t\to\infty} \int_{tB} |\psi_t(x)|^2\, d^3x ,                          (7.35)
where
                                    tB = {tx : x ∈ B}                                  (7.36)
is the scaled set B.
Theorem 7.6. Let ψ(t, x) be a solution of the free Schrödinger equation and B ⊆ R3 .
Then

    \lim_{t\to\infty} \int_{tB} |\psi(t, x)|^2\, d^3x = \int_{mB/\hbar} |\widehat{\psi}_0(k)|^2\, dk .                          (7.37)
As a consequence, the probability density of p is
    (1/ℏ³) |ψ̂0(p/ℏ)|² .                                                                     (7.38)
    The theorem essentially says that when we think of ψ0 as a linear combination of
plane waves e^{ik·x} as in Eq. (7.4) or (7.7), then the contribution from a particular value of
k will move at a velocity of ℏk/m (shadows of the de Broglie relation p = ℏk!), and in
the long run these contributions will tend to separate in space (i.e., no longer overlap),
leaving the contribution from k in the region around ℏkt/m. We see the de Broglie
relation again in (7.38) when we insert p/ℏ for k in ψ̂. The upshot of this analysis can
be formulated as
Born’s rule for momentum. If we measure the momentum of a particle with wave
function ψ then the outcome is random with probability density
    ρmom(p) = (1/ℏ³) |ψ̂(p/ℏ)|² .                                                            (7.39)
Likewise, if we measure the momenta of N particles with joint wave function ψ(x1 , . . . , xN ),
then the outcomes are random with joint probability density
    ρmom(p1, . . . , pN) = (1/ℏ^{3N}) |ψ̂(p1/ℏ, . . . , pN/ℏ)|² .                            (7.40)
   For this reason, the Fourier transform ψb is also called the momentum representation
of ψ, while ψ itself is called the position representation of the wave function.
Example 7.7. The general Gaussian wave packet (7.27), whose Born distribution in
position space is a Gaussian distribution with mean x0 and width σ, has momentum
distribution
    ρmom(p) = (const.) e^{−2(σ/ℏ)²(p−ℏk0)²} ,                                                (7.41)
that is, a Gaussian distribution with mean ℏk0 and width
    σP = ℏ/(2σ) .                                                                            (7.42)
In particular, if we want a momentum distribution that is sharply peaked around some
value p0 = ~k0 , that is, if we want σP to be small, then σ must be large, so ψ must be
wide, “close to a plane wave.”
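This relation between the widths is easy to check numerically. The following is a minimal
sketch (Python/NumPy): all parameter values and the units with ℏ = 1 are invented for
illustration, and the explicit Gaussian below may differ in conventions from (7.27); it only
assumes that the position distribution has width σ.

    import numpy as np

    hbar, sigma, k0, x0 = 1.0, 0.7, 2.0, 0.0      # illustrative values, not from the text

    # a Gaussian wave packet whose position distribution has mean x0 and width sigma
    x = np.linspace(-40, 40, 2**14)
    dx = x[1] - x[0]
    psi = (2*np.pi*sigma**2)**(-0.25) * np.exp(-(x - x0)**2/(4*sigma**2) + 1j*k0*x)

    # discrete approximation of psi_hat(k); an overall phase is irrelevant for |psi_hat|^2
    k = 2*np.pi*np.fft.fftfreq(x.size, d=dx)
    psi_hat = np.fft.fft(psi) * dx / np.sqrt(2*np.pi)

    p = hbar * k
    w = np.abs(psi_hat)**2                         # proportional to rho_mom(p)
    mean_p = np.sum(p*w) / np.sum(w)
    sigma_P = np.sqrt(np.sum((p - mean_p)**2 * w) / np.sum(w))
    print(mean_p, hbar*k0)                         # mean momentum is close to hbar*k0
    print(sigma_P, hbar/(2*sigma))                 # width is close to hbar/(2*sigma)

Running it shows the de Broglie relation and (7.42) at work: the momentum distribution is
centered at ℏk0 and has width ℏ/(2σ).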
This relation motivates calling Pj = −iℏ ∂/∂xj the momentum operator in the xj-direction,
and (P1, P2, P3) the vector of momentum operators.
    We note for later use that, by the same reasoning,
    ⟨pj²⟩ = ∫ (ℏkj)² |ψ̂0(k)|² dk = ⟨ψ0 | (−iℏ ∂/∂xj)² ψ0⟩ .                                 (7.50)
7.4       Tunneling
The tunnel effect is another quantum effect that is widely perceived as paradoxical.
Consider the 1-d Schrödinger equation with a potential V that has the shape of a
potential barrier of height V0 > 0. As an idealized example, suppose
    V(x) = V0 for 0 ≤ x ≤ L,   V(x) = 0 otherwise.                                           (7.51)
In classical mechanics, the energy E = p²/2m + V(Q) is conserved along every trajectory;
in particular, the particle can never reach a region in which V(x) > E, so, if E < V0,
then the particle will turn around at the barrier and move back to the left.
    That is different in quantum mechanics. Consider a Gaussian wave packet, initially
to the left of the barrier, with a rather sharp momentum distribution around a p0 > 0
with p0²/2m < V0. Then part of the packet will be reflected, and part of it will pass
through the barrier! (And the part that passes through is much larger than just the
tail of ρmom with p ≥ √(2mV0).) I will show you another movie created by B. Thaller
(http://vqm.uni-graz.at/movies.html) with a numerical simulation of the Schrödin-
ger equation with potential (7.51). As a consequence, the Born rule predicts a substantial
probability for the particle to show up on the other side of the barrier (“tunneling
probability”). Figure 2 shows the Bohmian trajectories for such a situation (with only
a small tunneling probability).
    For computing the tunneling probability, an easy recipe is to assume that the initial
ψ is close to a plane wave and to consider only the interior part of it that actually looks
like a plane wave. One solves the Schrödinger equation for a plane wave arriving, computes
the amount of probability current through the barrier, and compares it to the current
associated with the arriving wave.10
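For the rectangular barrier the recipe leads to a closed formula. Here is a minimal numerical
sketch (Python/NumPy) of that formula, assuming a barrier of height V0 and width L as in
(7.51) above; the values of E, V0, L and the units ℏ = m = 1 are invented for illustration
and are not taken from the text.

    import numpy as np

    hbar, m = 1.0, 1.0                      # units with hbar = m = 1 (assumption)
    V0, L = 5.0, 1.0                        # barrier height and width (made-up values)
    E = 2.0                                 # energy of the incoming plane wave, E < V0

    k = np.sqrt(2*m*E)/hbar                 # wave number outside the barrier
    kappa = np.sqrt(2*m*(V0 - E))/hbar      # decay rate inside the barrier

    # ratio of transmitted to incoming probability current for a plane wave
    T = 1.0 / (1.0 + (V0**2 * np.sinh(kappa*L)**2) / (4*E*(V0 - E)))
    print("tunneling probability ~", T)

The printed number is the fraction of the incoming current that passes the barrier, i.e.,
the tunneling probability in the plane-wave idealization.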
    What is paradoxical about tunneling? Perhaps not so much, once we give up New-
tonian mechanics and accept that the equation of motion can be non-classical, such as
Bohm’s. Then it is only to be expected that the trajectories are different, and not sur-
prising that some barriers which Newton’s trajectories cannot cross, Bohm’s trajectories
  10
    For further discussion of why that yields a reasonable result, see T. Norsen: The Pilot-Wave
Perspective on Quantum Scattering and Tunneling. American Journal of Physics 81: 258 (2013)
http://arxiv.org/abs/1210.7265.
Figure 2: Bohmian trajectories in a tunneling situation. Picture taken from D. Bohm
and B. J. Hiley: The Undivided Universe, London: Routledge (1993)
can. Part of the sense of paradox comes perhaps from a narrative that is often told when
the tunnel effect is introduced: that the particle can “borrow” some energy for a short
amount of time by virtue of an energy–time uncertainty relation. This narrative seems
not very helpful.
    The tunnel effect plays a crucial role in radioactive α-decay (where the α-particle
leaves the nucleus by means of tunneling) and scanning tunneling electron microscopy
(where the distance between a needle and a surface is measured by means of measuring
the tunneling probability).
    There are further related effects: anti-tunneling means that a particle gets reflected
by a barrier so low that a classical particle with the same initial momentum would
be certain to pass it; this happens because a solution of the Schrödinger equation will
partly be reflected even at a low barrier. Another effect has been termed paradoxical
reflection:11 Consider a downward potential step, for instance V(x) = 0 for x < 0 and
V(x) = −V0 for x > 0 with some V0 > 0.
Classically, a particle coming from the left has probability zero to be reflected back, but
according to the Schrödinger equation, wave packets will be partly reflected and partly
   11 For detailed discussion, see P. L. Garrido, S. Goldstein, J. Lukkarinen, and R. Tumulka: Paradoxical
Reflection in Quantum Mechanics. American Journal of Physics 79(12): 1218–1231 (2011),
http://arxiv.org/abs/0808.0610
transmitted. Remarkably, in the limit V0 → ∞, the reflection probability converges to
1. “A quantum ball can’t roll off a cliff!” On a potential plateau, surrounded by deep
downward steps, a particle can be confined for a long time, although finally, in the limit
t → ∞, all of the wave function will leave the plateau region and propagate to spatial
infinity.
8     Operators and Observables
8.1    Heisenberg’s Uncertainty Relation
As before, ⟨X⟩ denotes the expectation of the random variable X. The variance of the
momentum distribution for the initial wave function ψ ∈ L²(R) (in one dimension) is
    σP² := ⟨(p − ⟨p⟩)²⟩                                                                      (8.1)
         = ⟨p² − 2p⟨p⟩ + ⟨p⟩²⟩                                                               (8.2)
         = ⟨p²⟩ − 2⟨p⟩² + ⟨p⟩²                                                               (8.3)
         = ⟨p²⟩ − ⟨p⟩²                                                                       (8.4)
         = ⟨ψ|P²ψ⟩ − ⟨ψ|Pψ⟩²                                                                 (8.5)
         = ⟨ψ | (P − ⟨ψ|Pψ⟩)² ψ⟩ .                                                           (8.6)
Theorem 8.1. (Heisenberg uncertainty relation) For any ψ ∈ L2 (R) with kψk = 1,
    σX σP ≥ ℏ/2 .                                                                            (8.10)
   This means that any wave function that is very narrow must have a wide Fourier
transform.
Example 8.2. Consider the general Gaussian wave packet (7.27), for simplicity in 1
dimension. The standard deviation of the position distribution is σX = σ, and we
computed the width of the momentum distribution in (7.42). We thus obtain for this ψ
that
    σX σP = ℏ/2 ,                                                                            (8.11)
just the lowest value allowed by the Heisenberg uncertainty relation.
Example 8.3. Consider a wave packet passing through a slit. Let us ignore the part of
the wave packet that gets reflected because it did not arrive at the slit, and focus on just
the part that makes it through the slit. That is a narrow wave packet, and its standard
deviation in position, σX , is approximately the width of the slit. If that is very small
then, by the Heisenberg uncertainty relation, σP must be large, so the wave packet must
spread quickly after passing the slit. If the slit is wider, the spreading is weaker.
                                            ∗∗∗
    In Bohmian mechanics, the Heisenberg uncertainty relation means that whenever
the wave function is such that we can know the position of a particle with (small)
inaccuracy σX then we are unable to know its asymptotic velocity better than with
inaccuracy ~/(2mσX ); thus, we are unable to predict its future position after a large
time t (for V = 0) better than with inaccuracy ~t/(2mσX ). This is a limitation to
knowledge in Bohmian mechanics.
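This growth of the prediction inaccuracy can be illustrated numerically. The sketch below
(Python/NumPy) evolves a free Gaussian packet by multiplying each Fourier mode e^{ikx} by
the free phase e^{−iℏk²t/2m} and prints the position spread; units with ℏ = m = 1 and the
initial width are invented choices, not taken from the text.

    import numpy as np

    hbar, m, sigma = 1.0, 1.0, 0.5          # illustrative values (assumption)
    x = np.linspace(-200, 200, 2**15)
    dx = x[1] - x[0]
    k = 2*np.pi*np.fft.fftfreq(x.size, d=dx)

    psi0 = (2*np.pi*sigma**2)**(-0.25) * np.exp(-x**2/(4*sigma**2))

    def width(psi):
        w = abs(psi)**2
        mean = np.sum(x*w)/np.sum(w)
        return np.sqrt(np.sum((x-mean)**2*w)/np.sum(w))

    for t in [0.0, 20.0, 40.0]:
        # exact free evolution in Fourier space
        psit = np.fft.ifft(np.exp(-1j*hbar*k**2*t/(2*m)) * np.fft.fft(psi0))
        print(t, width(psit), hbar*t/(2*m*sigma))

For large t the measured spread approaches ℏt/(2mσX), in line with the limitation stated above.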
    The Heisenberg uncertainty relation is often understood as excluding the possibility
of particle trajectories. If the particle had a trajectory, the reasoning goes, then it would
have a precise position and a precise velocity (and thus a precise momentum) at any
time, so the position uncertainty would be zero and the momentum uncertainty would
be zero, so σX = 0 and σP = 0, in contradiction with (8.10). We know already from
Bohmian mechanics that this argument cannot be right. It goes wrong by assuming
that if the particle has a precise position and a precise velocity then they can also be
precisely known and precisely controlled. Rather, inhabitants of a Bohmian universe,
when they know a particle’s wave function to be ϕ(x), cannot know its position more
precisely than the |ϕ|2 distribution allows.
    In the traditional, orthodox view of quantum mechanics, it is assumed that electrons
do not have trajectories. It is assumed that the wave function is the complete description
of the electron, in contrast to Bohmian mechanics, where the complete description is
given by the pair (Q, ψ), and ψ alone would only be partial information and thus an
incomplete description. From these assumptions, it follows that the electron does not
have a position before we attempt to detect it. Likewise, it does not have a momentum
before we attempt to measure it. Thus, in orthodox quantum mechanics the Heisenberg
uncertainty relation does not amount to a limitation of knowledge because there is
no fact in the world that we do not know about when we do not know its position.
Unfortunately, the uncertainty relation is often expressed by saying that it is impossible
to measure position and momentum at the same time with arbitrary accuracy; while
this would be appropriate to say in Bohmian mechanics, it is not in orthodox quantum
mechanics because this formulation presumes that position and momentum have values
that we could discover by measuring them.
    The uncertainty relation is also involved in the double slit experiment as follows. If
it did not hold, we could make the electron move exactly orthogonal to the screen after
passing through the narrow slits–and arrive very near the center of the screen. Thus, the
distribution on the detection screen could not have a second- or third-order maximum.
Since in orthodox quantum mechanics the double-slit experiment is understood as in-
dicative of a paradoxical nature of reality, the uncertainty relation is then understood as
“protecting” the paradox from becoming a visible contradiction. A similar argument, as
pointed out by Feynman, applies to the photon colliding with the electron for detecting
which slit it went through, and its effect of destroying the interference.
On this list of bad words from good books, the worst of all is ‘measurement.’
But first let us get acquainted with the mathematics of self-adjoint operators.
For an unbounded operator A : D(A) → H with dense domain D(A) ⊂ H , the adjoint
operator A† is uniquely defined by the property (8.13) for all ψ ∈ D(A† ) and φ ∈ D(A)
on the domain
    D(A†) = { ψ ∈ H : ∃χ ∈ H  ∀φ ∈ D(A) : ⟨ψ|Aφ⟩ = ⟨χ|φ⟩ } .                                 (8.14)
Example 8.6.
   • Let H = Cⁿ, and let A be an n × n matrix with entries Aij. Then A† is the
     conjugate-transpose matrix B with Bij = A*ji. Indeed, for ψ = (ψ1, . . . , ψn)
     and φ = (φ1, . . . , φn),
         ⟨ψ|Aφ⟩ = Σ_{i=1}^{n} ψi* (Aφ)i                                                      (8.16)
                = Σ_i Σ_j ψi* Aij φj                                                         (8.17)
                = Σ_j Σ_i (A*ij ψi)* φj                                                      (8.18)
                = Σ_j ( Σ_i Bji ψi )* φj                                                     (8.19)
                = Σ_j (Bψ)j* φj                                                              (8.20)
                = ⟨Bψ|φ⟩ .                                                                   (8.21)
     As a consequence, an operator A is self-adjoint iff Aij = A*ji. (A small numerical
     check of this, and of the momentum operator below, is sketched after this list.)
• A unitary operator is usually not self-adjoint.
• Let H = L2 (Rd ), and let A be a multiplication operator,
                                   Aψ(x) = f (x) ψ(x) ,                           (8.22)
  such as the potential in the Hamiltonian or the position operators. Then A† is the
  multiplication operator that multiplies by f ∗ . Indeed,
         ⟨ψ|Aφ⟩ = ∫_{R^d} ψ(x)* f(x) φ(x) dx                                                 (8.23)
                = ∫_{R^d} ( f*(x) ψ(x) )* φ(x) dx                                            (8.24)
                = ⟨f*ψ|φ⟩ .                                                                  (8.25)
   (This calculation is rigorous if f is bounded. If it is not, then some discussion of
   the domains of A and A† is needed.) Thus, A is self-adjoint iff f is real-valued.
• On H = L²(R^d), the momentum operators Pj = −iℏ ∂/∂xj are self-adjoint with the
  domain given by the first Sobolev space, i.e., the space of functions ψ ∈ L² whose
  Fourier transform ψ̂ has the property that k ↦ |k| ψ̂(k) is still square-integrable. The
  relation (8.15) can easily be verified on nice functions using integration by parts:
         ⟨ψ|Pj φ⟩ = ∫ ψ*(x) (−iℏ) ∂φ/∂xj (x) dx                                              (8.26)
                  = − ∫ ∂ψ*/∂xj (x) (−iℏ) φ(x) dx                                            (8.27)
                  = ∫ ( −iℏ ∂ψ/∂xj (x) )* φ(x) dx                                            (8.28)
                  = ⟨Pj ψ|φ⟩ .                                                               (8.29)
   • In H = L2 (Rd ), the Hamiltonian is self-adjoint for suitable potentials V on a
     suitable domain. By formal calculation (leaving aside questions of domains), since
         H = Σ_{j=1}^{d} (1/2m) Pj² + V ,                                                    (8.30)
      we have that
         ⟨ψ|Hφ⟩ = ⟨ψ | ( Σ_j (1/2m) Pj² + V ) φ⟩                                             (8.31)
                = Σ_j (1/2m) ⟨ψ|Pj Pj φ⟩ + ⟨ψ|V φ⟩                                           (8.32)
                = Σ_j (1/2m) ⟨Pj ψ|Pj φ⟩ + ⟨V ψ|φ⟩                                           (8.33)
                = Σ_j (1/2m) ⟨Pj Pj ψ|φ⟩ + ⟨V ψ|φ⟩                                           (8.34)
                = ⟨( Σ_j Pj²/2m + V ) ψ | φ⟩                                                 (8.35)
                = ⟨Hψ|φ⟩ .                                                                   (8.36)
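As announced in the first bullet point, here is a quick numerical check of two of the
examples (a minimal Python/NumPy sketch; the dimension, grid, and random vectors are
arbitrary choices, and the periodic finite-difference grid is only a discrete stand-in for
the integration-by-parts argument, not anything from the text):

    import numpy as np

    rng = np.random.default_rng(0)

    # (i) adjoint of a matrix: B with B_ij = A*_ji satisfies <psi|A phi> = <B psi|phi>
    n = 4
    A = rng.normal(size=(n, n)) + 1j*rng.normal(size=(n, n))
    B = A.conj().T
    psi = rng.normal(size=n) + 1j*rng.normal(size=n)
    phi = rng.normal(size=n) + 1j*rng.normal(size=n)
    print(np.allclose(np.vdot(psi, A @ phi), np.vdot(B @ psi, phi)))   # True

    # (ii) discrete momentum operator: -i*hbar times a central difference with periodic
    # boundary conditions is a self-adjoint matrix (discrete integration by parts)
    hbar, m_grid, dx = 1.0, 200, 0.1
    D = (np.diag(np.ones(m_grid-1), 1) - np.diag(np.ones(m_grid-1), -1)) / (2*dx)
    D[0, -1], D[-1, 0] = -1/(2*dx), 1/(2*dx)
    P = -1j*hbar*D
    f = rng.normal(size=m_grid) + 1j*rng.normal(size=m_grid)
    g = rng.normal(size=m_grid) + 1j*rng.normal(size=m_grid)
    print(np.allclose(np.vdot(f, P @ g), np.vdot(P @ f, g)))           # True: <f|Pg> = <Pf|g>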
   An orthonormal basis (ONB) is a set {φn} of elements of the Hilbert space H such
that (a) ⟨φm|φn⟩ = δmn and (b) every ψ ∈ H can be written as a linear combination of
the φn,
    ψ = Σ_n cn φn .                                                                          (8.40)
then
    Aψ = Σ_{α,λ} α cα,λ φα,λ .                                                               (8.44)
9     Spin
The phenomenon known as spin does not mean that the particle is spinning around its
axis, though it is in some ways similar. The simplest description of the phenomenon
is to say that the wave function of an electron (at time t) is actually not of the form
ψ : R3 → C but instead ψ : R3 → C2 . The space C2 is called spin-space and its elements
spinors (short for spin-vectors). We will in the following write S for spin-space.
With φ*χ denoting the inner product in spin-space, Eq. (9.2) can be expressed more
succinctly as
    ω(φ) = φ* σ φ .                                                                          (9.5)
For example, the spinor φ = (1, 0) has ω(φ) = (0, 0, 1), which points in the +z-direction;
(1, 0) is therefore called a spin-up spinor. The spinor (0, 1) has ω(0, 1) = (0, 0, −1),
which points in the −z-direction; (0, 1) is therefore called a spin-down spinor. ω has
the properties
    ω(zφ) = |z|² ω(φ)                                                                        (9.6)
and (homework problem)
    |ω(φ)| = ‖φ‖²_S = φ*φ ,                                                                  (9.7)
so unit spinors are associated with unit vectors.
   Spinors have the curious property that if we rotate a spinor φ in spin-space through
an angle θ, with angles in Hilbert space defined by the relation
    cos θ = ⟨φ|χ⟩ / ( ‖φ‖ ‖χ‖ ) ,                                                            (9.8)
the corresponding direction ω(φ) in real space rotates through an angle 2θ. For example,
(0, 1) can be obtained from (1, 0) by rotating through 90◦ , while the corresponding vector
is rotated from the +z to the −z-direction, and thus through 180◦ . Expressed the other
way around, spinors rotate by half the angle of vectors. That is why one says that
electrons have spin one half. As a consequence, a rotation in real space by 360◦ will
correspond to one by 180◦ in spin space and carry φ to −φ, whereas a rotation in real
space by 720◦ will carry φ to itself.
     There are also other types of spinors, other than spin-1/2: spin-1, spin-3/2, spin-2,
spin-5/2, etc. The space of spin-s spinors has complex dimension 2s + 1, and the analogs of
the Pauli matrices are (2s + 1) × (2s + 1) matrices. In this context, wave functions
ψ : R3 → C are said to have spin 0. Electrons, quarks, and all known species of matter
particles have spin 12 ; the photon has spin 1; all known species of force particles have
integer spin; the only elementary particle species with spin 0 in the standard model of
particle physics is the Higgs particle or Higgs boson, which was experimentally confirmed
in 2012 at the Large Hadron Collider (LHC) of CERN in Geneva, Switzerland.
B = ∇ × A. (9.10)
(In words, B is the curl of A. The vector potential is, in fact, not uniquely defined by
this property, but different vector potentials satisfying (9.10) for the same magnetic field
can be translated into each other by gauge transformations, i.e., by different x-dependent
choices of the orthonormal basis in spin-space S.)
    The Hilbert space of wave functions with spin is denoted L2 (R3 , C2 ) and contains
the square-integrable functions R3 → C2 . The inner product is
    ⟨ψ|φ⟩ = ∫_{R³} d³x ψ*(x) φ(x) = ∫_{R³} d³x Σ_{s=1}^{2} ψs*(x) φs(x) .                    (9.11)
9.3    The Stern–Gerlach Experiment
Let us write
    ψ(x) = ( ψ1(x), ψ2(x) )ᵀ ,                                                               (9.12)
regarded as a column spinor.
    In the first half of a Stern–Gerlach experiment (first done in 1922 with silver atoms),
a wave packet moves through a magnetic field that is carefully designed so as to deflect
ψ1 (x) in a different direction than ψ2 (x), and thus to separate the two components
in space. Put differently, if the initial wave function ψ(t = 0) has support in the ball
B_r(y) of radius r around the center y, then the final wave function ψ(t = 1) (i.e., the
wave function after passing through the magnetic field) is such that ψ1(x, t = 1) has
support in B+ := B_r(y + (1, 0, d)) and ψ2(x, t = 1) in B− := B_r(y + (1, 0, −d)) with
deflection distance d > r (so that ψ1 and ψ2 do not overlap). The arrangement creating
this magnetic field is called a Stern–Gerlach magnet. In the second half of the Stern–
Gerlach experiment, one applies detectors to the regions B± . If the electron is found in
B+ then the outcome of the experiment is said to be up, if in B− then down.
     A case of particular interest is that the initial wave function satisfies
    ψ(x) = φ χ(x) ,                                                                          (9.13)
where φ ∈ S, ‖φ‖_S = 1, and χ : R³ → C, ‖χ‖ = 1. One says that for such a ψ, the spin
degree of freedom is disentangled from the spatial degrees of freedom. (Before, we have
considered many-particle wave functions for which some particles were disentangled from
others. We may also consider a single particle and say that the x variable is disentangled
from the y and z variables iff ψ(x, y, z) = f (x) g(y, z).)
    In the case (9.13), the wave function after passing the magnet is
    ( φ1 χ(x − (1, 0, d)) , φ2 χ(x − (1, 0, −d)) )ᵀ ,                                        (9.14)
and it follows from the Born rule for position that the probability of outcome “up” is
|φ1 |2 and that of “down” is |φ2 |2 .
     These probabilities agree with the general Born rule (8.45) for the observable A = σ3
on the Hilbert space H = S. The spinors φ+1 = (1, 0) and φ−1 = (0, 1) form an
orthonormal basis of S consisting of eigenvectors of σ3 (with eigenvalues +1 and −1,
respectively); φ plays the role of ψ in (8.45); its coefficients in the ONB referred to
in Eq. (8.45) are ⟨φ+1|φ⟩ = φ1 and ⟨φ−1|φ⟩ = φ2. That is why the Stern–Gerlach
experiment is often called a “measurement of σ3 ”, or a “measurement of the z component
of spin.”
     The Stern–Gerlach magnet can be rotated into any direction. For example, by
rotating by 90◦ around the x-axis (a rotation that will map the z-axis to the y-axis),
we obtain an arrangement that will deflect part of the initial wave packet ψ in the +y-
direction and another part in the −y-direction. However, these parts are not φ1 and φ2 .
Instead, they are the parts along a different ONB of S:
    φ(+) = (1/√2)(1, i)  and  φ(−) = (1/√2)(1, −i)  form an ONB of S with ω(φ(±)) = (0, ±1, 0).   (9.15)
That is, any ψ : R³ → S can be written as ψ(x) = c+(x)φ(+) + c−(x)φ(−), and these
two terms will get spatially separated (in the ±y direction, in fact). The probabilities
of outcomes “up” and “down” are then ∫ dx |c±(x)|². In the special case (9.13), the
probabilities are just |c±|², where φ = c+φ(+) + c−φ(−). Equivalently, the probabilities
are |⟨φ(±)|φ⟩|². These values are in agreement with the general Born rule for A = σ2
because φ(±) are eigenvectors of σ2 with eigenvalues ±1.
    Generally, if the Stern–Gerlach magnet is rotated from the z-direction to direction
n, where n is any unit vector in R3 , then the probabilities of its outcomes are governed
by the Born rule (8.45) for A = n · σ, which for any n is a self-adjoint 2 × 2 matrix with
eigenvalues ±1.
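This prescription is easy to implement. The following minimal sketch (Python/NumPy; the
particular spinor φ and direction n are arbitrary choices for illustration) computes the
eigenvectors of n · σ and the outcome probabilities |⟨φ(±)|φ⟩|²:

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    phi = np.array([1.0, 0.0], dtype=complex)     # spin-up spinor (example choice)
    n = np.array([0.0, 1.0, 0.0])                 # magnet rotated to the y-direction

    A = n[0]*sx + n[1]*sy + n[2]*sz               # observable n . sigma
    eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues -1, +1 with ONB of eigenvectors

    for val, vec in zip(eigvals, eigvecs.T):
        prob = abs(np.vdot(vec, phi))**2          # Born rule: |<phi^(±)|phi>|^2
        print(f"outcome {val:+.0f}: probability {prob:.3f}")

For φ = (1, 0) and n in the y-direction this prints 1/2 and 1/2, in agreement with the
discussion around (9.15).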
    For a particle with spin, Bohm’s equation of motion reads
    dQ/dt = (ℏ/m) Im[ ψ*∇ψ / ψ*ψ ] (t, Q(t)) .                                               (9.16)
Recall that ψ*ψ means the inner product in spin-space, so the denominator means
Σ_{s=1}^{2} ψs*(x) ψs(x), and correspondingly the numerator means Σ_s ψs* ∇ψs.
   It follows that Q(t) has probability density |ψt |2 at every t. This version of the
equivariance theorem can be obtained by a very similar computation as in the spinless
case, involving the following variant of the continuity equation:
    ∂|ψ(x, t)|²/∂t = −∇ · [ (ℏ/m) Im(ψ*∇ψ) ] .                                               (9.20)
   As a consequence of the equivariance theorem, Bohmian mechanics leads to the
correct probabilities for the Stern–Gerlach experiment.
9.5    Is an Electron a Spinning Ball?
If it were then the following paradox would arise. According to classical electrodynamics
(which of course is well confirmed for macroscopic objects), a spinning, electrically
charged object behaves like a magnet in two ways: it creates its own magnetic field, and
it reacts to an external magnetic field. Just as the strength of the electric charge can be
expressed by a number, the charge e, the strength of the magnet can be expressed by
a vector, the magnetic dipole moment or just magnetic moment µ. Its direction points
from the south pole to the north pole, and its magnitude is the strength of the magnet.
The magnetic moment of a charge e spinning at angular frequency ω around the axis
along the unit vector u is, according to classical electrodynamics,
                                          µ = γeωu ,                                (9.21)
where the factor γ depends on the size and shape of the object. Furthermore, if such an
object flies through a Stern–Gerlach magnet oriented in direction n then, still according
to classical electrodynamics, it gets deflected by an amount proportional to µ · n. Put
differently, the Stern–Gerlach experiment for a classical object measures µz , or the
component of µ in the direction of n. The vector ωu is called the spin vector.
    Where is the paradox? It is that different choices of n, when applied to objects
with the same µ, would lead to a continuous interval of deflections [−γ|e|ω, +γ|e|ω],
whereas the Stern–Gerlach experiment, for whichever choice of n, leads to a discrete set
{+d, −d} of two possible deflections.
    The latter fact was called by Wolfgang Pauli the “non-classical two-valuedness of
spin.” This makes it hard to come up with a theory in which the outcome of a Stern–
Gerlach experiment has anything to do with a spinning motion. While Feynman went
too far when claiming that the double-slit experiment does not permit any deeper ex-
planation, it seems safe to say that the Stern–Gerlach experiment does not permit an
explanation in terms of spinning balls. Note also that Bohmian mechanics does not
involve any spinning motion to account for (what has come to be called) spin.
where σ (k) means σ acting on the index sk of ψ. In Bohm’s equation of motion (9.16),
replace Q ∈ R3 by Q ∈ R3N and sum over all spin indices sj whenever taking the spin
inner product φ∗ ψ.
9.7    Representations of SO(3)
A deeper understanding of spinors comes from group representations.12 Let us start
easily. Consider the wave function of a single particle. Suppose it were, instead of
a complex scalar field, a vector field, so ψ : R3 → R3 . Well, it should be complex,
so we complexify the vector field, ψ : R3 → C3 . Now rotate your coordinate system
according to R ∈ SO(3). Then in the new coordinates, the same physical wave function
is represented by a different mathematical function, namely ψ̃(x) = R ψ(R⁻¹x).
Instead of real-valued potentials, the Schrödinger equation could then include matrix-
valued potentials, provided the matrices are always self-adjoint:
    iℏ ∂ψ/∂t = −(ℏ²/2m) ∆ψ + V ψ .                                                           (9.25)
Now consider another possibility: that the wave function is tensor-valued, ψab with
a, b = 1, 2, 3. Then in a rotated coordinate system,
    ψ̃ab(x) = Σ_{c,d=1}^{3} Rac Rbd ψcd(R⁻¹x) .                                              (9.26)
What the two examples have in common is that the components of the wave function
get transformed as well according to the scheme, for ψ : R3 → Cd ,
    ψ̃r(x) = Σ_{s=1}^{d} Mrs(R) ψs(R⁻¹x) .                                                   (9.27)
The matrices M(R) must fit together so that M(R1) M(R2) = M(R1 R2), which means that
they form a representation of the group SO(3) of rotations; in other words, they give a
homomorphism from SO(3) to GL(C^d), the “general linear group” comprising
all invertible operators on Cd . Further representations of SO(3) provide further possible
value spaces for wave functions ψ.
     Spin space S for spin- 21 is almost of this kind, but there is one more complication:
SO(3) is represented, not by linear mappings S → S, but by mappings P (S) → P (S)
consistent with linear mappings, where P (S) is the set of all 1-dimensional subspaces
of S (called the projective space of S). This seems fitting as two wave functions that
differ only by a phase factor, φ(x) = eiθ ψ(x), are usually regarded as representing the
same physical state (they yield the same Born distribution, at all times and for all
  12
    More details about the topic of this section can be found in R. U. Sexl and H. K. Urbantke:
Relativity, Groups, Particles, Springer-Verlag (2001).
observables, and the same Bohmian trajectories for all times). That is, one can say that
a wave function is really an element of P (H ) rather than H because every normalized
element of Cψ is as good as ψ.
    By a mapping F : P(S) → P(S) consistent with a linear mapping, I mean an F
such that there is a linear mapping M : S → S with F(Cψ) = CMψ. While M determines
F uniquely, F does not determine M, as zM with any z ∈ C \ {0} leads to the same F.
In particular, if we are given F(R) and have found an M(R), then −M(R) is always another
possible candidate. For spin-1/2, it turns out that while F(R1) F(R2) = F(R1 R2) as it
should, M(R) can at best be found in such a way that
    M(R1) M(R2) = ±M(R1 R2) .
This sign mismatch has something to do with the halved angles. The M are elements of
SU (2) (unitary with determinant 1), and with every element R of SO(3) are associated
two elements of SU (2) that differ by a sign.
   This association can actually be regarded as a mapping ϕ : SU(2) → SO(3).
This mapping ϕ is a group homomorphism (i.e., ϕ(M1 )ϕ(M2 ) = ϕ(M1 M2 ) and ϕ(I) =
I), is smooth, two-to-one [ϕ(−M ) = ϕ(M )], and locally a diffeomorphism. The situation
is similar to the group homomorphism χ : R → U(1), θ ↦ e^{iθ}, which is also smooth,
many-to-one, and locally a diffeomorphism; just like R is what you get from the circle
U (1) when you unfold it, SU (2) is what you get from SO(3) when you “unfold” it. (The
unfolding of a manifold Q is called the covering space Q̂; so the covering space of SO(3)
is SU(2).) For every
continuous curve γ in SO(3) starting in I, there is a unique continuous curve γ̂ in SU (2)
with ϕ ◦ γ̂ = γ, called the lift of γ. Thus, continuous rotations in R3 can be translated
uniquely into continuous rotations in S.
    The upshot of all this is that spinors are one of the various types of mathematical
objects (besides vectors and tensors) that react to rotations in a well-defined way, and
that is why they qualify as possible values of a wave function.
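The two-to-one map ϕ can also be made concrete numerically. The sketch below (Python/NumPy)
uses the standard extraction formula R_ij = (1/2) tr(σi M σj M†); the particular M, a
rotation about the z-axis by an invented angle, is only an example.

    import numpy as np

    sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
             np.array([[0, -1j], [1j, 0]]),
             np.array([[1, 0], [0, -1]], dtype=complex)]

    def phi_map(M):
        """Rotation R in SO(3) associated with M in SU(2): R_ij = (1/2) tr(sigma_i M sigma_j M^dagger)."""
        R = np.empty((3, 3))
        for i in range(3):
            for j in range(3):
                R[i, j] = 0.5 * np.trace(sigma[i] @ M @ sigma[j] @ M.conj().T).real
        return R

    theta = 0.7                                   # rotation angle about the z-axis (example)
    M = np.array([[np.exp(-1j*theta/2), 0],
                  [0, np.exp(1j*theta/2)]])       # element of SU(2); note the half angle

    print(np.round(phi_map(M), 3))                # the usual rotation matrix by theta about z
    print(np.allclose(phi_map(M), phi_map(-M)))   # True: M and -M give the same rotation

The half angle in M and the equality ϕ(−M) = ϕ(M) are exactly the features discussed above.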
10      The Projection Postulate
10.1     Notation
In the Dirac notation one writes |ψ⟩ for ψ. This may seem like a waste of symbols at
first, but often it is the opposite, as it allows us to replace a notation such as φ1, φ2, . . .
by |1⟩, |2⟩, . . .. Of course, a definition is needed for what |n⟩ means, just as one would
be needed for φn. It is also convenient when using long subscripts, such as replacing
ψ_left slit by |left slit⟩. In spin space S, one commonly writes
    |z-up⟩ = ↑ = (1, 0)ᵀ ,        |z-down⟩ = ↓ = (0, 1)ᵀ ,                                   (10.1)
    |y-up⟩ = (1/√2) (1, i)ᵀ ,     |y-down⟩ = (1/√2) (1, −i)ᵀ ,                               (10.2)
    |x-up⟩ = (1/√2) (1, 1)ᵀ ,     |x-down⟩ = (1/√2) (1, −1)ᵀ .                               (10.3)
(Compare to Eq. (9.15) and Exercise 11 in Assignment 4, and to Maudlin’s article.)
    Furthermore, in the Dirac notation one writes ⟨φ| for the mapping H → C given
by ψ ↦ ⟨φ|ψ⟩. Obviously, ⟨φ| applied to |ψ⟩ gives ⟨φ|ψ⟩, which suggested the notation.
Paul Dirac called ⟨φ| a bra and |ψ⟩ a ket. Obviously, ⟨φ|A|ψ⟩ means the same as
⟨φ|Aψ⟩. Dirac suggested that for self-adjoint A, the notation ⟨φ|A|ψ⟩ conveys better
that A can be applied equally well to either φ or ψ. |φ⟩⟨φ| is an operator that maps ψ
to |φ⟩⟨φ|ψ⟩ = ⟨φ|ψ⟩φ. If φ is a unit vector then this is the part of ψ parallel to φ, or the
projection of ψ to φ.
    Another common and useful notation is ⊗, called the tensor product. For
                                      Ψ(x, y) = ψ(x) φ(y)                                  (10.4)
one writes
                                            Ψ = ψ ⊗ φ.                                     (10.5)
Likewise, for Eq. (9.13) one writes ψ = φ ⊗ χ.
   The symbol ⊗ has another meaning when applied to Hilbert spaces.
                                  L2 (x, y) = L2 (x) ⊗ L2 (y) ,                            (10.6)
where L2 (x) means the square-integrable functions of x, etc. Likewise, when we replace
the continuous variable y by the discrete index s for spin, the tensor product of the
Hilbert space C2 of vectors φs and the Hilbert space L2 (R3 , C) of wave functions χ(x)
is the Hilbert space L2 (R3 , C2 ) of wave functions ψs (x):
                                C2 ⊗ L2 (R3 , C) = L2 (R3 , C2 ) .                         (10.7)
   Another notation we use is
    f(t−) = lim_{s↗t} f(s) ,      f(t+) = lim_{s↘t} f(s)                                     (10.8)
10.2      The Projection Postulate
Here is the last rule of the quantum formalism:
Now change ψ by replacing some of the coefficients cn by zero while retaining the others
unchanged:
    ψ̃ = Σ_{n∈J} cn φn ,                                                                     (10.14)
where J is the set of those indices retained. This procedure is called projection to the
subspace spanned by {φn : n ∈ J}, and the projection operator is
    P = Σ_{n∈J} |φn⟩⟨φn| .                                                                   (10.15)
    In Eq. (10.9), the index n numbers the index pairs (α, λ), and the subset J corre-
sponds to those pairs that have a given α and arbitrary λ. Except for the factor C,
the RHS of (10.9) is the corresponding projection of ψt− , which gives the projection
postulate its name. The subspace of Hilbert space spanned by the φα,λ with given α
is the eigenspace of A with eigenvalue α, which is the set of all eigenvectors of A with
eigenvalue α (together with the zero vector).
    For every closed subspace, there is a projection operator that projects to this sub-
space. For example, for any region B ⊆ R3N in configuration space, the functions whose
support lies in B (i.e., which vanish outside B) form an ∞-dimensional closed subspace
of L2 (R3N ). The projection to this subspace is
    (PB ψ)(q) = ψ(q) if q ∈ B,   and   (PB ψ)(q) = 0 if q ∉ B,                               (10.16)
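Projection operators of the type (10.15) are easy to experiment with numerically. The
following minimal sketch (Python/NumPy; the dimension, the randomly generated ONB, and
the retained index set J are arbitrary choices) verifies the characteristic properties
P² = P and P = P† and shows that the projected vector (10.14) has no coefficients outside J:

    import numpy as np

    rng = np.random.default_rng(1)
    d = 5
    # an ONB of C^d, obtained from a random complex matrix via QR decomposition
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d)))
    phi = [Q[:, n] for n in range(d)]

    J = [0, 2]                                                  # indices retained
    P = sum(np.outer(phi[n], phi[n].conj()) for n in J)         # P = sum_{n in J} |phi_n><phi_n|

    psi = rng.normal(size=d) + 1j*rng.normal(size=d)
    psi_tilde = P @ psi                                         # the projected vector, cf. (10.14)

    print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))    # True True: P^2 = P and P = P^dagger
    print(np.round([abs(np.vdot(phi[n], psi_tilde)) for n in range(d)], 3))  # zero outside J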
10.4     Remarks
According to the projection postulate (also known as the measurement postulate or
the collapse postulate), the wave function changes dramatically in a measurement. The
change is known as the reduction of the wave packet or the collapse of the wave function.
     For example, in a spin-z (or σ3 -) measurement, the wave function before the mea-
surement is an arbitrary spinor (φ1 , φ2 ) ∈ S with |φ1 |2 + |φ2 |2 = 1 (assuming Eq. (9.13)
and ignoring the space dependence). With probability |φ1 |2 , we obtain outcome “up”
and the collapsed spinor (φ1 /|φ1 |, 0) after the measurement. The term φ1 /|φ1 | is just
the phase of φ1 . With probability |φ2 |2 , we obtain “down” and the collapsed spinor
(0, φ2 /|φ2 |).
     With the projection postulate, the formalism provides a prediction of probabilities
for any sequence of measurements. If we prepare the initial wave function ψ0 and make
a measurement of A1 at time t1 then the Schrödinger equation determines what ψt1 −
is, the general Born rule (8.45) determines the probabilities of the outcome α1 , and the
projection postulate the wave function after the measurement. The latter is the initial
wave function for the Schrödinger equation, which governs the evolution of ψ until the
time t2 at which the second measurement, of observable A2 , occurs. The probability
distribution of the outcome α2 is given by the Born rule again and depends on α1 because
the initial wave function in the Schrödinger equation, ψt1 + , did. And so on. This scheme
is the quantum formalism. Note that the observer can choose t2 and A2 after the first
measurement and thus make this choice depend on the first outcome α1 .
     The projection postulate implies that if we make another measurement of A right
after the first one, we will with probability 1 obtain the same outcome α.
     For a position measurement, the projection postulate implies that the wave function
collapses to a delta function. This is not realistic; it is over-idealized. A delta function
is not a square-integrable function, and it contains in a sense an infinite amount of
energy. More realistically, a position measurement has a finite inaccuracy ε and could
be expected to collapse the wave function to one of width ε, such as
    ψt+(x) = C e^{−(x−α)²/(4ε²)} ψt−(x) .                                                    (10.17)
    You may feel a sense of paradox about the two different laws for how ψ changes with
time: the unitary Schrödinger evolution and the collapse rule. Already at first sight,
the two seem rather incompatible: the former is deterministic, the latter stochastic; the
former is continuous, the latter not; the former is linear, the latter not. It seems strange
that time evolution is governed not by a single law but by two. And even stranger that
the criterion for when the collapse rule takes over is something as vague as an observer
making a measurement. Upon scrutiny, the sense of paradox will persist and even deepen
in the form of what is known as the measurement problem of quantum mechanics.
11     The Measurement Problem
11.1     What the Problem Is
This is a problem about orthodox quantum mechanics. It is solved in Bohmian mechan-
ics and several other theories. Because of this problem, some regard the orthodox view
as incoherent when it comes to analyzing the process of measurement.
    Consider a “quantum measurement of the observable A.” Realistically, there are
only finitely many possible outcomes, so A should have finite spectrum. Consider the
system formed by the object together with the apparatus. Since the apparatus consists
of electrons and quarks, too, it should itself be governed by quantum mechanics. (That
is reductionism at work.) So I write Ψ for the wave function of the system (object
and apparatus). Suppose for simplicity that the system is isolated (i.e., there is no
interaction with the rest of the universe), so Ψ evolves according to the Schrödinger
equation during the experiment (recall Exercise 13 of Assignment 3), which begins (say)
at t1 and ends at t2. It is reasonable to assume that
    Ψ(t1) = ψ ⊗ φ ,                                                                          (11.1)
with ψ = ψ(t1) the wave function of the object before the experiment and φ a wave
function representing a “ready” state of the apparatus. By the spectral theorem, ψ can
be written as a linear combination (superposition) of eigenfunctions of A,
    ψ = Σ_α cα ψα   with   Aψα = αψα and ‖ψα‖ = 1 .                                          (11.2)
    If the object’s wave function is an eigenfunction ψα , then, by Born’s rule (8.45), the
outcome is certain to be α. Set Ψα (t1 ) = ψα ⊗ φ. Then Ψα (t2 ) must represent a state
in which the apparatus displays the outcome α.
    Now consider again a general ψ as in Eq. (11.2). Since the Schrödinger equation is
linear, the wave function of object and apparatus together at t2 is
    Ψ(t2) = Σ_α cα Ψα(t2) ,                                                                  (11.3)
which is a superposition of contributions corresponding to the different possible outcomes α.
This conflicts with the following three assumptions:
   • In each run of the experiment there is a unique outcome.
   • The wave function Ψ provides a complete description of object and apparatus.
   • The evolution of the wave function of an isolated system is always given by the
     Schrödinger equation.
Thus, we have to drop one of these assumptions. The first is dropped in the many-
worlds picture, in which all outcomes are realized, albeit in parallel worlds. If we drop
the second, we opt for additional variables as in Bohmian mechanics, where the state
at time t is described by the pair (Qt , ψt ). If we drop the third, we opt for replacing
the Schrödinger equation by a non-linear evolution (as in the GRW = Ghirardi–Rimini–
Weber approach). Of course, a theory might also drop several of these assumptions.
Orthodox quantum mechanics insists on all three assumptions, and that is why it has a
problem.
    We took for granted that the system was isolated and had a wave function. We may
wonder whether that was asking too much. However, we could just take the system to
consist of the entire universe, so it is disentangled and isolated for sure. More basically,
if we cannot solve the measurement problem for an isolated system with a wave function
then we have no chance of solving it for a system entangled with outside particles.
Since the Ψα have disjoint supports in the configuration space (of object and apparatus
together), and since the particle configuration Q has distribution |Ψ|2 , the probability
that Q lies in the support of Ψα is
    P(Q ∈ support(Ψα)) = ∫_{support(Ψα)} d^{3N}q |Ψ(q)|² = ∫_{R^{3N}} d^{3N}q |cα Ψα(q)|² = |cα|² ,   (11.5)
which agrees with the prediction of the quantum formalism for the probability of the
outcome α. And indeed, when Q ∈ support(Ψα ), then the particle positions (including
the particles of both the object and the apparatus!) are such that the pointer of the
apparatus points to the value α. Thus, the way out of the measurement problem is
that although the wave function is a superposition of terms corresponding to different
outcomes, the actual particle positions define the actual outcome.
    As a consequence of the above consideration, we also see that the predictions of
Bohmian mechanics for the probabilities of the outcomes of experiments agree with
those of standard quantum mechanics. In particular, there is no experiment that could
empirically distinguish between Bohmian mechanics and standard quantum mechanics,
while there are (in principle) experiments that distinguish the two from a GRW world.
    If Bohmian mechanics and standard quantum mechanics agree about all probabili-
ties, then where do we find the collapse of the wave function in Bohmian mechanics?
There are two answers, depending on which wave function we are talking about. The
first answer is, if the Ψα are macroscopically different then they will never overlap again
(until the time when the universe reaches thermal equilibrium, perhaps in 10^(10^10) years);
this fact is called decoherence. If Q lies in the support of one among several disjoint
packets then only the packet containing Q is relevant, by Bohm’s law of motion (6.1),
to determining dQ/dt. Thus, as long as the packets stay disjoint, only the packet con-
taining Q is relevant to the trajectories of the particles, and all other packets could be
replaced by zero without affecting the trajectories. That is why we can replace Ψ by
cα Ψα , with α the actual outcome. Furthermore, the factor cα cancels out in Bohm’s law
of motion (6.1) and thus can be dropped as well.
    The second answer is, the quantum formalism does not, in fact, talk about the wave
function Ψ of object and apparatus but about the wave function ψ of the object alone.
This leads us to the question of what is meant by the wave function of a subsystem. If
    Ψ(x, y) = ψ(x) φ(y) ,                                                                    (11.6)
then it is appropriate to call ψ the wave function of the x-system, but in general Ψ does
not factorize as in (11.6). In Bohmian mechanics, a natural general definition for the
wave function of a subsystem is the conditional wave function
    ψ(x) = N Ψ(x, Y) ,                                                                       (11.7)
where Y is the actual configuration of the y-system (while x is not the actual configu-
ration X but any configuration of the x-system) and
    N = ( ∫ |Ψ(x, Y)|² dx )^{−1/2}                                                           (11.8)
is the normalizing factor. The conditional wave function does not, in general, evolve
according to a Schrödinger equation, but in a complicated way depending on Ψ, Y ,
and X. There are special situations in which the conditional wave function does evolve
according to a Schrödinger equation, in particular when the x-system and the y-system
do not interact and the wave packet in Ψ containing Q = (X, Y ) is of a product form such
as (11.6). Indeed, this is the case for the object before, but not during the measurement;
as a consequence, the wave function of the object (i.e., its conditional wave function)
evolves according to the Schrödinger equation before, but not during the measurement—
in agreement with the quantum formalism. To determine the conditional wave function
after the quantum measurement, suppose that Ψα is of the form
Ψα = ψα ⊗ φα (11.9)
with φα a wave function of the apparatus with the pointer pointing to the value α.
Let α be the actual outcome, i.e., Q ∈ support(Ψα ). Then Y ∈ support(φα ) and the
conditional wave function is indeed
ψ = ψα . (11.10)
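The conditional wave function (11.7) can be illustrated with a toy discretized example.
In the sketch below (Python/NumPy), every ingredient — the grids, the particular Gaussian
packets, the coefficients, and the value of Y — is invented for the purpose of illustration:
Ψ = c1 ψ1 ⊗ φ1 + c2 ψ2 ⊗ φ2 with φ1, φ2 having disjoint supports, and the actual Y lies in
the support of φ1.

    import numpy as np

    x = np.linspace(-10, 10, 400)
    y = np.linspace(-10, 10, 400)

    def gauss(u, mean, width):
        g = np.exp(-(u - mean)**2 / (4*width**2))
        return g / np.sqrt(np.sum(abs(g)**2) * (u[1]-u[0]))

    psi1, psi2 = gauss(x, -2, 0.5), gauss(x, +2, 0.5)     # object wave functions
    phi1, phi2 = gauss(y, -5, 0.3), gauss(y, +5, 0.3)     # pointer wave functions, disjoint supports
    c1, c2 = 0.6, 0.8                                     # |c1|^2 + |c2|^2 = 1

    Psi = c1*np.outer(psi1, phi1) + c2*np.outer(psi2, phi2)   # Psi(x, y) on the grid

    Y = -5.0                                              # actual pointer configuration, in supp(phi1)
    iY = np.argmin(abs(y - Y))
    cond = Psi[:, iY]
    cond = cond / np.sqrt(np.sum(abs(cond)**2) * (x[1]-x[0]))  # conditional wave function psi(x)

    print(np.allclose(cond, psi1, atol=1e-8))             # True: psi = psi_1, as in (11.10)

The check at the end reproduces (11.10): when the pointer configuration lies in the support
of φ1, the conditional wave function of the object is ψ1.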
11.3      Schrödinger’s Cat
Often referred to in the literature, this is Schrödinger’s13 1935 formulation of the mea-
surement problem:
       One can even set up quite ridiculous cases. A cat is penned up in a steel
       chamber, along with the following diabolical device (which must be secured
       against direct interference by the cat): in a Geiger counter there is a tiny
       bit of radioactive substance, so small, that perhaps in the course of one hour
       one of the atoms decays, but also, with equal probability, perhaps none; if it
       happens, the counter tube discharges and through a relay releases a hammer
       which shatters a small flask of hydrocyanic acid. If one has left this entire
       system to itself for an hour, one would say that the cat still lives if meanwhile
       no atom has decayed. The first atomic decay would have poisoned it. The
       ψ-function of the entire system would express this by having in it the living
       and dead cat (pardon the expression) mixed or smeared out in equal parts.
       It is typical of these cases that an indeterminacy originally restricted to the
       atomic domain becomes transformed into macroscopic indeterminacy, which
       can then be resolved by direct observation. That prevents us from so naively
       accepting as valid a “blurred model” for representing reality. In itself it
       would not embody anything unclear or contradictory. There is a difference
       between a shaky or out-of-focus photograph and a snapshot of clouds and
       fog banks.
system, as it is the only information about the system that can be found experimentally
without disturbing ψ. They tend not to take the measurement problem seriously.
    Realism is the view that a fundamental physical theory is meaningless unless it
provides a coherent story of what happens. Bohmian mechanics, GRW theory, and
many-worlds are examples of realist theories. For a realist, the quantum formalism
by itself does not qualify as a fundamental physical theory. The story provided by
Bohmian mechanics, for example, is that particles have trajectories, that there is a
physical object that is mathematically represented by the wave function, and that the
two evolve according to certain equations. For a realist, the measurement problem is
serious and can only be solved by denying one of the 3 conflicting premises.
12      The GRW Theory
Bohmian mechanics is not the only possible explanation of quantum mechanics. Another
one is provided by the GRW theory, named after GianCarlo Ghirardi, Alberto Rimini,
and Tullio Weber, who proposed it in 1986. A similar theory, CSL (for continuous
spontaneous localization), was proposed by Philip Pearle in 1989. In both theories, Ψt
does not evolve according to the Schrödinger equation, but according to a modified
evolution law. This evolution law is stochastic, as opposed to deterministic. That is, for
any fixed Ψ0 , it is random what Ψt is, and the theory provides a probability distribution
over Hilbert space. A family of random variables Xt , with one variable for every time t,
is called a stochastic process. Thus, the family (Ψt )t>0 is a stochastic process in Hilbert
space. We leave CSL aside and focus on the GRW process. In it, periods governed by
the Schrödinger equation are interrupted by random jumps. Such a jump occurs, within
any infinitesimal time interval dt, with probability λ dt, where λ is a constant called
the jump rate. Let us call the random jump times T1 , T2 , . . .; the sequence T1 , T2 , . . . is
known as the Poisson process with rate λ; it has widespread applications in probability
theory. Let us have a closer look.
To compute this quantity, we reason as follows. If T1 has not occurred until t, then the
probability that it will occur within the next dt is λ dt. Thus, (12.2) differs from (12.1)
by a factor λ dt, or, as the factor dt cancels out,
where the expression 1C is 1 whenever the condition C is satisfied, and 0 otherwise. The
distribution (12.3) is known as the exponential distribution with parameter λ, Exp(λ).
We have thus found that the waiting time for the first event has distribution Exp(λ).
    After T1 , the next dt has again probability λ dt for the next event to occur. The
above reasoning can be repeated, with the upshot that the waiting time T2 − T1 for
the next event has distribution Exp(λ) and is independent of what happened up to time
T1 . The same applies to the other waiting times Tn+1 − Tn . In fact, at any time t0 the
waiting time until the next event has distribution Exp(λ).
    The exponential distribution has expectation value
    ∫_0^∞ t ρ(t) dt = 1/λ .                                                                  (12.4)
This fact is very plausible if you think of it this way: If in every second the probability of
an earthquake is, say, 10−8 , then you would guess that an earthquake occurs on average
every 108 seconds. The constant λ, whose dimension is 1/time, is thus the average
frequency of the earthquakes (or whichever events).
    Another way of representing the Poisson process is by means of the random variables
Theorem 12.1. If the earthquakes in Australia are governed by a Poisson process with
rate λ1 and the earthquakes in Africa are governed by a Poisson process with rate λ2 , and
the earthquakes in the two places are independent of each other, then the earthquakes in
Africa and Australia together are governed by a Poisson process with rate λ1 + λ2 .
Theorem 12.2. If we choose n points at random in the interval [0, n/λ], independently
with uniform distribution, then the joint distribution of these points converges, as n →
∞, to the Poisson process with parameter λ.
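These properties are easy to check numerically. Here is a small Python sketch (using numpy; the rates and the time horizon are arbitrary illustrative values): it generates a Poisson process from independent Exp(λ) waiting times, confirms that the mean waiting time is close to 1/λ as in (12.4), and merges two independent processes to illustrate Theorem 12.1.

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, T = 2.0, 3.0, 10_000.0   # rates [1/s] and time horizon (illustrative values)

def poisson_jump_times(lam, T):
    """Jump times of a Poisson process with rate lam up to time T,
    obtained by accumulating independent Exp(lam) waiting times."""
    times = []
    t = rng.exponential(1.0 / lam)
    while t < T:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

t1 = poisson_jump_times(lam1, T)
t2 = poisson_jump_times(lam2, T)

# The mean waiting time should be close to 1/lambda, cf. (12.4).
print(np.diff(t1).mean(), 1.0 / lam1)

# Theorem 12.1: the merged sequence of events behaves like a Poisson process
# with rate lam1 + lam2; in particular its mean waiting time is 1/(lam1 + lam2).
merged = np.sort(np.concatenate([t1, t2]))
print(np.diff(merged).mean(), 1.0 / (lam1 + lam2))
```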
Alternatively, Steven Adler suggested
with
    g_{X,σ}(x) = \frac{1}{(2πσ^2)^{3/2}} e^{−(X−x)^2/2σ^2} .    (12.10)
The point X_k ∈ R^3 is chosen at random with probability density

    ρ(X_k = y | T_1, . . . , T_k, X_1, . . . , X_{k−1}) = ‖C(y)Ψ‖^2 ,    (12.11)

where ρ(· · · | · · · ) means the probability density, given the values of T_1, . . . , T_k, X_1, . . . , X_{k−1}.
The right-hand side of (12.11) is indeed a probability density because it is nonnegative
and

    ∫ d^3y ρ(X_k = y | · · · ) = ∫ d^3y ‖C(y)Ψ‖^2 = ∫ d^3y ∫ d^3x |C(y)Ψ(x)|^2    (12.12)
    = ∫ d^3x ∫ d^3y g_{y,σ}(x) |Ψ(x)|^2 = ∫ d^3x |Ψ(x)|^2 = 1 .    (12.13)
  14
    Or rather, outside of the universe, as the idea is that the entire universe is governed by GRW
theory.
    For arbitrary N ∈ N and Ψ_t = Ψ_t(x_1, . . . , x_N),

    Ψ_{T_k+} = \frac{C_{I_k}(X_k) Ψ_{T_k−}}{‖C_{I_k}(X_k) Ψ_{T_k−}‖} ,    (12.14)

where the collapse operator C_I(X) is the following multiplication operator:

    C_I(X) Ψ(x_1, . . . , x_N) = \sqrt{g_{X,σ}(x_I)} \, Ψ(x_1, . . . , x_N) .    (12.15)
This completes the definition of the GRW process. But not yet the definition of the
GRW theory.
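To see the collapse prescription at work, here is a minimal numerical sketch in Python of a single GRW jump for one particle in one spatial dimension (so the particle label I plays no role and g below is the one-dimensional analog of (12.10)). The grid, the packet positions, and the value of σ are illustrative choices, not part of the theory.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretized 1d position grid (illustrative units).
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
sigma = 1.0                         # localization width of the collapse (illustrative)

# A superposition of two well-separated packets, normalized so sum |psi|^2 dx = 1.
psi = np.exp(-(x + 8.0)**2 / 2) + np.exp(-(x - 8.0)**2 / 2)
psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)

def g(X, x, sigma):
    """1d analog of (12.10): normalized Gaussian of width sigma centered at X."""
    return np.exp(-(x - X)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Probability density of the collapse center X, rho(X) = ||C(X) psi||^2,
# cf. (12.11)-(12.13); it integrates to (approximately) 1.
rho = np.array([np.sum(g(X, x, sigma) * np.abs(psi)**2) * dx for X in x])
print(np.sum(rho) * dx)             # ~ 1.0

# Sample X from rho and apply the collapse (12.14)-(12.15):
# multiply by sqrt(g) and renormalize.
X = rng.choice(x, p=rho * dx / np.sum(rho * dx))
psi_new = np.sqrt(g(X, x, sigma)) * psi
psi_new = psi_new / np.sqrt(np.sum(np.abs(psi_new)**2) * dx)

# After the jump, essentially all of |psi_new|^2 sits in the packet near X.
left = np.sum(np.abs(psi_new[x < 0])**2) * dx
print(X, left, 1 - left)
```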
In GRWm, matter is continuously distributed in physical 3-space with density

    m(x, t) = Σ_{i=1}^N m_i ∫_{R^{3N}} d^3x_1 · · · d^3x_N  δ^3(x − x_i) |Ψ_t(x_1, . . . , x_N)|^2 .

In words, one starts with the |ψ|^2–distribution in configuration space R^{3N}, then obtains
the marginal distribution of the i-th degree of freedom x_i ∈ R^3 by integrating out all
other variables x_j, j ≠ i, multiplies by the mass m_i associated with x_i, and sums over i.
F = {(X 1 , T1 , I1 ), . . . , (X k , Tk , Ik ), . . .} . (12.18)
    Note that if the number N of the degrees of freedom in the wave function is large,
as in the case of a macroscopic object, the number of flashes is also large (if λ = 10^{−16}
s^{−1} and N = 10^{23}, we obtain 10^7 flashes per second). Therefore, for a reasonable choice
of the parameters of the GRWf theory, a cubic centimeter of solid matter contains more
than 10^7 flashes per second.
macroscopic shapes, such as tables and chairs. “A piece of matter then is a galaxy of
[flashes].” (Bell, page 205) That is how we find an image of our world in GRWf.
   A few remarks. The m function of GRWm and the flashes of GRWf are called the
primitive ontology of the theory. Ontology means what exists according to a theory; for
example, in Bohmian mechanics ψ and Q, in GRWm ψ and m, in GRWf ψ and F . The
“primitive” ontology is the part of the ontology representing matter in 3-d space (or 4-d
space-time): Q in Bohmian mechanics, m in GRWm, and F in GRWf.
   It may seem that a continuous distribution of matter should conflict with the
evidence for the existence of atoms, electrons and quarks, and should thus make wrong
predictions. We will see below why that is not the case—why GRWm makes nearly the
same predictions as the quantum formalism.
For a given wave function Ψ, ρ(X = y) is essentially the marginal of |Ψ|^2 connected to the x_I-variable,
i.e., the distribution on 3-space obtained from the |Ψ|2 distribution on 3N -space by
integrating out 3N − 3 variables. (More precisely, smeared over width σ.) Thus, again,
on the macroscopic scale, the distribution of X is the same as the quantum mechanical
probability distribution for the position of the I-th particle.
     A wave function like the one we encountered in the measurement problem,

    Ψ = Σ_α c_α Ψ_α ,    (12.19)
where Ψ_α is a wave function corresponding to the pointer pointing to the value α, would
behave in the following way. Assuming the pointer contains 10^{23} particles, a collapse
connected to one of the pointer particles would occur every 10^{−7} sec. Since Ψ_α is
concentrated in a region in configuration space where all of the pointer particles are at
some location y_α, and assuming that the y_α are sufficiently distant for different values of
α (namely by much more than σ), a single collapse connected to any of the pointer particles
will suffice for essentially removing all contributions Ψ_α except one. Indeed, suppose
the collapse is connected to the particle x_i, which is one of the pointer particles. Then
the random center X of the collapse will be distributed according to a coarse-grained
version of the i-th marginal of |Ψ|^2; since the separation between the y_α is greater than
σ, we can neglect the coarse graining, and we can just take the i-th marginal of the
|Ψ|^2 distribution. Thus, X will be close to one of the y_α, and the probability that
X is close to y_{α_0} is |c_{α_0}|^2. Then, the multiplication by a Gaussian centered at X will
shrink all other packets Ψ_α by big factors, of the order exp(−(y_α − y_{α_0})^2/2σ^2), effectively
collapsing them away.
     Thus, within a fraction of a second, a superposition such as (12.19) would decay into
one of the packets Ψ_α (times a normalization factor), and indeed into Ψ_{α_0} with probability
|c_{α_0}|^2, the same probability as attributed by quantum mechanics to the outcome α_0.
     Let us make explicit how GRW succeeded in setting up the laws in such a way
that they are effectively different laws for microscopic and macroscopic objects: (i) We
realize that a few collapses (or even a single collapse) acting on a few (or one) of the
pointer particles will collapse the entire wave function Ψ of object and apparatus together
to essentially just one of the contributions Ψα . (ii) The frequency of the collapses
is proportional to the number of particles (which serves as a quantitative measure of
“being macroscopic”). (iii) We can’t ensure that microscopic systems experience no
collapses at all, but we can ensure the collapses are very infrequent. (iv) We can’t
ensure that macroscopic superpositions such as Ψ = Σ_α c_α Ψ_α collapse immediately, but
we can ensure they collapse within a fraction of a second.
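The orders of magnitude behind (i)–(iv) can be reproduced in a few lines of Python. The value λ = 10^{−16} s^{−1} is the one used above; σ ≈ 10^{−7} m is the commonly quoted GRW localization width (an assumption here, since its value is fixed elsewhere in the script); the pointer size (10^{23} particles) and the packet separation (1 cm) are illustrative.

```python
import numpy as np

lam = 1e-16          # GRW jump rate per particle [1/s]
sigma = 1e-7         # GRW localization width [m] (commonly quoted value; assumption here)
N = 1e23             # number of particles in a macroscopic pointer (illustrative)
d = 1e-2             # separation of the two pointer positions [m] (illustrative)

# (ii)-(iv): total collapse rate of the pointer and expected time until the first collapse.
total_rate = N * lam
print(total_rate, 1.0 / total_rate)   # ~ 1e7 collapses per second, first one after ~ 1e-7 s

# (i): suppression factor of the "wrong" packet after a single collapse,
# of the order exp(-d^2 / (2 sigma^2)) -- astronomically small for d >> sigma.
print(-d**2 / (2 * sigma**2))         # exponent ~ -5e9
print(np.exp(-d**2 / (2 * sigma**2))) # underflows to 0.0

# (iii): a single microscopic particle waits on average 1/lam seconds between collapses.
print(1.0 / lam)                      # ~ 1e16 s, i.e. of the order of 1e8 years
```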
[Figure: two log-log panels, vertical axis λ [s^{−1}] ranging from 10^{−36} to 10^{4}, with the Adler and GRW parameter choices and the region PUR marked in each panel.]
Figure 3: Parameter diagram (log-log-scale) of the GRW theory with the primitive on-
tology given by (a) flashes, (b) the matter density function. ERR = empirically refuted
region as of 2012 (equal in (a) and (b)), PUR = philosophically unsatisfactory region.
GRW’s and Adler’s choice of parameters are marked. Figure taken from W. Feldmann
and R. Tumulka: Parameter Diagrams of the GRW and CSL Theories of Wave Func-
tion Collapse. Journal of Physics A: Mathematical and Theoretical 45: 065304 (2012)
http://arxiv.org/abs/1109.6579
In principle, there are experiments for which GRW theory predicts different outcomes than the quantum formalism. Here
is an example. GRW theory predicts that if we keep a particle isolated it will sponta-
neously collapse after about 100 million years, and quantum mechanics predicts it will
not collapse. So let’s take 10^4 electrons, for each of them prepare its wave function to
be a superposition of a packet in Paris and a packet in Tokyo; let’s keep each electron
isolated for 100 million years; according to GRW, a fraction of

    ∫_0^{1/λ} λ e^{−λt} dt = ∫_0^1 e^{−s} ds = 1 − e^{−1} = 63.2%    (12.20)
of the 10^4 wave functions will have collapsed; according to quantum mechanics, none
will have collapsed; now let’s bring the packets from Paris and Tokyo together, let
them overlap and observe the interference pattern; according to quantum mechanics, we
should observe a clear interference pattern; if all of the wave functions had collapsed, we
should observe no interference pattern at all; according to GRW, we should observe only
a faint interference pattern, damped (relative to the quantum prediction) by a factor
of e. Ten thousand points should be enough to decide whether the damping factor is
there or not. This example illustrates two things: that in principle GRW makes different
predictions, and that in practice these differences may be difficult to observe (because
of the need to wait for 100 million years, and because of the difficulty with keeping the
electrons isolated for a long time, in particular avoiding decoherence).
    Another testable consequence of the GRW process is universal warming. Since the
GRW collapse usually makes wave packets narrower, their Fourier transforms (momen-
tum representation) become wider, by the Heisenberg uncertainty relation. As a ten-
dency, this leads to a long-run increase in energy. This effect amounts to a spontaneous
warming at a rate of the order of 10^{−15} K per year.
    No empirical test of GRW theory against the quantum formalism can presently be
carried out, but experimental techniques are progressing; see Figure 3. Adler’s pa-
rameters have in the meantime been empirically refuted as a byproduct of the LIGO
experiment that detects gravitational waves. A test of GRW’s parameters seems fea-
sible using a planned interferometer on a satellite in outer space. Interferometers are
disturbed by the presence of air, temperatures far from absolute zero, vibrations of the
apparatus, and the presence of gravity; that is why being in outer space is an advantage
for an interferometer and allows for heavier objects shot through the double slit and
longer flight times. Such an interferometer is being considered by the European Space
Agency ESA and may be up and running in 2025.
After the collapses, the wave function is concentrated on configurations that are macroscopically equivalent to each other. So we can read off from the post-
measurement wave function, e.g., what the actual outcome of a quantum measurement
was.
   On the other hand, there is a logical gap between saying

    “the wave function is that of a live cat”    (12.21)

and saying

    “there is a live cat.”    (12.22)
After all, in Bohmian mechanics, (12.22) follows from (12.21) by virtue of a law of the
theory, which asserts that the configuration Q(t) is |ψt |2 distributed at every time t.
Thus, Bohmian mechanics suggests that (12.22) would not follow from (12.21) if there
were not a law connecting the two by means of the primitive ontology. If that is so, then
it does not follow in GRW∅ either. Another indication in this direction is the fact that
the region “PUR” in Figure 3 depends on the primitive ontology we consider, GRWf or
GRWm.
    Other aspects of the question whether GRW∅ is a satisfactory theory have to do
with a number of paradoxes that arise in GRW∅ but evaporate in GRWf and GRWm.15
For the sake of simplicity, I will focus on GRWm and leave aside GRWf.
   Paradox: Here is a reason one might think that the GRW theory fails to solve
the measurement problem. Consider a quantum state like Schrödinger’s cat, namely a
superposition
                                ψ = c1 ψ1 + c2 ψ2                            (12.23)
of two macroscopically distinct states ψ_i with ‖ψ_1‖ = 1 = ‖ψ_2‖, such that both contri-
butions have nonzero coefficients ci . Given that there is a problem—the measurement
problem—in the case in which the coefficients are equal, one should also think that there
is a problem in the case in which the coefficients are not exactly equal, but roughly of
the same size. One might say that the reason there is a problem is that, according to
quantum mechanics, there is a superposition whereas according to our intuition there
should be a definite state. But then it is hard to see how this problem should go away
just because c2 is much smaller than c1 . How small would c2 have to be for the problem
to disappear? No matter if c_2 = c_1 or c_2 = c_1/100 or c_2 = 10^{−100} c_1, in each case both
contributions are there. But the only relevant effect of the GRW process replacing the
unitary evolution, as far as Schrödinger’s cat is concerned, is to randomly make one of
the coefficients much smaller than the other (although it also affects the shape of the
suppressed contribution).
    Answer: From the point of view of GRWm, the reasoning misses the primitive
ontology. Yes, the wave function is still a superposition, but the definite facts that our
intuition wants can be found in the primitive ontology. The cat is made of m, not of
  15
    The following discussion is adapted from R. Tumulka: Paradoxes and Primitive Ontology in Col-
lapse Theories of Quantum Mechanics. Pages 139–159 in S. Gao (editor), Collapse of the Wave Func-
tion, Cambridge University Press (2018) https://arxiv.org/abs/1102.5767.
ψ. If ψ is close to |dead⟩, then m equals m_{|dead⟩} up to a small perturbation, and that
can reasonably be accepted as the m function of a dead cat. While the wave function
is a superposition of two packets ψ1 , ψ2 that correspond to two very different kinds
of (particle) configurations in ordinary QM or Bohmian mechanics, there is only one
configuration of the matter density m—the definite fact that our intuition wants.
     Paradox: As a variant of the first paradox, one might say that even after the GRW
collapses have pushed |c1 |2 near 1 and |c2 |2 near 0 in the state vector (12.23), there is
still a positive probability |c2 |2 that if we make a quantum measurement of the macro-
state—of whether the cat is dead or alive—we will find the state ψ2 , even though the
GRW state vector has collapsed to a state vector near ψ1 , a state vector that might be
taken to indicate that the cat is really dead (assuming ψ_1 = |dead⟩). Thus, it seems not
justified to say that, when ψ is close to |dead⟩, the cat is really dead.
    Answer: In GRWm, what we mean when saying that the cat is dead is that the m
function looks and behaves like a dead cat. In orthodox QM, one might mean instead
that a quantum measurement of the macro-state would yield |dead⟩ with probability 1.
These two meanings are not exactly equivalent in GRWm: that is because, if m ≈ m_{|dead⟩}
(so we should say that the cat is dead) and if ψ is close but not exactly equal to |dead⟩,
then there is still a tiny but non-zero probability that within the next millisecond the
collapses occur in such a way that the cat is suddenly alive! But that does not contradict
the claim that a millisecond before the cat was dead; it only means that GRWm allows
resurrections to occur—with tiny probability! In particular, if we observe the cat after
that millisecond, there is a positive probability that we find it alive (simply because it
is alive) even though before the millisecond it actually was dead.
     Paradox: Let ψ1 be the state “the marble is inside the box” and ψ2 the state
“the marble is outside the box”; these wave functions have disjoint supports S1 , S2 in
configuration space (i.e., wherever one is nonzero the other is zero). Let ψ be given
by (12.23) with 0 < |c_2|^2 ≪ |c_1|^2 < 1; finally, consider a system of n (non-interacting)
marbles at time t_0, each with wave function ψ, so that the wave function of the system
is ψ^{⊗n}. Then for each of the marbles, we would feel entitled to say that it is inside the
box, but on the other hand, the probability that all marbles be found inside the box is
|c1 |2n , which can be made arbitrarily small by making n sufficiently large.
    Answer: According to the m function, each of the marbles is inside the box at the
initial time t0 . However, it is known that a superposition like (12.23) of macroscopically
distinct states ψi will approach under the GRW evolution either a wave function ψ1 (∞)
concentrated in S1 or another ψ2 (∞) in S2 with probabilities |c1 |2 and |c2 |2 , respectively.
(Here I am assuming H = 0 for simplicity. Although both coefficients will still be nonzero
after any finite number of collapses, one of them will tend to zero in the limit t → ∞.)
Thus, for large n the wave function will approach one consisting of approximately n|c1 |2
factors ψ1 (∞) and n|c2 |2 factors ψ2 (∞), so that ultimately about n|c1 |2 of the marbles
will be inside and about n|c2 |2 outside the box—independently of whether anybody
observes them or not. The occurrence of some factors ψ2 (∞) at a later time provides
another example of the resurrection-type events mentioned earlier; they are unlikely but
do occur, of course, if we make n large enough.
    The act of observation plays no role in the argument and can be taken to merely
record pre-existing macroscopic facts. To be sure, the physical interaction involved
in the act of observation may have an effect on the system, such as speeding up the
evolution from ψ towards either ψ1 (∞) or ψ2 (∞); but GRWm provides unambiguous
facts about the marbles also in the absence of observers.
13      The Copenhagen Interpretation
A very influential view, almost synonymous with the orthodox view of quantum me-
chanics, is the Copenhagen interpretation (CI), named after the research group headed
by Niels Bohr, who was the director of the Institute for Theoretical Physics at the Uni-
versity of Copenhagen, Denmark. Further famous defenders of this view and members
of Bohr’s group (temporarily also working in Copenhagen) include Werner Heisenberg,
Wolfgang Pauli, and Leon Rosenfeld. Bohr and Einstein were antagonists in a debate
about the foundations of quantum mechanics that began around 1925 and continued
until Einstein’s death in 1955. In Feynman’s text you have already seen an exposition
of (parts of) the orthodox view. Here is a description of the main elements of CI.
   • It is not precisely defined where the border between micro and macro lies. That
     lies in the nature of the word “macroscopic.” Clearly, an atom is micro and a
     table is macro, but what is the exact number of particles required for an object
     to be “macroscopic”? The vagueness inherent in the concept of “macroscopic” is
     unproblematical in Bohmian mechanics, GRW theory, or classical mechanics, but
     it is problematical here because it is involved in the formulation of the laws of
     nature. Laws of nature should not be vague.
  16
    This is a somewhat unfortunate terminology because the word classical suggests not only definite
positions but also particular laws (say, Newton’s equation of motion) which may actually not apply.
The word quantum is somewhat unfortunate as well because in a reductionist view, all laws (also those
governing macroscopic objects) should be consequences of the quantum laws applying to the individual
electrons, quarks, etc.
   • Likewise, what counts as a measurement and what does not? This ambiguity is
     unproblematical when we only want to compute the probabilities of outcomes of
     a given experiment because it will not affect the computed probabilities. But an
     ambiguity is problematical when it enters the laws of nature.
   • The special role played by measurements in the laws according to CI is also implau-
     sible and artificial. Even if a precise definition of what counts as a measurement
      were given, it would not seem believable that different laws are in place during a
      measurement than at other times.
   • The separation of the two realms, without the formulation of laws that apply to
     both, is against reductionism. If we think that macro objects are made out of
     micro objects, then the separation is problematical.
13.2      Positivism
CI leans towards positivism. In the words of Werner Heisenberg (1958):
       “We can no longer speak of the behavior of the particle independently of the
       process of observation.”
       “Does this mean that my observations become real only when I observe an
       observer observing something as it happens? This is a horrible viewpoint.
       Do you seriously entertain the thought that without observer there is no
       reality? Which observer? Any observer? Is a fly an observer? Is a star an
       observer? Was there no reality before 10^9 B.C., before life began? Or are
       you the observer? Then there is no reality to the world after you are dead?
       I know a number of otherwise respectable physicists who have bought life
       insurance.”
       “The idea of an objective real world whose smallest parts exist objectively
       in the same sense as stones or trees exist, independently of whether or not
       we observe them [...], is impossible.”
We know from Bohmian mechanics that this claim is, in fact, wrong.
13.4      Completeness of the Wave Function
In CI, a microscopic system is completely described by its wave function. That is, there
are no further variables (such as Bohm’s particle positions) whose values nature knows
and we do not. For this reason, the wave function is also called the quantum state or
the state vector.
13.6      Complementarity
Another idea of CI, called complementarity, is that in the micro realm, reality is para-
doxical (contradictory) but the contradictions can never be seen (and are therefore not
problematical) because of the Heisenberg uncertainty relation. (Recall Feynman’s dis-
cussion of how the uncertainty relation keeps some things invisible.) Here is Bohr’s
definition of complementarity:
   I would describe the idea as follows. In order to compute a quantity of interest (e.g.,
the wave length of light scattered off an electron), we use both Theory A (e.g., classical
theory of billiard balls) and Theory B (e.g., classical theory of waves) although A and
B contradict each other.17 It is impossible to find one Theory C that replaces both A
  17
    In fact, before 1926 many successful theoretical considerations for predicting the results of exper-
iments proceeded in this way. For example, people made a calculation about the collision between an
electron and a photon as if they were classical billiard balls, then converted the momenta into wave
lengths using de Broglie’s relation p = ℏk, then made another calculation about waves with wave
number k.
and B and explains the entire physical process. (Here we meet again the impossibility
claim mentioned in Section 13.3.) Instead, we should leave the conflict between A and
B unresolved and accept the idea that reality is paradoxical.
    Bell (Speakable and Unspeakable in Quantum Mechanics, page 190) wrote the fol-
lowing about complementarity:
     “It seems to me that Bohr used this word with the reverse of its usual
     meaning. Consider for example the elephant. From the front she is head,
     trunk and two legs. From the back she is bottom, tail, and two legs. From
     the sides she is otherwise, and from the top and bottom different again.
     These various views are complementary in the usual sense of the word. They
     supplement one another, they are consistent with one another, and they are
     all entailed by the unifying concept ‘elephant.’ It is my impression that to
     suppose Bohr used the word ‘complementary’ in this ordinary way would
     have been regarded by him as missing his point and trivializing his thought.
     He seems to insist rather that we must use in our analysis elements which
     contradict one another, which do not add up to, or derive from, a whole. By
     ‘complementarity’ he meant, it seems to me, the reverse: contradictoriness.”
   Einstein (1949):
     “Despite much effort which I have expended on it, I have been unable to
     achieve a sharp formulation of Bohr’s principle of complementarity.”
   Bell commented (1986):
     “What hope then for the rest of us?”
    Another version of complementarity concerns observables that cannot be simultane-
ously measured. We have encountered this situation in a homework exercise. Compare
two experiments, each consisting of two measurements: (a) first measure σ2 and then σ3 ,
(b) first measure σ3 and then σ2 . We have seen that the joint probability distribution
of the outcomes depends on the order. Some observables, though, can be measured
simultaneously, i.e., the joint distribution does not depend on the order. Examples: X2
and X3 , the y-component of position and the z-component; or σ2 of particle 1 and σ3 of
particle 2.
Theorem 13.1. The observables A and B can be simultaneously measured (i.e., for
every wave function the joint probability distribution of the outcomes is independent of
the order of the two measurements) iff the operators A and B commute, AB = BA.
Theorem 13.2. (An extension of the spectral theorem) The operators A and B commute
if and only if there exists an ONB {φ_n} whose elements are eigenvectors of both operators
A and B, Aφ_n = α_n φ_n and Bφ_n = β_n φ_n.
Example 13.3.

    σ_2 σ_3 = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} ,    σ_3 σ_2 = \begin{pmatrix} 0 & −i \\ −i & 0 \end{pmatrix} .    (13.1)
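Example 13.3 and the remark about simultaneously measurable observables are easy to check with numpy; a small sketch (the matrices below are the standard Pauli matrices):

```python
import numpy as np

s2 = np.array([[0, -1j], [1j, 0]])    # sigma_2
s3 = np.array([[1, 0], [0, -1]], dtype=complex)   # sigma_3

print(s2 @ s3)                        # [[0, 1j], [1j, 0]], as in (13.1)
print(s3 @ s2)                        # [[0, -1j], [-1j, 0]]: the products differ

# sigma_2 of particle 1 and sigma_3 of particle 2 act on different tensor factors
# and therefore commute (they can be measured simultaneously):
A = np.kron(s2, np.eye(2))
B = np.kron(np.eye(2), s3)
print(np.allclose(A @ B, B @ A))      # True
```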
Any two multiplication operators commute. In particular, the position operators X_i,
X_j commute with each other. The momentum operators P_j = −iℏ ∂/∂x_j commute with
each other. X_i commutes with P_j for i ≠ j, but

    [X_j, P_j] = iℏ I ,    (13.2)

with I the identity operator. Eq. (13.2) is called Heisenberg’s canonical commutation
relation. To verify it, it suffices to consider a function ψ of a 1-dimensional variable x.
Using the product rule,

    [X, P]ψ(x) = XPψ(x) − PXψ(x)    (13.3)
    = x(−iℏ) ∂ψ/∂x − (−iℏ) ∂/∂x (xψ(x))    (13.4)
    = −iℏ x ∂ψ/∂x + iℏ ψ(x) + iℏ x ∂ψ/∂x    (13.5)
    = iℏ ψ(x) .    (13.6)
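The same computation can be repeated symbolically, e.g. with sympy; this is merely a check of (13.3)–(13.6), with ψ an arbitrary function of x:

```python
import sympy as sp

x, hbar = sp.symbols('x hbar', real=True)
psi = sp.Function('psi')(x)

X_psi = x * psi                               # X acts by multiplication
P_psi = -sp.I * hbar * sp.diff(psi, x)        # P = -i hbar d/dx

# [X, P] psi = X(P psi) - P(X psi)
commutator = x * P_psi - (-sp.I * hbar * sp.diff(X_psi, x))
print(sp.simplify(commutator))                # I*hbar*psi(x), i.e. [X, P] psi = i hbar psi
```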
     So, for two commuting observables, the quantum formalism provides a joint proba-
bility distribution. For non-commuting observables, it does not. That is, it provides two
joint probability distributions, one for each order, but that means it does not provide
an unambiguous joint probability distribution. Moreover,
     two non-commuting observables typically do not both have sharp values
                                                                                      (13.7)
     at the same time.
This fact, too, is often called complementarity. For example, there is no quantum state
that is an eigenvector to both σ2 and σ3 . In CI, this fact is understood as a paradox-
ical trait of the micro-realm that we are forced to accept. That this paradoxical trait
is connected to non-commutativity fits nicely with the analogy between operators in
quantum mechanics and quantities in classical mechanics (as described in Section 13.5):
In classical mechanics, which is free of paradoxes, all physical quantities (e.g., positions,
momenta, spin vectors) are just numbers and therefore commute.
    As a further consequence of (13.7), a measurement of B must disturb the value of
A if AB ≠ BA. (Think of the exercise in which |z-up⟩ underwent a σ_2- and then a σ_3-
measurement: After the σ2 -measurement, the particle was not certain any more to yield
“up” in the σ3 -measurement.) Also the Heisenberg uncertainty relation is connected to
(13.7), as it expresses that position and momentum cannot both have sharp values (i.e.,
σX = 0 and σP = 0) at the same time. In fact, the following generalized version of
Heisenberg’s uncertainty relation applies to observables A and B instead of X and P :
Theorem 13.4. (Robertson–Schrödinger inequality)18 For any bounded self-adjoint op-
erators A, B and any ψ ∈ H with kψk = 1,
    σ_A σ_B ≥ (1/2) |⟨ψ|[A, B]|ψ⟩| .    (13.8)
  18
    H.P. Robertson: The Uncertainty Principle. Physical Review 34: 163–164 (1929)
  E. Schrödinger: Zum Heisenbergschen Unschärfeprinzip. Sitzungsberichte der Preussischen Akademie
der Wissenschaften, physikalisch-mathematische Klasse 14: 296–303 (1930)
   Note that the larger the commutator [A, B] is, the stronger the inequality; it becomes
vacuous when [A, B] = 0.
Proof. Recall that the distribution over the spectrum of A defined by ψ has expectation
value ⟨A⟩ := ⟨ψ|A|ψ⟩ and variance

    σ_A^2 = ⟨ψ|(A − ⟨A⟩)^2|ψ⟩ = ‖φ_A‖^2    (13.9)

with

    φ_A := (A − ⟨A⟩)ψ ,    (13.10)

where we simply wrote ⟨A⟩ for ⟨A⟩I. By the Cauchy-Schwarz inequality,

    σ_A^2 σ_B^2 = ‖φ_A‖^2 ‖φ_B‖^2 ≥ |⟨φ_A|φ_B⟩|^2 .    (13.11)

Since ⟨φ_A|φ_B⟩ = ⟨ψ|(A − ⟨A⟩)(B − ⟨B⟩)|ψ⟩ = ⟨AB⟩ − ⟨A⟩⟨B⟩ and likewise
⟨φ_B|φ_A⟩ = ⟨BA⟩ − ⟨B⟩⟨A⟩, we obtain that

    |⟨φ_A|φ_B⟩|^2 ≥ |Im⟨φ_A|φ_B⟩|^2    (13.15)
    = |(⟨φ_A|φ_B⟩ − ⟨φ_B|φ_A⟩)/2i|^2    (13.16)
    = |(⟨AB⟩ − ⟨A⟩⟨B⟩ − ⟨BA⟩ + ⟨B⟩⟨A⟩)/2|^2    (13.17)
    = (1/4) |⟨ψ|[A, B]|ψ⟩|^2 .    (13.18)

Taking square roots yields (13.8). □
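Here is a quick numerical test of the inequality (13.8) with numpy, for A = σ_2, B = σ_3 and a randomly chosen unit vector ψ (a sketch; the seed and the choice of operators are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

A = np.array([[0, -1j], [1j, 0]])                # sigma_2
B = np.array([[1, 0], [0, -1]], dtype=complex)   # sigma_3

psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = psi / np.linalg.norm(psi)                  # random unit vector

def mean(O):
    return (psi.conj() @ O @ psi).real

def sigma(O):
    return np.sqrt(mean(O @ O) - mean(O)**2)     # standard deviation of O in state psi

lhs = sigma(A) * sigma(B)
rhs = 0.5 * abs(psi.conj() @ (A @ B - B @ A) @ psi)
print(lhs, rhs, lhs >= rhs - 1e-12)              # the inequality (13.8) holds
```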
   • Nobody can actually solve the Schrödinger equation for 10^{23} interacting particles.
     (Sure, and we do not need to. If Ψ_α looks like a state including a pointer pointing
     to α then we know by linearity that Ψ_{t_1} evolves to Ψ_{t_2} = Σ_α c_α Ψ_α, a superposition
     of macroscopically different states.)
• Systems are never isolated. (If we cannot solve the problem for an isolated system,
  what hope can we have to treat a non-isolated one? The way you usually treat a
  non-isolated system is by regarding it as a subsystem of a bigger, isolated system,
  maybe the entire universe.)
• Who knows whether the initial wave function is really a product as in Ψt1 = ψ ⊗ φ.
  (It is not so important that it is precisely a product, but it is important that we
  could perform a quantum measurement on any ψ.)
• The collapse of the wave function is like the collapse of a probability distribution:
  as soon as I have more information, such as X ∈ B, I have to update my probability
  distribution ρ_{t−} for X accordingly, namely to

      ρ_{t+}(x) = 1_{x∈B} ρ_{t−}(x) / ∫_B ρ_{t−}(x′) d x′ .

  (The parallel is indeed striking. However, if we insist that the wave function is
  complete, then there never is any new information, as there is nothing that we are
  ignorant of.)
• Decoherence makes sure that you can replace the superposition Ψ = Σ_α c_α Ψ_α by
  a mixture [i.e., a random one of the Ψ_α]. (A super-observer cannot distinguish
  between the superposition and the mixture, but we are asking whether in reality
  it is a superposition or a mixture.)
14      Many Worlds
Put very briefly, Everett’s many-worlds theory is GRW∅ with λ = 0, and Schrödinger’s
many-worlds theory is GRWm with λ = 0.
   The motivation for the many-worlds view comes from the wave function (11.3) of
object and apparatus together after a quantum measurement. It is a superposition of
macroscopically different terms. If we insist that the Schrödinger equation is correct
(and thus reject non-linear modifications such as GRW), and if we insist that the wave
function is complete, then we must conclude that there are different parts of reality,
each looking like our world but with a different measurement outcome, and without
any interaction between the different parts. They are parallel worlds. This view was
suggested by Hugh Everett III in 1957.19
   Everett’s is not the only many-worlds theory, though. It is less well known that also
Schrödinger had a many-worlds theory in 1926, and it is useful to compare the two.20
Schrödinger, however, did not realize that his proposal was a many-worlds theory. He
thought of it as a single-world theory. He came to the conclusion that it was empirically
inadequate and abandoned it. Let us first try to get a good understanding of this theory.
In this theory, matter is continuously distributed in 3-space with density

    m(x, t) = Σ_{i=1}^N m_i ∫ d^3x_1 · · · d^3x_N  δ^3(x − x_i) |ψ_t(x_1, . . . , x_N)|^2 ,    (14.1)

and ψ_t evolves according to the Schrödinger equation. The equation for m is exactly
the same as in GRWm, except that ψ is not the same wave function. (Actually, Schrö-
dinger replaced the mass factor mi by the electric charge ei , but this difference is not
crucial. It amounts to a different choice of weights in the weighted average over i. In
fact, Schrödinger’s choice has the disadvantage that the different signs of charges will
lead to partial cancellations and thus to an m function that looks less plausible as the
density of matter. Nevertheless, the two choices turn out to be empirically equivalent,
i.e., lead to the same predictions.)
  19
     H. Everett: The Theory of the Universal Wavefunction. Ph. D. thesis, Department of Physics,
Princeton University (1955). Reprinted on page 3–140 in B. DeWitt and R.N. Graham (editors): The
Many-Worlds Interpretation of Quantum Mechanics. Princeton: University Press (1973)
  H. Everett: Relative State Formulation of Quantum Mechanics. Reviews of Modern Physics 29:
454–462 (1957)
  20
     E. Schrödinger: Quantisierung als Eigenwertproblem (Vierte Mitteilung). Annalen der Physik 81:
109–139 (1926). English translation by J.F. Shearer in E. Schrödinger: Collected Papers on Wave
Mechanics. New York: Chelsea (1927).
  See also V. Allori, S. Goldstein, R. Tumulka, and N. Zanghı̀: Many-Worlds and Schrödinger’s First
Quantum Theory. British Journal for the Philosophy of Science 62(1): 1–27 (2011) http://arxiv.
org/abs/0903.2211
    In analogy to GRWm, we may call this theory Sm (where S is for the Schrödin-
ger equation). Consider a double-slit experiment in this theory. Before the arrival at
the detection screen, the contribution to the m function coming from the electron sent
through the double slit (which is the only contribution in the region of space between
the double-slit and the detection screen) is a lump of matter smeared out over rather
large distances (as large as the interference pattern). This lump is not homogeneous, it
has interference fringes. And the overall amount of matter in this lump is tiny: If you
integrate m(x, t) over x in the region between the double-slit and the detection screen,
the result is 10^{−30} kg, the mass of an electron. But focus now on the fact that the
matter is spread out. Schrödinger incorrectly thought that this fact must lead to the
wrong prediction that the entire detection screen should glow faintly instead of yielding
one bright spot, and that was why he thought Sm was empirically inadequate.
    To understand why this reasoning was incorrect, consider a post-measurement situation
(e.g., Schrödinger’s cat). The wave function is a superposition of macroscopically
different terms, Ψ = Σ_α c_α Ψ_α. The Ψ_α do not overlap; i.e., where one Ψ_α is significantly
nonzero, the others are near zero. Thus, when we compute |Ψ|^2 there are no (significant)
cross terms; that is, for each q there is at most one α contributing, so

    m(x, t) ≈ Σ_α |c_α|^2 m_α(x, t) ,

where m_α is the m function computed from Ψ_α alone.
Each mα (t) looks like the reasonable story of just one cat that Ψα (t) corresponds to.
Thus, the two cats do not interact with each other; they are causally disconnected. After
all, the two contributions mα come from Ψα that are normally thought of as alternative
outcomes of the experiment. So the two cats are like ghosts to each other: they can see
and walk through each other.
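The absence of cross terms is easy to see numerically for a single particle in one dimension. The following Python sketch builds a superposition of two well-separated packets (standing in for the "dead" and "alive" contributions) and compares m(x) ∝ |Ψ(x)|^2 with the weighted sum of the individual contributions; the grid, the packet positions, and the coefficients are illustrative.

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 6001)
dx = x[1] - x[0]

def packet(center):
    """Normalized Gaussian packet centered at `center`."""
    p = np.exp(-(x - center)**2 / 2)
    return p / np.sqrt(np.sum(np.abs(p)**2) * dx)

psi1, psi2 = packet(-10.0), packet(10.0)      # two well-separated packets
c1, c2 = np.sqrt(0.7), np.sqrt(0.3)           # illustrative coefficients
psi = c1 * psi1 + c2 * psi2

m = np.abs(psi)**2                             # m(x) up to the mass factor (N = 1)
m_sum = abs(c1)**2 * np.abs(psi1)**2 + abs(c2)**2 * np.abs(psi2)**2

# Because the packets do not overlap, the cross terms are negligible and
# m(x) is, to very high accuracy, the weighted sum of the two "worlds".
print(np.max(np.abs(m - m_sum)))               # ~ 0
```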
     And not just the cat has split in two. If a camera takes a photograph of the cat
then Ψ must be taken to be a wave function of the cat and the camera together (among
other things). Ψ1 may then correspond to a dead cat and a photo of a dead cat, Ψ2 to
a live cat and a photo of a live cat. If a human being interacts with the cat (say, looks
at it), then Ψ1 will correspond to a brain state of seeing a dead cat and Ψ2 to one of
seeing a live cat. That is, there are two copies of the cat, two copies of the photo, two
copies of the human being, two copies of the entire world. That is why I said that Sm
has a many-worlds character. In each world, though, things seem rather ordinary: Like
a single cat in an ordinary (though possibly pitiful) state, and all records and memories
are consistent with each other and in agreement with the state of the cat.
The vectors |±⟩ = (|dead⟩ ± |alive⟩)/√2 form another ONB of the subspace spanned by
|dead⟩ and |alive⟩. So how do we know that the two worlds correspond to |dead⟩ and
|alive⟩ rather than to |+⟩ and |−⟩? Obviously, in Sm there is no such problem because a preferred basis (the position basis) is
built into the law (14.1) for m.
14.3      Bell’s First Many-Worlds Theory
Bell also made a proposal (first formulated in 1971, published21 in 1981) adding a prim-
itive ontology to Everett’s S∅; Bell did not seriously propose or defend the resulting
theory, he just regarded it as an ontological clarification of Everett’s theory. According
to this theory, at every time t there exists an uncountably infinite collection of universes,
each of which consists of N material points in Euclidean 3-space. Thus, each world has
its own configuration Q, but some configurations are more frequent in the ensemble of
worlds than others, with |Ψt |2 distribution across the ensemble. At every other time t0 ,
there is again an infinite collection of worlds, but there is no fact about which world at
t0 is the same as which world at t.
In a many-worlds theory the evolution of Ψ (and of m) is deterministic, so there is nothing random; and in the situation of the measurement problem, there
is nothing that we are ignorant of. So what could talk of probability mean?
    Here is what it could mean in Sm: Suppose we have a way of counting worlds.
And suppose we repeat a quantum experiment (say, a Stern–Gerlach experiment with
|c_up|^2 = |c_down|^2 = 1/2) many times (say, a thousand times). Then we obtain in each
world a sequence of 1000 ups and downs such as
↑↓↑↑↓↑↓↓↓ . . . . (14.8)
Note that there are 2^{1000} ≈ 10^{300} such sequences. The statement that the fraction of
ups lies between 47% and 53% is true in some worlds and false in others. Now count
the worlds in which the statement is true. Suppose that the statement is true in the
overwhelming majority of worlds. Then that would explain why we find ourselves in
such a world. And that, in turn, would explain why we observe a relative frequency
of ups of about 50%. And that is what we needed to explain for justifying the use of
probabilities.
    Now consider |c_up|^2 = 1/3, |c_down|^2 = 2/3. Then the argument might seem to break
down, because it is then still true that in the overwhelming majority of sequences such
as (14.8) the frequency of ups is about 50%. But consider the following

Rule for counting worlds. The “fraction of worlds” f(P) with property P in the
splitting given by Ψ = Σ_α c_α Ψ_α and m(x) = Σ_α |c_α|^2 m_α(x) is

    f(P) = Σ_{α∈M} |c_α|^2 ,    (14.9)

where M is the set of those α whose world (i.e., m_α) has property P.
    Note that f(P) lies between 0 and 1 because Σ_α |c_α|^2 = 1. It is not so clear whether
this rule makes sense—whether there is room in physics for such a law. But let us
accept it for the moment and see what follows. Consider the property P that the
relative frequency of ups lies between 30% and 36%. Then f (P ) is actually the same
value as the probability of obtaining a frequency of ups between 30% and 36% in 1000
consecutive independent random tossings of a biased coin with P(up) = 1/3 and P(down)
= 2/3. And in fact, this value is very close to 1. Thus, the above rule for counting worlds
implies that the frequency of ups lies between 30% and 36% in the overwhelming majority
of worlds. This reasoning was essentially developed by Everett.
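The value of f(P) for this P can be computed directly; here is a short Python check using scipy (n = 1000 runs and the cutoffs 30% and 36% are as in the text):

```python
from scipy.stats import binom

n, p = 1000, 1/3                       # 1000 runs, |c_up|^2 = 1/3
# Fraction of worlds (weighted by |c_alpha|^2, i.e. binomial weights) in which the
# number of ups lies between 300 and 360, i.e. the frequency between 30% and 36%.
f = binom.cdf(360, n, p) - binom.cdf(299, n, p)
print(f)                               # about 0.95, i.e. close to 1
```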
    A comparison with Bohmian mechanics is useful. The initial configuration of the
lab determines the precise sequence such as (14.8). If the initial configuration is chosen
with |Ψ0 |2 distribution, then with overwhelming probability the sequence will have a
fraction of ups between 30% and 36%. That is, if we count initial conditions with
the |Ψ_0|^2 distribution, that is, if we say that the fraction of initial conditions lying
in a set B ⊆ R^{3N} is ∫_B |Ψ_0|^2, then we can say that for the overwhelming majority of
Bohmian worlds, the observed frequency is about 33%. Now to make the connection
with many-worlds, note that the reasoning does not depend, in fact, on whether all of
the worlds are realized or just one. That is, imagine many Bohmian worlds with the
same initial wave function Ψ0 but different initial configurations, distributed across the
ensemble according to |Ψ0 |2 . Then there is an explanation for why inhabitants should
see a frequency of about 33%.
    The problem that remains is whether there is room for a rule for counting worlds.
In terms of a creation myth, suppose God created the wave function Ψ and made it a
law that Ψ evolves according to the Schrödinger equation; then he created matter in
3-space distributed with density m(x, t) and made it a law that m is given by (14.1).
Now what would God need to do in order to make the rule for counting worlds a law?
He does not create anything further, so in which way would two universes with equal Ψ
and m but different rules for counting worlds differ? That is a reason for thinking that
ultimately, Sm fails to work (though in quite a subtle way).
    Various authors have proposed other reasonings for justifying probabilities in many-
worlds theories; they seem less relevant to me, but let me mention a few. David
Deutsch23 proposed that it is rational for inhabitants of a universe governed by a many-
worlds theory (a “multiverse,” as it is often called) to behave as if the events they
perceive were random with probabilities given by the Born rule; he proposed certain
principles of rational behavior from which he derived this. (Of course, this reasoning
does not provide an explanation of why we observe frequencies in agreement with Born’s
rule.) Lev Vaidman24 proposed that in a many-worlds scenario, I can be ignorant of
which world I am in: before the measurement, I know that there will be a copy of me
in each post-measurement world, and afterwards, I do not know which world I am in
until I look at the pointer position. And I could try to express my ignorance through
a probability distribution, although it is not clear why the Born distribution would be
correct and other distributions would not.
    For comparison, in Bell’s many-worlds theories it is not hard to make sense of prob-
abilities. In Bell’s first theory, there is an ensemble of worlds at every time t, and clearly
most of the worlds have configurations that look as if randomly chosen with |Ψ|2 distri-
bution, in particular with a frequency of ups near 33% in the example described earlier.
In Bell’s second theory, Qt is actually random with |Ψt |2 distribution, and although the
recorded sequence of outcomes fluctuates within every fraction of a second, the sequence
in our memories and records at time t has, with probability near 1, a frequency of ups
near 33%.
  23
     D. Deutsch: Quantum theory of probability and decisions. Proceedings of the Royal Society of
London A 455: 3129–3137 (1999) http://arxiv.org/abs/quant-ph/9906015
  24
     L. Vaidman: On Schizophrenic Experiences of the Neutron or Why We should Believe in the
Many-Worlds Interpretation of Quantum Theory. International Studies in the Philosophy of Science
12: 245–261 (1998) http://arxiv.org/abs/quant-ph/9609006
15     The Einstein–Podolsky–Rosen Argument
In the literature, the “EPR paradox” is often mentioned. It is clear from EPR’s article
that they did not intend to describe a paradox (as did, e.g., Wheeler when describing
the delayed-choice experiment), but rather to describe an argument. The argument
supports the conclusion that there are additional variables beyond the wave function.
I now explain their reasoning in my own words, partly in preparation for Bell’s 1964
argument, which builds on EPR’s argument.
   EPR draw further conclusions from their example by considering also momentum.
Note that the Fourier transform of Ψ is

    \hat{Ψ}(k_1, k_2) = e^{−i k_1 x_0} δ(k_1 + k_2) .    (15.4)
Alice could measure either the position or the momentum of particle 1, and Bob either
the position or the momentum of particle 2. If Alice measures position then, as seen
above, the outcome X1 is uniformly distributed and Bob, if he chooses to measure
position, finds X2 = X1 + x0 with certainty. If, alternatively, Alice measures momentum
then the outcome K1 will be uniformly distributed and the wave function in momentum
representation collapses from \hat{Ψ} to

    \hat{Ψ}″(k_1, k_2) = e^{−i K_1 x_0} δ(k_1 − K_1) δ(k_2 + K_1)    (15.5)
so that Bob, if he chooses to measure momentum, is certain to find K2 = −K1 . In
the same way as above, it follows that Bob’s particle had a position before any of the
experiments, and that it had a momentum!
    There even arises a way of simultaneously measuring the position and momentum of
particle 2: Alice measures position X1 and Bob momentum K2 . Since particle 2 has, as
just proved, a well-defined position and a well-defined momentum, and since, by (15.3),
Alice’s measurement did not influence particle 2, K2 must be the original momentum
of particle 2. Likewise, if Bob had chosen to measure position, his result would have
agreed with the original position, and since it would have obeyed X2 = X1 + x0 , we can
infer from Alice’s result what the original position must have been.
If Alice obtains Z_1 = −1, then Bob is certain to obtain Z_2 = +1. Thus, always Z_2 = −Z_1; one speaks of perfect
anti-correlation. As a consequence, particle 2 had a definite value of z-spin even before
Bob’s experiment. Now, from the assumption (15.3) it follows that it had that value
even before Alice’s experiment. Likewise, particle 1 had a definite value of z-spin before
any attempt to measure it.
   Again as in EPR’s reasoning, we can consider other observables, say σ1 and σ2 . In
homework Exercise 30 of Assignment 7, we checked that the singlet state has the same
form relative to the x-spin basis or the y-spin basis. It follows that if Alice and Bob both
measure x-spin then their outcomes are also perfectly anti-correlated, and likewise for
y-spin. It can be inferred that each spin component, for each particle, has a well-defined
value before any experiment.
   Moreover, Alice and Bob together can measure σ1 and σ3 of particle 2: Alice measures
σ1 of particle 1 and Bob σ3 of particle 2. By (15.3) and the perfect anti-correlation, the
negative of Alice’s outcome is what Bob would have obtained had he measured σ1 ; and
by (15.3), Bob’s outcome is not affected by Alice’s experiment.
15.3      Einstein’s Boxes Argument
We have seen that EPR’s argument yields more than just the incompleteness of the
wave function. It also yields that particles have well-defined positions and momenta.
If we only want to establish the incompleteness of the wave function, which seems like
a worthwhile goal for a proof, a simpler argument will do. Einstein developed such an
argument already in 1927 (before the EPR paper), presented it at a conference but never
published it.25
    Consider a single particle whose wave function ψ(x) is confined to a box B with
impermeable walls and (more or less) uniform in B. Now split B (e.g., by inserting a
partition) into two boxes B1 and B2 , move one box to Tokyo and the other to Paris.
There is some nonzero amount of the particle’s wave function in Paris and some in
Tokyo. Carry out a detection in Paris. Let us assume that
                  no real change can take place in Tokyo in consequence
                                                                                             (15.9)
                  of a measurement in Paris.
If we believed that the wave function was a complete description of reality, then there
would be no fact of the matter, before the detection experiment, about whether the
particle is in Paris or Tokyo, but afterwards there would be. This contradicts (15.9), so
the wave function cannot be complete.
    The assumption (15.9) is intended as allowing changes in Tokyo after a while, such
as the time it would take a signal to travel from Paris to Tokyo at the speed of light.
That is, (15.9) (and similarly (15.3)) is particularly motivated by the theory of relativity,
which strongly suggests that signals cannot propagate faster than at the speed of light.
On one occasion, Einstein wrote that the faster-than-light effect entailed by insisting
on completeness of the wave function was “spukhafte Fernwirkung” (spooky action-at-
a-distance).
  25
    It has been reported by, e.g., L. de Broglie: The Current Interpretation of Wave Mechanics: A
Critical Study. Elsevier (1964). A more detailed discussion is given by T. Norsen: Einstein’s Boxes,
American Journal of Physics 73(2): 164–176 (2005) http://arxiv.org/abs/quant-ph/0404016
16      Nonlocality
Two space-time points x = (s, x) and y = (t, y) are called spacelike separated iff no
signal propagating at the speed of light can reach x from y or y from x. This occurs iff

    |x − y| > c |s − t| ,    (16.1)

with c = 3 × 10^8 m/s the speed of light. Einstein’s theory of relativity strongly suggests
that signals cannot propagate faster than at the speed of light (superluminally). That
is, if x and y are spacelike separated then no signal can be sent from x to y or from y
to x. This in turn suggests that
                 If x and y are spacelike separated then events at x cannot
                                                                                            (16.2)
                 influence events at y.
This statement is called locality. It is true in relativistic versions of classical physics
(mechanics, electrodynamics, and also in Einstein’s relativistic theory of gravity, which he
called the general theory of relativity). Bell proved in 1964 that locality is false if certain
empirical predictions of the quantum formalism are correct; this analysis is often called
Bell’s theorem.26 The relevant predictions have since been experimentally confirmed;
the first convincing tests were carried out by Alain Aspect in 1982.27 Thus, locality is
false in our world; this fact is often called quantum nonlocality. Our main goal in this
chapter is to understand Bell’s proof.
    Some remarks.
   • Einstein believed in locality until his death in 1955. Locality is very closely related
     to (almost the same as) the EPR assumption (15.3): If Alice’s measurement takes
     place at x and Bob’s at y, and if x and y are spacelike separated, then locality
     implies that Alice’s measurement on particle 1 at x cannot affect particle 2 at y.
     Conversely, the only situation in which we can be certain that the two particles
     cannot interact occurs if Alice’s and Bob’s experiments are spacelike separated
     and locality holds true. Ironically, EPR were wrong even though their argument
     was correct: The premise (15.3) is false. They took locality for granted. Likewise
     in Einstein’s boxes argument, the assumption (15.9) is equivalent to locality: The
     point of talking about Tokyo and Paris is that these two places are distant, and
     since there clearly can be influences if we allow more time than distance/c, the
     assumption is that there cannot be an influence between spacelike separated events.
   • Despite nonlocality, it is not possible to send messages faster than light, according
     to the appropriate relativistic version of the quantum formalism; this fact is often
     called the no-signalling theorem. We will prove it in great generality in a later
  26
     J. S. Bell: On the Einstein-Podolsky-Rosen Paradox. Physics 1: 195–200 (1964) Reprinted as
chapter 2 of J. S. Bell: Speakable and unspeakable in quantum mechanics. Cambridge University Press
(1987)
  27
     A. Aspect, J. Dalibard, G. Roger: Experimental Test of Bell’s Inequalities using Time-Varying
Analyzers. Physical Review Letters 49: 1804–1807 (1982)
       chapter. Put differently, the superluminal influences cannot be used by agents for
       sending messages.
   • Does nonlocality prove relativity wrong? That statement would be too strong.
     Nonlocality proves a certain understanding of relativity wrong. Much of relativity
     theory, however, remains untouched by nonlocality.
   • Bell’s proof shows for a certain experiment that either events at x must have
     influenced events at y or vice versa, but does not tell us who influenced whom.
Fact 1. For any unit vector n ∈ R^3,

    φ ∝ |n-up⟩|n-down⟩ − |n-down⟩|n-up⟩ .    (16.4)
Sketch of proof: Consider first the case that n is infinitesimally close to the z-direction,
arising from (0, 0, 1) by a rotation around the axis along the unit vector m = (cos γ, sin γ, 0)
through an infinitesimal angle δϕ. Then
                                                                             
                           σm++ σm+−                  0         cos γ − i sin γ
        σm = m · σ =                       =                                          (16.5)
                           σm−+ σm−−           cos γ + i sin γ        0
and
                                                                 
                             1       δϕ      1   1              δϕ   0
                   |n-upi =     +       m·σ    =    +                                  (16.6)
                             0        2      0   0               2 σm−+
                                                                 
                             0       δϕ      0   0              δϕ σm+−
                |n-downi =      +       m·σ    =    +                                  (16.7)
                             1        2      1   1               2   0
because spinors rotate through half the angle δϕ. As a consequence, to first order in δϕ,
                                                               
                                        0 1      δϕ σm+−       0
                   |n-upi|n-downi =           +                                    (16.8)
                                        0 0       2     0    σm−+
and
                       |n-upi|n-downi − |n-downi|n-upi =
                                                         
                  0 1    δϕ σm+−       0      δϕ σm+−      0
                       +                    −                   .                      (16.9)
                  −1 0    2     0   σm−+       2     0   σm−+
This proves (16.4) for an infinitesimal change in n. Now think of a finite change in n as
partitioned into infinitely many infinitesimal changes. This proves (16.4) for arbitrary
n.                                                                                     
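    As a quick sanity check of Fact 1, the rotational symmetry of the singlet can also be
verified numerically. Here is a minimal sketch in Python with numpy (the random directions
and the seed are illustrative choices, not part of the text): it builds |n-up⟩ and |n-down⟩ as
eigenvectors of n·σ and confirms that the anti-symmetric combination agrees with the singlet
up to a global phase.

import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# singlet state in the z-basis, phi = (|up,down> - |down,up>)/sqrt(2)
up, down = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
phi = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

rng = np.random.default_rng(0)
for _ in range(5):
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)
    sigma_n = n[0]*sx + n[1]*sy + n[2]*sz
    vals, vecs = np.linalg.eigh(sigma_n)      # eigenvalues sorted ascending: -1, +1
    n_down, n_up = vecs[:, 0], vecs[:, 1]
    chi = np.kron(n_up, n_down) - np.kron(n_down, n_up)
    chi /= np.linalg.norm(chi)
    # |<phi|chi>| = 1 means chi equals phi up to a global phase, as in (16.4)
    print(abs(np.vdot(phi, chi)))             # prints 1.0 each time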
Fact 2. Independently of whether Alice’s or Bob’s experiment occurs first, the joint
distribution of Z 1 , Z 2 is

   µα,β := \begin{pmatrix} P(up,up) & P(up,down) \\ P(down,up) & P(down,down) \end{pmatrix}         (16.10)

         = \begin{pmatrix} 1/4 − (1/4) α·β & 1/4 + (1/4) α·β \\ 1/4 + (1/4) α·β & 1/4 − (1/4) α·β \end{pmatrix}   (16.11)

         = \begin{pmatrix} (1/2) sin²(θ/2) & (1/2) cos²(θ/2) \\ (1/2) cos²(θ/2) & (1/2) sin²(θ/2) \end{pmatrix} ,   (16.12)

where θ is the angle between α and β.
    Proof: Assume that Alice’s experiment occurs first and write the initial spinor as

                  φ = c|α-up⟩|α-down⟩ − c|α-down⟩|α-up⟩                                     (16.13)

with c a complex constant with |c| = 1/√2. According to Born’s rule, Alice obtains +1
or −1, each with probability 1/2. In case Z 1 = +1, φ collapses to

                  φ′_+ = |α-up⟩|α-down⟩ .                                                   (16.14)

According to Born’s rule, the probability that Bob obtains Z 2 = +1 is

      P(Z 2 = +1|Z 1 = +1) = |⟨β-up|α-down⟩|² = 1 − |⟨β-up|α-up⟩|² .                        (16.15)

Since the angle in Hilbert space between |β-up⟩ and |α-up⟩ is half the angle between β
and α, and since they are unit vectors in Hilbert space, we have that

                  |⟨β-up|α-up⟩| = cos(θ/2)                                                  (16.16)

and thus

      P(Z 2 = +1|Z 1 = +1) = 1 − cos²(θ/2) = sin²(θ/2)                                      (16.17)

and

      P(Z 1 = +1, Z 2 = +1) = (1/2) sin²(θ/2) .                                             (16.18)

Since cos² x = 1/2 + (1/2) cos(2x), this value can be rewritten as

      P(Z 1 = +1, Z 2 = +1) = 1/2 − (1/2) cos²(θ/2) = 1/2 − 1/4 − (1/4) cos θ = 1/4 − (1/4) α·β .   (16.19)

The other three matrix elements can be computed in the same way. Assuming that
Bob’s experiment occurs first leads to the same matrix.                                     □
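    Fact 2 can likewise be checked numerically. Since σα ⊗ I and I ⊗ σβ commute, the joint
distribution can be computed directly as ⟨φ|Pz1(α) ⊗ Pz2(β)|φ⟩, which is equivalent to the
sequential collapse calculation above. The following sketch (illustrative directions and seed
assumed) compares this with (16.11).

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

up, down = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
phi = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)     # singlet state

def proj(direction, sign):
    """Projection onto the eigenspace of direction.sigma with eigenvalue sign = +1 or -1."""
    sigma = direction[0]*sx + direction[1]*sy + direction[2]*sz
    return (I2 + sign * sigma) / 2

rng = np.random.default_rng(1)
alpha, beta = rng.normal(size=3), rng.normal(size=3)
alpha /= np.linalg.norm(alpha); beta /= np.linalg.norm(beta)

for s1 in (+1, -1):
    for s2 in (+1, -1):
        P = np.kron(proj(alpha, s1), proj(beta, s2))
        p = np.vdot(phi, P @ phi).real                          # quantum joint probability
        predicted = 0.25 - 0.25 * s1 * s2 * np.dot(alpha, beta) # Eq. (16.11)
        print(s1, s2, round(p, 6), round(predicted, 6))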
Remarks.
   • Note that the four entries in µα,β are nonnegative and add up to 1, as they should.
   • In the case α = β corresponding to Bohm’s version of the EPR example,

            µα,α = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix} ,                        (16.20)

     implying the perfect anti-correlation Z 2 = −Z 1 .
   • The marginal distribution is the distribution of Z 1 alone, irrespective of Z 2 . It is
     1/2, 1/2. Likewise for Z 2 . Let us assume that Alice’s experiment occurs first. Then
     the fact that the marginal distribution for Z 2 is 1/2, 1/2 amounts to a no-signalling
     theorem for Bell’s experiment: Bob cannot infer from Z 2 any information about
     Alice’s choice α because the distribution of Z 2 does not depend on α. (The general
     no-signalling theorem that we will prove later covers all possible experiments.)
   • The fact that the joint distribution of the outcomes does not depend on the order
     of experiments means that the observables measured by Alice and Bob can be
     simultaneously measured. What are these observables, actually? Alice’s is the
      matrix σα ⊗ I with components (σα)_{s1 s1′} δ_{s2 s2′} , and Bob’s is I ⊗ σβ with components
      δ_{s1 s1′} (σβ)_{s2 s2′} .
16.2      Bell’s 1964 Proof of Nonlocality
Let us recapitulate what needs to be shown in Bell’s theorem. The claim is that the
joint distribution µα,β of Z 1 and Z 2 , as a function of α and β, is such that it cannot
be created in a local way (i.e., in the absence of influences) if no information about α
and β is available beforehand. We can also put it this way: it is impossible for two
computers A and B to be set up in such a way that, upon input of α into A and β into
B, A produces a random number Z 1 and B Z 2 with joint distribution µα,β if A and B
cannot communicate (while they can use prepared random bits that both have copies
of).28 To put this yet differently, two suspects interrogated separately by police cannot
provide answers Z 1 and Z 2 with distribution µα,β when asked the questions α and β,
no matter which prior agreement they made beforehand.
    Bell’s proof involves two parts. The first part is the EPR argument (in Bohm’s ver-
sion), applied to all directions α; it shows that if locality is true then the values of Z 1
and Z 2 must have been determined in advance. Thus, in every run of the experiment,
there exist well-defined values Zα1 for every α and Zα2 = −Zα1 even before any measure-
ment. Moreover, Alice’s outcome will be Zα1 for the α she chooses; also Bob’s outcome
will be Zβ2 = −Zβ1 for the β he chooses, even if β ≠ α and independently of whether
Alice’s or Bob’s experiment occurs first. (Put differently, the two suspects must have
agreed in advance on the answer to every possible question.)
    In other words, locality implies the existence of random variables Zαi , i = 1, 2 and
|α| = 1, such that Alice’s outcome is Zα1 and Bob’s is Zβ2 . In particular, focusing on
components in only 3 directions a, b and c, locality implies the existence of 6 random
variables Zαi , i = 1, 2, α = a, b, c such that
                                           Zαi = ±1                                           (16.21)
                                           Zα1 = −Zα2                                         (16.22)
and, more generally,
                              P(Zα1 ≠ Zβ2 ) = qαβ ,                                          (16.23)
where the qαβ = µα,β (+−) + µα,β (−+) = (1 + α·β)/2 = cos²(θ/2) are the corresponding
quantum mechanical probabilities.
   The second part of the proof involves only very elementary mathematics. Clearly,

            P({Za1 = Zb1 } ∪ {Zb1 = Zc1 } ∪ {Zc1 = Za1 }) = 1 ,                               (16.24)

since at least two of the three (2-valued) variables Zα1 must have the same value. Hence,
by elementary probability theory,

            P(Za1 = Zb1 ) + P(Zb1 = Zc1 ) + P(Zc1 = Za1 ) ≥ 1 ,                               (16.25)

and using the perfect anti-correlations (16.22) we have that

            P(Za1 = −Zb2 ) + P(Zb1 = −Zc2 ) + P(Zc1 = −Za2 ) ≥ 1 .                            (16.26)
  28
     This statement is perhaps a bit less general than Bell’s theorem because computers always work
in either a deterministic or a stochastic way, while Bell’s theorem would apply even to a theory, if it
exists, that is neither deterministic nor stochastic.
(16.26) is equivalent to the celebrated Bell inequality. It is incompatible with (16.23):
since all variables take values ±1, P(Za1 = −Zb2 ) = P(Za1 ≠ Zb2 ) = qab , so the left-hand
side of (16.26) equals qab + qbc + qca . For example, when the angles between a, b and c
are 120°, the 3 relevant qαβ are all equal to cos²(60°) = 1/4, implying a value of 3/4 < 1
for the left-hand side of (16.26).
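    The numbers quoted in the last paragraph are easy to reproduce; the following minimal
Python sketch evaluates qαβ = cos²(θ/2) for three coplanar unit vectors at 120° from each
other and sums them (the particular vectors are an illustrative choice).

import numpy as np

# three coplanar unit vectors a, b, c at 120 degrees to one another
angles = np.deg2rad([0.0, 120.0, 240.0])
dirs = [np.array([np.sin(t), 0.0, np.cos(t)]) for t in angles]

def q(u, v):
    """Quantum prediction P(Z^1_u != Z^2_v) = cos^2(theta/2), Eq. (16.23)."""
    theta = np.arccos(np.clip(np.dot(u, v), -1, 1))
    return np.cos(theta / 2) ** 2

pairs = [(0, 1), (1, 2), (2, 0)]
print([round(q(dirs[i], dirs[j]), 4) for i, j in pairs])  # [0.25, 0.25, 0.25]
print(sum(q(dirs[i], dirs[j]) for i, j in pairs))         # 0.75 < 1, contradicting (16.26)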
16.3      Bell’s 1976 Proof of Nonlocality
In this version of the argument, the outcomes are allowed to depend on shared information
λ (distributed with density ρ(λ)) that was available to both sides in advance, so that

      P(Z 1 = z1 , Z 2 = z2 |α, β) = ∫ dλ ρ(λ) P(Z 1 = z1 , Z 2 = z2 |α, β, λ) ,              (16.27)

where the last factor is the conditional distribution of the outcomes, given λ.
    What is the condition on P that characterizes the absence of communication? Sup-
pose computer 1 makes its decision about Z 1 first. In the absence of communication,
it has only λ and α as the basis of its decision (which may still be random); thus, the
(marginal) distribution of Z 1 does not depend on β:

      P(Z 1 = z1 |α, β, λ) = P(Z 1 = z1 |α, λ) .                                               (16.28)

Computer 2 has only λ and β as the basis of its decision; thus, the (conditional) distri-
bution of Z 2 does not depend on α or Z 1 :

      P(Z 2 = z2 |Z 1 = z1 , α, β, λ) = P(Z 2 = z2 |β, λ) .                                    (16.29)

Together, (16.28) and (16.29) amount to the factorization

      P(Z 1 = z1 , Z 2 = z2 |α, β, λ) = P(Z 1 = z1 |α, λ) P(Z 2 = z2 |β, λ) .                  (16.30)
    Now we want to know how the locality condition (16.30) restricts the possibility of
functions to occur as P(Z 1 , Z 2 |α, β). To this end, we introduce the correlation coefficient
defined by

      κ(α, β) = Σ_{z1=±1} Σ_{z2=±1} z1 z2 P(Z 1 = z1 , Z 2 = z2 |α, β) .                       (16.31)

Proposition 16.1. Locality implies the following version of Bell’s inequality known as
the CHSH inequality31 :

      |κ(α, β) + κ(α, β′) + κ(α′, β) − κ(α′, β′)| ≤ 2 .                                        (16.32)

Proof: By (16.30), κ(α, β) = ∫ dλ ρ(λ) E(Z 1 |α, λ) E(Z 2 |β, λ), where E(Z i | ·, λ) denotes
the conditional expectation of Z i given λ and the respective parameter choice; it always
lies in [−1, 1]. So,

      κ(α, β) ± κ(α, β′) = ∫ dλ ρ(λ) E(Z 1 |α, λ) [E(Z 2 |β, λ) ± E(Z 2 |β′, λ)]               (16.37)

and hence

      |κ(α, β) ± κ(α, β′)| ≤ ∫ dλ ρ(λ) |E(Z 2 |β, λ) ± E(Z 2 |β′, λ)| .                        (16.38)

  31
     J. F. Clauser, M. A. Horne, A. Shimony, R. A. Holt: Proposed Experiment to Test Local Hidden-
Variable Theories. Physical Review Letters 23: 880–884 (1969)
Hence, setting u = E(Z 2 |β, λ) and v = E(Z 2 |β′, λ) and using that |u + v| + |u − v| =
2 max(|u|, |v|) ≤ 2 for u, v ∈ [−1, 1], we obtain

      |κ(α, β) + κ(α, β′)| + |κ(α′, β) − κ(α′, β′)| ≤ ∫ dλ ρ(λ) (|u + v| + |u − v|) ≤ 2 ,

which yields (16.32).                                                                         □
    Since the quantum mechanical prediction µα,β for the Bell experiment has

      κ(α, β) = µα,β (++) − µα,β (+−) − µα,β (−+) + µα,β (−−) = −α · β = − cos θ ,             (16.45)

choosing coplanar directions with α′ ⊥ α and with β, β′ at angles ±45° from α leads to

                     κ(α, β) + κ(α, β′) + κ(α′, β) − κ(α′, β′) = −2√2 ,                        (16.47)

violating (16.32).
    Now if the values of P(Z 1 = z1 , Z 2 = z2 |α, β) are known only with some inaccuracy
(because they were obtained experimentally, not from the quantum formalism) then also
the κ(α, β) are subject to some inaccuracy. But if (16.32) is violated by more than the
inaccuracy, then locality is refuted.
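    For a concrete check of the violation, the following sketch evaluates the CHSH combination
for the quantum correlation (16.45), using the conventional choice of coplanar directions with
α′ ⊥ α and β, β′ at ±45° from α (this specific choice is assumed here only for illustration).

import numpy as np

def kappa(a, b):
    """Quantum correlation coefficient for the singlet, Eq. (16.45): kappa = -a.b"""
    return -np.dot(a, b)

def unit(angle_deg):
    t = np.deg2rad(angle_deg)
    return np.array([np.sin(t), 0.0, np.cos(t)])

a, a2, b, b2 = unit(0), unit(90), unit(45), unit(-45)
chsh = kappa(a, b) + kappa(a, b2) + kappa(a2, b) - kappa(a2, b2)
print(chsh, -2*np.sqrt(2))   # both approx -2.828, outside the interval [-2, 2] allowed by (16.32)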
16.4       Photons
Experimental tests of Bell’s inequality are usually done with photons instead of electrons.
For photons, spin is usually called polarization, and the Stern–Gerlach magnets are
replaced with polarization analyzers (also known as polarizers), i.e., crystals that are
transparent to the |z-upi part of the wave but reflect (or absorb) the |z-downi part.
Like the Stern–Gerlach magnets, the analyzers can be rotated into any direction. Since
photons have spin 1, θ/2 needs to be replaced by θ.
17       Further Discussion of Nonlocality
17.1      Nonlocality in Bohmian Mechanics, GRW, Copenhagen,
          Many-Worlds
Since we have considered only non-relativistic formulations of these theories, we cannot
directly analyze spacelike separated events, but instead we can analyze the case of two
systems (e.g., Alice’s lab and Bob’s lab) without interaction (i.e., without an interaction
term between them in the Hamiltonian).
   • In GRW theory, nonlocality comes in at the point when the wave function
     collapses, as then it does so instantaneously over arbitrary distances.
       At least, this trait of the theory suggests that GRW is nonlocal, and in fact that is
       the ultimate source of the nonlocality. Strictly speaking, however, the definition of
       nonlocality, i.e., the negation of (16.2), requires that events at x and at y influence
       each other, and the value of the wave function ψt (x1 , x2 ) is linked to several space-
       time points, (t, x1 ) and (t, x2 ), and thus is not an example of an “event at x.”
       So we need to formulate the proof that GRW theory is nonlocal more carefully;
       of course, Bell’s proof achieves this, but we can give a more direct proof. Since
       the “events at x” are not given by the wave function itself but by the primitive
       ontology, we need to consider GRWf and GRWm separately.
       In GRWf, consider Einstein’s boxes example. The wave function of a particle
       is half in a box in Paris and half in a box in Tokyo. Let us apply detectors to
       both boxes at time t, and consider the macroscopic superposition of the detectors
       arising from the Schrödinger equation. It is random whether the first flash (in
       any detector) after t occurs in Paris or in Tokyo. Suppose it occurs in Tokyo, and
       suppose it can occur in one of two places in Tokyo, corresponding to the outcomes
       0 or 1. If it was 1, then after the collapse the wave function of the particle is
       100% in Tokyo, and later flashes in Paris are certain to occur in a place where
       they indicate the outcome 0—that is a nonlocal influence of a flash in Tokyo on
       the flashes in Paris.
       Likewise in GRWm: If, after the first collapse, the pointer of the detector in Tokyo,
       according to the m function, points to 1 then the pointer in Paris immediately
       points to 0. (You might object that the Tokyo pointer position according to
       the m function was not the cause of the Paris pointer position, but rather both
       pointer positions were caused by the collapse of the wave function. However, this
       distinction is not relevant to whether the theory is nonlocal.)
       Note that while Bell’s proof shows that any version of quantum mechanics must
       be nonlocal, for proving that GRWf and GRWm are nonlocal it is sufficient to
       consider a simpler situation, that of Einstein’s boxes.
       Both GRWf and GRWm are already nonlocal when governing a universe containing
       only one particle; thus, their nonlocality does not depend on the existence of a
       macroscopic number of particles, and they are even nonlocal in a case (one particle)
       in which Bohmian mechanics is local. For example, consider a particle with wave
       function
                        ψ = (1/√2)( |here⟩ + |there⟩ )                                        (17.1)
       at time t, as in Einstein’s boxes example. Suppose that |herei and |therei are
       two narrow wave packets separated by a distance of 500 million light years. The
       distance is so large that the first collapse is likely to occur before a light signal can
       travel between the two places. For GRWf, a flash here precludes a flash there—
       that is a nonlocal influence. For GRWm, if the wave function collapses to |herei
       then m(here) doubles and m(there) instantaneously goes to zero—that is a nonlo-
       cal influence. (There is a relativistic version of GRWm32 in which m(there) goes to
       zero only after a delay of distance/c, or when a collapse centered “there” occurs.
       Nevertheless, also this theory is nonlocal even for one particle because when a col-
       lapse centered “there” occurs, which can happen any time, then m(there) cannot
       double (as it could in a local theory) but must go to zero.)
   • That orthodox quantum mechanics (OQM) is nonlocal can also be seen from
     Einstein’s boxes argument: OQM says the outcomes of the detectors are not pre-
     determined. (That is, there is no fact about where the particle really is before
     any detectors are applied.) Thus, the outcome of the Tokyo detector must have
     influenced the Paris detector, or vice versa.
  32
    D. Bedingham, D. Dürr, G.C. Ghirardi, S. Goldstein, R. Tumulka, and N. Zanghı̀: Matter Density
and Relativistic Models of Wave Function Collapse. Journal of Statistical Physics 154: 623–631 (2014)
http://arxiv.org/abs/1111.1425
       This, of course, was the point of Einstein’s boxes argument: He objected to OQM
       because it is nonlocal.
   • Many-worlds is nonlocal, too. This is not obvious from Bell’s argument because
     the latter is formulated in a single-world framework. Here is why Sm is nonlocal.33
     After Alice carries out her Stern–Gerlach experiment, there are two pointers in her
     lab, one pointing to +1 and the other to −1. Then Bob carries out his experiment,
     and there are two pointers in his lab. Suppose Bob chose the same direction as
     Alice. Then the world in which Alice’s pointer points to +1 is the same world as
     the one in which Bob’s pointer points to −1, and this nonlocal fact was created
     in a nonlocal way by Bob’s experiment. The same kind of nonlocality occurs in
     Sm already in Einstein’s boxes experiment: The world in which a particle was
     detected in Paris is the same as the one in which no particle was detected in
     Tokyo—a nonlocal fact that arises as soon as both experiments are completed,
     without the need to wait for the time it takes light to travel from Paris to Tokyo.
       How about Bell’s many-worlds theories? The second theory, involving a random
       configuration selected independently at every time, is very clearly nonlocal, for
       example in Einstein’s boxes experiment: At every time t, nature makes a random
       decision about whether the particle is in Paris, and if it is, nature ensures imme-
       diately that there is no particle in Tokyo. A local theory would require that the
       particle has a continuous history of traveling, at a speed less than that of light,
       to either Paris or Tokyo, and this history is missing in Bell’s second many-worlds
       theory. Bell’s first many-worlds theory is even more radical, in fact in such a way
       that the concept of locality is not even applicable. The concept of locality requires
       that at every point in space, there are local variables whose changes propagate at
       most at the speed of light. Since in Bell’s first many-worlds theory, no association
       is made between worlds at different times, one cannot even ask how any local
       variables would change with time. Thus, this theory is nonlocal as well.
    Another remark concerns the connection between Bell’s 1976 nonlocality proof and
the theories mentioned above. In physical theories, λ represents the information located
at all space-time points from which light signals can reach both x and y. In orthodox
quantum mechanics and GRW theory, λ is the wave function ψ; in Bohmian mechanics,
λ is ψ together with the initial configuration of the two particles.
    Bell’s nonlocality argument, as described in Section 16.2, has the following structure:

      part 1: quantum predictions + locality ⇒ P                                              (17.2)

      part 2: quantum predictions ⇒ non-P                                                     (17.3)

      conclusion: quantum predictions ⇒ nonlocality,                                          (17.4)

where P is the statement that the outcomes are pre-determined (i.e., that for every direction
α there exist values Zα1 = −Zα2 fixed in advance of any measurement).
For this argument what is relevant about “quantum mechanics” is merely the predictions
concerning experimental outcomes corresponding to (16.21)–(16.23) (with part 1 using
in fact only (16.22)).
    Certain popular myths about Bell’s proof arise from missing part 1 and noticing only
part 2 of the argument. (In Bell’s 1964 paper, part 1 is formulated in 3 lines, part 2 in
2.5 pages.) Bell, Speakable and unspeakable, p. 143:
       It is important to note that to the limited degree to which determinism plays
       a role in the EPR argument, it is not assumed but inferred. What is held
       sacred is the principle of ‘local causality’ – or ‘no action at a distance’. [. . . ]
       It is remarkably difficult to get this point across, that determinism is not a
       presupposition of the analysis.
Here, “determinism” means P. What Bell writes about the EPR argument is true in
spades about his own nonlocality argument: P plays a “limited role” because it is only
an auxiliary statement, and non-P is not the upshot of the argument.
   The mistake of missing part 1 leads to the impression that Bell proved that

               deterministic hidden variables are impossible,                                 (17.5)

or that
               hidden variables, while perhaps possible, must be nonlocal.                      (17.6)
These claims are still widespread, and were even more common in the 20th century.34
They are convenient for Copenhagenists, who tend to think that coherent theories of
the microscopic realm are impossible (see Section 13.3). Let me explain what is wrong
about (17.5) and (17.6).
    Statement (17.5) is plainly wrong, since a deterministic hidden-variables theory exists
and works, namely Bohmian mechanics. The hidden variables that Bohmian mechanics
provides35 for the Bell experiment are of the form Zα,β i
                                                           , as the outcome according to
Bohmian mechanics depends on both parameter choices (at least for one i, namely for the
second Stern–Gerlach experiment). Considering the three directions relevant to Bell’s
                  i
inequality, the Zα,β  are 18 random variables instead of 6 Zαi , and the dependence on
both α and β reflects the nonlocality of Bohmian mechanics. Bell did not establish the
impossibility of a deterministic reformulation of quantum theory, nor did he ever claim
to have done so.
  34
     For example, recall the title of Clauser et al.’s paper: Proposed Experiment to Test Local Hidden-
Variable Theories. Other authors claimed that Bell’s argument excludes “local realism.”
  35
     We assume a fixed temporal order of the two spin measurements, and that each is carried out as a
Stern–Gerlach experiment.
    Statement (17.6) is true and non-trivial but nonetheless rather misleading. It follows
from (17.2) and (17.3) that any (single-world) account of quantum phenomena must be
nonlocal, not just any hidden-variables account. Bell’s argument shows that nonlocality
is implied by the predictions of standard quantum theory itself. Thus, if nature is
governed by these predictions (as has been confirmed in experiment), then nature is
nonlocal.
18      POVMs: Generalized Observables
18.1      Definition
An observable is mathematically represented by a self-adjoint operator. A generalized ob-
servable is mathematically represented by a positive-operator-valued measure (POVM).
Definition 18.1. An operator is called positive iff it is self-adjoint and all (generalized)
eigenvalues are greater than or equal to zero. (In linear algebra, a positive operator is
commonly called “positive semi-definite.”) Equivalently, a bounded operator A : H →
H is positive iff
                           ⟨ψ|A|ψ⟩ ≥ 0 for every ψ ∈ H .                                     (18.1)
    The sum of two positive operators is again a positive operator, whereas the product
of two positive operators is in general not even self-adjoint. Note that every projection
is a positive operator.
    As a first, rough definition, we can say the following: A POVM is a family of positive
operators Ez such that

                                  Σ_z Ez = I .                                                (18.2)
     so η ≤ 1.
                                
   2. E1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} , E2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} . In the special case in which all operators Ez are
      projection operators, E is called a projection-valued measure (PVM). In this case,
      the subspaces to which Ez and Ez′ (z ≠ z′) project must be mutually orthogonal
      (homework problem).
   3. Every self-adjoint matrix defines a PVM: Let z = α run through the eigenvalues
      of A and let Eα be the projection to the eigenspace of A with eigenvalue α,

                                  Eα = Σ_λ |φα,λ⟩⟨φα,λ| .                                     (18.4)

      Then their sum is I, as easily seen from the point of view of an orthonormal basis
      of eigenvectors of A. So E is a PVM, the spectral PVM of A. Example 2 above is
      of this form for A = σ3 .
   4. A POVM E and a vector ψ ∈ H with ‖ψ‖ = 1 together define a probability
      distribution over z as follows:

                                  Pψ (z) = ⟨ψ|Ez |ψ⟩ .                                        (18.5)

      To see this, note that ⟨ψ|Ez |ψ⟩ is a nonnegative real number since Ez is a positive
      operator, and

            Σ_z Pψ (z) = Σ_z ⟨ψ|Ez |ψ⟩ = ⟨ψ|I|ψ⟩ = ‖ψ‖² = 1 .                                 (18.6)

   5. (The fuzzy position observable.) Here the value space Z = R is continuous, and Ez
      is the multiplication operator by the Gaussian function (2πσ²)^{−1/2} e^{−(x−z)²/(2σ²)},
      where σ > 0 plays the role of the resolution of the detector; correspondingly, the
      sum over z gets replaced by an integral. Indeed,

            ∫ dz Ez ψ(x) = (1/√(2πσ²)) ψ(x) ∫ dz e^{−(x−z)²/(2σ²)} = ψ(x) .                   (18.10)
    The case of a continuous variable z brings us to the general definition of a POVM,
which I will formulate rigorously although we do not aim at rigor in general. The defini-
tion is, in fact, quite analogous to the rigorous definition of a probability distribution in
measure theory: A measure associates a value (i.e., a number or an operator) not with
a point but with a set: E(B) instead of Ez , where B ⊆ Z and Z is the set of all z’s.
More precisely, let Z be a set and B a σ-algebra of subsets of Z ,36 the family of the
“measurable sets.” A probability measure is a mapping µ : B → [0, 1] such that for any
B1 , B2 , . . . ∈ B with Bi ∩ Bj = ∅ for i ≠ j,

            µ( ∪_{n=1}^∞ Bn ) = Σ_{n=1}^∞ µ(Bn ) .                                            (18.11)

  36
     A σ-algebra is a family B of subsets of Z such that ∅ ∈ B and, for every B1 , B2 , B3 , . . . in B, also
B1^c := Z \ B1 ∈ B and B1 ∪ B2 ∪ . . . ∈ B. It follows that Z ∈ B and B1 ∩ B2 ∩ . . . ∈ B. A set Z
equipped with a σ-algebra is also called a measurable space. The σ-algebra usually considered on Rn
consists of the “Borel sets” and is called the “Borel σ-algebra.”
Definition 18.3. A POVM on the measurable space (Z , B) acting on the Hilbert
space H is a mapping E from B to the set of bounded operators on H such that each
E(B) is positive, E(Z ) = I, and for any B1 , B2 , . . . ∈ B with Bi ∩ Bj = ∅ for i ≠ j,

            E( ∪_{n=1}^∞ Bn ) = Σ_{n=1}^∞ E(Bn ) ,                                            (18.12)

where the series on the right-hand side converges in the operator norm.37
    If Z is a countable set (and B the family of all its subsets), then a POVM is determined
by the operators Ez = E({z}) via E(B) = Σ_{z∈B} Ez , so in that case Definition 18.3 boils
down to the earlier definition around (18.2). The fuzzy position observable of Example 5
corresponds to Z = R, B the Borel sets, and E(B) the multiplication operator

            E(B)ψ(x) = ∫_B dz (1/√(2πσ²)) e^{−(x−z)²/(2σ²)} ψ(x) ,                            (18.15)

which multiplies by the function 1B ∗ g, where 1B is the characteristic function of B, g
is the Gaussian density function, and ∗ means convolution.
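    To make the fuzzy position observable concrete, here is a small numerical sketch: it
discretizes the line, takes an illustrative Gaussian wave packet ψ and resolution σ (both
assumptions, not taken from the text), evaluates ⟨ψ|E(B)|ψ⟩ with E(B) acting as multiplication
by 1B ∗ g, and checks that the probabilities of a partition of the line add up to (approximately) 1.

import numpy as np

x = np.linspace(-15, 15, 1501)
dx = x[1] - x[0]

# a normalized wave packet psi (illustrative choice)
psi = np.exp(-(x - 1.0)**2 / 2) * np.exp(2j * x)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

sigma = 0.7                                     # detector resolution in Eq. (18.15)
g = lambda u: np.exp(-u**2 / (2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)

def prob(B_lo, B_hi):
    """<psi|E(B)|psi> with E(B) = multiplication by (1_B * g), for B = [B_lo, B_hi]."""
    z = np.linspace(B_lo, B_hi, 1001)
    dz = z[1] - z[0]
    conv = np.array([np.sum(g(xi - z)) * dz for xi in x])   # (1_B * g)(x)
    return np.sum(np.abs(psi)**2 * conv) * dx

# probabilities of a partition of the (truncated) line add up to ~1
p1, p2, p3 = prob(-15, 0), prob(0, 2), prob(2, 15)
print(p1, p2, p3, p1 + p2 + p3)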
    It turns out that every observable is a generalized observable; that is, every self-
adjoint operator A defines a PVM E with E(B) the projection to the so-called spectral
subspace of B. If there is an ONB of eigenvectors of A, then the spectral subspace of B
is the closed span of all eigenspaces with eigenvalues in B; that is, in that case E({z}) is
the projection to the eigenspace of eigenvalue z (and 0 if z is not an eigenvalue). In the
case of a general self-adjoint operator A, the following is a reformulation of the spectral
theorem:
Theorem 18.4. For every self-adjoint operator A there is a uniquely defined PVM E
on the real line with the Borel σ-algebra (the “spectral PVM” of A) such that

                                  A = ∫_R α E(dα) .                                           (18.16)

  37
     It is equivalent to merely demand that the series on the right-hand side converges weakly, i.e., that
Σ_n ⟨ψ|E(Bn )|ψ⟩ converges for every ψ ∈ H .
    To explain the last equation: In the same way as one can define the integral ∫_Z f (z) µ(dz)
of a measurable function f : Z → R relative to a measure µ, one can define an operator-
valued integral ∫_Z f (z) E(dz) relative to a POVM E. Eq. (18.16) is a generalization of
the relation

                                  A = Σ_α α Eα                                                (18.17)
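    The relation (18.17) and the defining properties of the spectral PVM are easy to verify
numerically for a matrix; the following sketch does this for a randomly chosen self-adjoint
4 × 4 matrix (the matrix and the rounding tolerance are illustrative choices).

import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2                  # a self-adjoint matrix

vals, vecs = np.linalg.eigh(A)

# group eigenvectors by (numerically distinct) eigenvalue and build E_alpha as in (18.4)
projs = {}
for val, v in zip(vals, vecs.T):
    key = round(val, 10)
    projs[key] = projs.get(key, 0) + np.outer(v, v.conj())

print(np.allclose(sum(projs.values()), np.eye(4)))                 # the E_alpha sum to I
print(np.allclose(sum(a * E for a, E in projs.items()), A))        # A = sum_alpha alpha E_alpha
print(all(np.allclose(E @ E, E) for E in projs.values()))          # each E_alpha is a projection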
Example 18.6. It follows from the quantum formalism that if we make consecutive
ideal quantum measurements of observables A1 , . . . , An (which need not commute with
each other) at times 0 < t1 < . . . < tn respectively on a system with initial wave function
ψ0 ∈ H with kψ0 k = 1, then the joint distribution of the outcomes Z1 , . . . , Zn is of the
form
                  P((Z1 , . . . , Zn ) ∈ B) = ⟨ψ0 |E(B)|ψ0 ⟩                                  (18.20)
for all (Borel) subsets B ⊆ Rn , where E is a POVM on Rn . The precise version of this
statement requires that each Ak has purely discrete spectrum (or, equivalently, an ONB
of eigenvectors in H ).
    Derivation: In that case, the spectrum is at most countable, and the spectral de-
composition can be written in the form

                                  Ak = Σ_{αk} αk Pk,αk .                                      (18.21)
According to the quantum formalism, the joint distribution of the outcomes is

  P(Z1 = α1 , . . . , Zn = αn ) = ‖ Pn,αn e^{−iH(tn−tn−1)} · · · P1,α1 e^{−iH(t1−t0)} ψ0 ‖²    (18.22)

with t0 = 0 (and units of measurement chosen so that ~ = 1), so (18.20) holds with

  E({(α1 , . . . , αn )}) =
     e^{iH(t1−t0)} P1,α1 · · · e^{iH(tn−tn−1)} Pn,αn Pn,αn e^{−iH(tn−tn−1)} · · · P1,α1 e^{−iH(t1−t0)} .   (18.23)

It becomes clear that E(B) is, in general, not a projection but still a positive operator.
One easily verifies that E is a POVM.
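    As a minimal illustration of Example 18.6, the following sketch takes n = 2 consecutive
ideal measurements of A1 = σ3 and A2 = σ1 and, for simplicity, a vanishing Hamiltonian
between the measurements (an assumption made only to keep the example short). It builds
the operators (18.23), checks that they form a POVM but are not projections, and evaluates
the joint distribution for ψ0 = |z-up⟩.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# spectral projections of A1 = sigma_z and A2 = sigma_x
P1 = {+1: (I2 + sz)/2, -1: (I2 - sz)/2}
P2 = {+1: (I2 + sx)/2, -1: (I2 - sx)/2}

# Eq. (18.23) with H = 0 (simplifying assumption): E({(a1,a2)}) = P1 P2 P2 P1 = P1 P2 P1
E = {(a1, a2): P1[a1] @ P2[a2] @ P1[a1] for a1 in (1, -1) for a2 in (1, -1)}

print(np.allclose(sum(E.values()), I2))                                  # operators sum to I
print([bool(np.all(np.linalg.eigvalsh(Ez) >= -1e-12)) for Ez in E.values()])  # each is positive
print([bool(np.allclose(Ez @ Ez, Ez)) for Ez in E.values()])             # none is a projection here

# joint probabilities for psi0 = |z-up>: (+1,+1) and (+1,-1) get 1/2, (-1,.) get 0
psi0 = np.array([1, 0], dtype=complex)
print({z: round(np.vdot(psi0, Ez @ psi0).real, 6) for z, Ez in E.items()})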
Example 18.7. In GRWf, the joint distribution of all flashes is of the form

                       P(F ∈ B) = ⟨Ψ0 |G(B)|Ψ0 ⟩                                              (18.24)

for all sets B ⊆ Z , with Ψ0 the initial wave function and G a POVM on the history
space Z of flashes,

         Z = { ((t1 , x1 ), (t2 , x2 ), . . .) ∈ (R4 )∞ : 0 < t1 < t2 < . . . }^N .            (18.25)
    Derivation: Consider first the joint distribution of the first two flashes for N = 1
particle: The probability of T1 ∈ [t1 , t1 + dt1 ] is 1_{t1>0} e^{−λt1} λ dt1 ; given T1 , the probability
of X 1 ∈ d3 x1 is, according to (12.11), ‖C(x1 )ΨT1− ‖² with ΨT1− = e^{−iHT1} Ψ0 and C(x1 )
the collapse operator defined in (12.9). Given T1 and X 1 , the probability of T2 ∈
[t2 , t2 + dt2 ] is 1_{t2>t1} e^{−λ(t2−t1)} λ dt2 ; given T1 , X 1 , and T2 , the probability of X 2 ∈ d3 x2
is ‖C(x2 )e^{−iH(T2−T1)} ΨT1+ ‖² with ΨT1+ = C(X 1 )ΨT1− . Putting these formulas together,
the joint distribution of T1 , X 1 , T2 , and X 2 is given by

  P(T1 ∈ [t1 , t1 + dt1 ], X 1 ∈ d3 x1 , T2 ∈ [t2 , t2 + dt2 ], X 2 ∈ d3 x2 )
     = 1_{0<t1<t2} e^{−λt2} λ² ‖C(x2 )e^{−iH(t2−t1)} C(x1 )e^{−iHt1} Ψ0 ‖² dt1 d3 x1 dt2 d3 x2          (18.26)
     = ⟨Ψ0 | G(dt1 × d3 x1 × dt2 × d3 x2 ) |Ψ0 ⟩                                                        (18.27)
with
  G(dt1 × d3 x1 × dt2 × d3 x2 )
     = 1_{0<t1<t2} e^{−λt2} λ² e^{iHt1} C(x1 )e^{iH(t2−t1)} C(x2 )² e^{−iH(t2−t1)} C(x1 )e^{−iHt1} dt1 d3 x1 dt2 d3 x2 ,   (18.28)

which is self-adjoint and positive because (18.27) is always real and ≥ 0. It follows
that also G(B), obtained by summing (that is, integrating) over all infinitesimal vol-
ume elements in B, is self-adjoint and positive. Additivity holds by construction, and
G(Z ) = I because (18.27) is a probability distribution (so ⟨Ψ0 |G(Z )|Ψ0 ⟩ = 1 for ev-
ery Ψ0 with ‖Ψ0 ‖ = 1). Thus, G is a POVM. For the joint distribution of more than
two flashes or more than one particle, the reasoning proceeds in a similar way. For the
joint distribution of all (infinitely many) flashes, the rigorous proof requires some more
technical steps38 but bears no surprises.
  38
   carried out in R. Tumulka: A Kolmogorov Extension Theorem for POVMs. Letters in Mathematical
Physics 84: 41–46 (2008) http://arxiv.org/abs/0710.3605
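    To get a feeling for the POVM G of Example 18.7, here is a rough 1D toy simulation of
the first flash of GRWf for a single particle: the flash time is exponential with rate λ, the
wave function is evolved freely to that time, and the flash location is drawn from the density
‖C(x1)Ψ_{T1−}‖². The grid, the rate, the collapse width, and the 1D Gaussian collapse profile
(normalized so that ∫ dx1 C(x1)² = 1, as a stand-in for (12.9)) are all assumptions made for
illustration only.

import numpy as np

x = np.linspace(-15, 15, 400)
dx = x[1] - x[0]
lam, sigma = 1.0, 1.0            # flash rate lambda and collapse width sigma (assumed units)

# free Hamiltonian (hbar = m = 1) as a finite-difference matrix
n = len(x)
H = (np.diag(np.full(n, 1.0)) + np.diag(np.full(n-1, -0.5), 1)
     + np.diag(np.full(n-1, -0.5), -1)) / dx**2

# collapse profile: multiplication by a Gaussian centered at x1, with sum_x1 profile^2 dx1 = 1
def C_diag(x1):
    return (2*np.pi*sigma**2)**(-0.25) * np.exp(-(x - x1)**2 / (4*sigma**2))

psi0 = np.exp(-x**2)
psi0 = psi0 / np.sqrt(np.sum(np.abs(psi0)**2) * dx)

rng = np.random.default_rng(4)
T1 = rng.exponential(1.0 / lam)                  # time of the first flash
vals, vecs = np.linalg.eigh(H)
U = vecs @ np.diag(np.exp(-1j * vals * T1)) @ vecs.conj().T
psi = U @ psi0                                   # Psi_{T1-}

p = np.array([np.sum(np.abs(C_diag(x1) * psi)**2) * dx for x1 in x]) * dx
print(T1, p.sum())                               # p sums to ~1: a probability distribution for X1
X1 = rng.choice(x, p=p / p.sum())                # location of the first flash
print(X1)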
18.2     The Main Theorem about POVMs
It says: For every quantum physical experiment E on a quantum system S whose possible
outcomes lie in a space Z , there exists a POVM E on Z such that, whenever S has
wave function ψ at the beginning of E , the random outcome Z has probability distribution
given by
                               P(Z ∈ B) = ⟨ψ|E(B)|ψ⟩ .                                        (18.29)
   We will prove this statement in Bohmian mechanics and GRWf. It plays the role of
Born’s rule for POVMs. The experiment E consists of coupling S to an apparatus A at
some initial time ti , letting S ∪ A evolve up to some final time tf , and then reading off
the result Z from A. It is assumed that S and A are not entangled at the beginning of
E:
                                 ΨS∪A (ti ) = ψS (ti ) ⊗ φA (ti )                  (18.30)
with φA the ready state of A. (The main theorem about POVMs can also be proven for the
case in which tf is itself chosen by the experiment; e.g., the experiment might wait for a
detector to click, and the outcome Z may be the time of the click. I give the proof only
for the simpler case in which tf is fixed in advance.) I will further assume that E has
only finitely many possible outcomes Z; actually, this assumption is not needed for the
proof, but it simplifies the consideration a bit and is satisfied in every realistic scenario.
Proof from Bohmian mechanics. Since the outcome is read off from the pointer
position,
                            Z = ζ(Q(tf )) ,                                                   (18.31)
where Q is the Bohmian configuration and ζ is called the calibration function. (In
practice, the function ζ depends only on the configuration of the apparatus, in fact only
on its macroscopic features, not on microscopic details. However, the arguments that
follow apply to arbitrary calibration functions.) Let

                            U = e^{−iH(tf −ti )}                                              (18.32)

be the unitary time evolution operator of S ∪ A from ti to tf , and

                            Bz = {q ∈ R3N : ζ(q) = z} .                                       (18.33)

Then, using the projection operator PB defined in (10.16),

                  P(Z = z) = P(Q(tf ) ∈ Bz )                                                  (18.34)
                           = ∫_{Bz} |Ψ(q, tf )|² dq                                           (18.35)
                           = ⟨Ψ(tf )|PBz |Ψ(tf )⟩                                             (18.36)
                           = ⟨ψ ⊗ φ|U † PBz U |ψ ⊗ φ⟩                                         (18.37)
                           = ⟨ψ|Ez |ψ⟩S ,                                                     (18.38)
where ⟨·|·⟩S denotes the inner product in the Hilbert space of the system S alone (as
opposed to the Hilbert space of S ∪ A), and Ez is defined as follows: For given ψ, form
ψ ⊗ φ, then apply the operator U † PBz U , and finally take the partial inner product with
φ. The partial inner product of a function Ψ(x, y) with the function φ(y) is a function
of x defined as
                  ⟨φ|Ψ⟩y (x) = ∫ dy φ∗ (y) Ψ(x, y) .                                          (18.39)
Thus,
                  Ez ψ = ⟨φ|U † PBz U (ψ ⊗ φ)⟩y .                                             (18.40)
We now verify that E is a POVM. First, Ez is a positive operator because

            ⟨ψ|Ez |ψ⟩S = ⟨ψ ⊗ φ|U † PBz U |ψ ⊗ φ⟩ = ‖PBz U (ψ ⊗ φ)‖² ≥ 0 .

Second, Σ_z Ez = I. This follows from

            Σ_z PBz = I ,                                                                     (18.46)

together with the facts that U † U = I, and that the partial inner product of ψ ⊗ φ with φ
returns ψ. Eq. (18.46) follows from the fact that the sets Bz form a partition of con-
figuration space R3N (i.e., they are mutually disjoint and together cover the entire
configuration space, ∪z Bz = R3N ). This, in turn, follows from the assumption that the
calibration function ζ is defined everywhere in R3N .39 Thus, the proof is complete.        □
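    The construction (18.40) can be illustrated in a finite-dimensional toy model: take a
2-dimensional system, a 3-dimensional apparatus with ready state φ, a Haar-random unitary
standing in for the time evolution U of S ∪ A, and a calibration that simply reads off the
apparatus basis state. The following sketch (all dimensions and the random U are assumptions
made for illustration) builds the corresponding operators Ez and verifies that they form a
POVM reproducing the probabilities for the pointer position.

import numpy as np

rng = np.random.default_rng(5)

dS, dA = 2, 3                                    # toy system and apparatus dimensions (assumed)
phi = np.zeros(dA, dtype=complex); phi[0] = 1.0  # ready state of the apparatus

# a Haar-random unitary on H_S (x) H_A, standing in for the coupled time evolution U
M = rng.normal(size=(dS*dA, dS*dA)) + 1j*rng.normal(size=(dS*dA, dS*dA))
U, _ = np.linalg.qr(M)

V = np.kron(np.eye(dS), phi.reshape(dA, 1))      # V psi = psi (x) phi

Pz_list, E = [], []
for z in range(dA):
    ez = np.zeros(dA); ez[z] = 1.0
    Pz = np.kron(np.eye(dS), np.outer(ez, ez))   # projection onto pointer value z
    Pz_list.append(Pz)
    E.append(V.conj().T @ U.conj().T @ Pz @ U @ V)   # Eq. (18.40) in matrix form

print(np.allclose(sum(E), np.eye(dS)))                                   # sum_z E_z = I
print([bool(np.all(np.linalg.eigvalsh(Ez) >= -1e-12)) for Ez in E])      # each E_z positive

# probabilities for an initial system state psi agree with the Born rule for the pointer
psi = np.array([0.6, 0.8], dtype=complex)
direct = [np.linalg.norm(Pz @ U @ np.kron(psi, phi))**2 for Pz in Pz_list]
via_E  = [np.vdot(psi, Ez @ psi).real for Ez in E]
print(np.allclose(direct, via_E))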
Proof from GRWf. Let F = {(T1 , X 1 ), (T2 , X 2 ), . . .} be the set of flashes (of both S
and A) from ti onwards. We know from Example 18.7 that the distribution of F (i.e.,
the joint distribution of all flashes after ti ) is given by Ψ(ti ) and some POVM G:

                  P(F ∈ B) = ⟨Ψ(ti )|G(B)|Ψ(ti )⟩ .                                           (18.47)

Since the outcome Z of the experiment is read off from A after ti , it is a function of F ,
                                              Z = ζ(F ) .                                 (18.48)
  39
    The physical meaning of this asumption is that the experiment always has some outcome. You
may worry about the possibility that the experiment could not be completed as planned due to power
outage, meteorite impact, or whatever. This possibility can be taken into account by introducing a
further element f for “failed” into the set Z of possible outcomes.
(Z is a function of F because the flashes define where the pointers point, and what the
shape of the ink on a sheet of paper is. It would even be realistic to assume that Z
depends only on the flashes of the apparatus, but this restriction is not needed for the
further argument.)
    Let Bz = {f : ζ(f ) = z}, the set of flash patterns having outcome z. Then,
                  P(Z = z) = P(F ∈ Bz )                                                       (18.49)
                           = ⟨Ψ(ti )|G(Bz )|Ψ(ti )⟩                                           (18.50)
                           = ⟨ψ|EzGRW |ψ⟩                                                     (18.51)
with
                  EzGRW ψ = ⟨φ|G(Bz )|ψ ⊗ φ⟩y .                                               (18.52)
In fact, EzGRW may be different from Ez obtained from Bohmian mechanics as in (18.40),
in agreement with the fact that the same experiment (using the same initial wave func-
tion of the apparatus, etc.) may yield different outcomes in GRW than in Bohmian
mechanics. (However, since we know the two theories make very very similar predic-
tions, EzGRW will usually be very very close to Ez .) To see that EzGRW is a POVM, we
note that
                  ⟨ψ|EzGRW |ψ⟩ = ⟨Ψ(ti )|G(Bz )|Ψ(ti )⟩ ≥ 0                                   (18.53)
and
                  Σ_z EzGRW ψ = ⟨φ| Σ_z G(Bz ) |ψ ⊗ φ⟩y                                       (18.54)
                              = ⟨φ|G(∪z Bz )|ψ ⊗ φ⟩y                                          (18.55)
                              = ⟨φ|G(Z )|ψ ⊗ φ⟩y = ⟨φ|ψ ⊗ φ⟩y = ψ ,
so Σ_z EzGRW = I.
   The main theorem about POVMs is equally valid in orthodox quantum mechanics
(OQM). However, since OQM does not permit a coherent analysis of measurement
processes (as it suffers from the measurement problem), we cannot give a complete
proof of the main theorem from OQM, but the same reasoning as given in the proof
from Bohmian mechanics would be regarded as compelling in OQM. At the same time,
the main theorem undercuts the spirit of OQM, which is to leave the measurement
process unanalyzed and to introduce observables by postulate. Put differently, the main
theorem about POVMs makes it harder to ignore the measurement problem.
Corollary 18.8. There is no experiment whose outcome, for every initial wave function
ψ of the system, would be ψ itself (or Cψ for some constant C).

Proof. Suppose there were an experiment with Z = ψ. Then, for any given ψ, Z is
deterministic, i.e., its probability distribution is concentrated on a single point, P(Z =
φ) = δ(φ − ψ). The dependence of this distribution on ψ is not quadratic, and thus not
of the form ⟨ψ|Eφ |ψ⟩ for any POVM E. The argument remains valid when we replace
ψ by Cψ.                                                                                      □
    This fact amounts to a limitation of knowledge in any version of quantum mechanics
in which wave functions are part of the ontology, which includes all interpretations of
quantum mechanics that we have talked about: Suppose Alice chooses a direction in
space n, prepares a spin-1/2 particle in the state |n-up⟩, and hands that particle over to
Bob. Then, by Corollary 18.8, Bob has no way of discovering n if Alice does not give the
information away. The best thing Bob can do is, in fact, a Stern–Gerlach experiment in
any direction he likes, say in the z-direction; then he obtains one bit of information, up
or down; if the result was “up” then it is more likely that n lies on the upper hemisphere
than on the lower.
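    The last claim can be illustrated with a small Monte Carlo estimate: assuming (for the
sake of the example) that Alice chooses n uniformly on the sphere, the posterior probability
that n lies on the upper hemisphere, given that Bob’s z-measurement yielded “up,” comes
out close to 3/4.

import numpy as np

rng = np.random.default_rng(6)
N = 200_000

# n uniform on the unit sphere
v = rng.normal(size=(N, 3))
n = v / np.linalg.norm(v, axis=1, keepdims=True)

# Bob measures spin in the z-direction; P(up) = cos^2(theta/2) = (1 + n_z)/2
p_up = (1 + n[:, 2]) / 2
up = rng.random(N) < p_up

# posterior probability that n lies on the upper hemisphere, given Bob saw "up"
print(np.mean(n[up, 2] > 0))     # approx 0.75: "up" makes the upper hemisphere more likely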
Corollary 18.9. There is no experiment in Bohmian mechanics that can measure the
instantaneous velocity of a particle with unknown wave function.
Proof. Again, the distribution of the velocity (~/m) Im[∇ψ/ψ](Q) with Q ∼ |ψ|² is not
quadratic in ψ.                                                                               □
    In contrast, the asymptotic velocity u can be measured, and its probability distribution
is in fact quadratic in ψ: Recall from (7.39) that it is given by (m/~)³ |ψ̂(mu/~)|² .
    The impossibility of measuring instantaneous velocity goes along with the impossi-
bility to measure the entire trajectory without disturbing it. If we wanted to measure
the trajectory, for example by repeatedly measuring the positions every ∆t with inaccu-
racy ∆x, then the measurements will collapse the wave function, with the consequence
that the observed trajectory is very different from what the trajectory would have been
had we not intervened. Some authors regard this as an argument against Bohmian me-
chanics. Bell disagreed (Speakable and unspeakable in quantum mechanics, page 202):
       To admit things not visible to the gross creatures that we are is, in my
       opinion, to show a decent humility, and not just a lamentable addiction to
       metaphysics.
So, Bell criticized the positivistic idea that anything real can always be measured. In-
deed, this idea seems rather dubious in view of Corollary 18.8. We will sharpen this
consideration in Section 20.3.
Definition 18.10. Two experiments (that can be carried out on arbitrary wave func-
tions ψ ∈ H with norm 1) are equivalent in law iff for every ψ ∈ H with kψk = 1,
they have the same distribution of the outcome. (Thus, they are equivalent in law iff
they have the same POVM.) A corresponding equivalence class of experiments is called
an observable.
     If E1 and E2 are equivalent in law and a particular run of E1 has yielded the outcome
z1 , it cannot be concluded that E2 would have yielded z1 as well. The counterfactual
question, “what would z2 have been if we had run E2 ?” cannot be tested empirically, but
it can be analyzed in Bohmian mechanics; there, one sometimes finds z2 ≠ z1 (for the
same QS and ψ in both experiments, but different QA and φ). For example, let E1 be a
Stern–Gerlach experiment in the z direction and E2 a Stern–Gerlach experiment in the
−z direction with the outcome called +1 if the particle is detected in the down channel
and −1 if the particle is detected in the up channel. Then E1 and E2 are equivalent
in law, although in Bohmian mechanics, the two experiments will often yield different
results when applied to the same 1-particle wave function and position.
     This situation illustrates why the term “observable” can be rather misleading: It is
intended to suggest “observable quantity,” but an observable is not even a well-defined
quantity to begin with (as the outcome Z depends on QA and φ), it is a class of
experiments with equal probability distributions.
     This point is connected to Wheeler’s fallacy. Recall the delayed choice experiment,
but now consider detecting the particle either directly at the slits or far away, ignoring
the interference region. As E1 , we put detectors directly at the slits and say that the
outcome is Z1 = +1 if the particle was detected in the left slit and Z1 = −1 if in the
right. This is a kind of position measurement that can be represented in the 2d Hilbert
space formed by wave functions of the form

                  ψ = c1 |left slit⟩ + c2 |right slit⟩ ,

so P(Z1 = +1) = |c1 |². Relative to the basis {|left slit⟩, |right slit⟩}, the POVM is the
spectral PVM of σ3 . As E2 , we put the detectors far away and say that Z2 = +1 if the
particle was detected in the far right and Z2 = −1 if in the far left. ψ evolves to (up to
phase factors)

                  c1 |far right⟩ + c2 |far left⟩ ,

so P(Z2 = +1) = |c1 |². So, Z1 and Z2 have the same distribution, E1 and E2 have the
same POVM, and the two experiments are equivalent in law, although we know that the
Bohmian particle often passes through the right slit and still ends up on the far right.
    Now comes the point that has confused a number of authors40 : Since E1 measures the
“position observable,” and since E1 and E2 “measure” the same observable, it is clear
that E2 also measures the position observable. People concluded that E2 “measures
through which slit the particle went”—Wheeler’s fallacy! People concluded further that
since the Bohmian trajectory may pass through the left slit while Z2 = −1, Bohmian
  40
   For example (using a different but similar setup), B.-G. Englert, M.O. Scully, G. Süssmann, and
H. Walther: Surrealistic Bohm Trajectories. Zeitschrift für Naturforschung A 47: 1175–1186 (1992)
mechanics must somehow disagree with measured facts about which slit the particle
went through. Bad, bad Bohm!
19     Time of Detection
19.1     The Problem
Suppose we set up a detector, wait for the arrival of the particle at the detector, and
measure the time T at which the detector clicks. What is the probability distribution
of T ? This is a natural question not covered by the usual quantum formalism because
there is no self-adjoint operator for time. But from the main theorem about POVMs it
is clear that there must be a POVM E such that

                  P(T ∈ B) = ⟨ψ0 |E(B)|ψ0 ⟩                                                   (19.1)

for all sets B, where ψ0 is the initial wave function of the particle. That is, time of
detection is a generalized observable. In this section we take a look at
this POVM E.
[Figure 4: a surface Σ dividing physical space into a region Ω, containing the initial wave
packet ψ0 , and its complement.]
    Suppose that we form a surface Σ ⊂ R3 out of little detectors so we can measure the
time and the location at which the quantum particle first crosses Σ. Suppose further
that, as depicted in Figure 4, Σ divides physical space R3 into two regions, Ω and
its complement, and the particle’s initial wave function ψ0 is concentrated in Ω. The
outcome of the experiment is the pair Z = (T, X) of the time T ∈ [0, ∞) of detection
and the location X ∈ Σ of detection; should no detection ever occur, then we write
Z = ∞. So the value space of E is Z = [0, ∞) × Σ ∪ {∞}. We want to compute the
distribution of Z from ψ0 .
    Let us compare the problem to Born’s rule. In Born’s rule, we choose a time t0
and measure the three position coordinates at time t0 ; here, if we take Ω to be the half
space {(x, y, z) : x > x0 } and Σ its boundary plane {(x, y, z) : x = x0 }, then we choose
the value of one position coordinate (x0 ) and measure the time as well as the other
two position coordinates when the particle reaches that value. Put differently in terms
of space-time R4 = {(t, x, y, z)}, Born’s rule concerns measuring where the particle
intersects the spacelike hypersurface {t = t0 }, and our problem concerns measuring
where the particle intersects the timelike hypersurface {x = x0 }. We could say that we
need a Born rule for timelike hypersurfaces.
    I should make three caveats, though.
   • I have used language such as “particle arriving at a surface” that presupposes the
     existence of trajectories although we know that some theories of quantum me-
     chanics (GRWm and GRWf) claim that there are no trajectories, and still these
     theories are approximately empirically equivalent to Bohmian mechanics, so the
     time and location of the detector click would have approximately the same dis-
     tribution as in Bohmian mechanics. Our problem really concerns the distribution
     of the detection events, and we should keep in mind that in some theories the
     trajectory language cannot be taken seriously.
   • Even in Bohmian mechanics, there is a crucial difference between the case with
     the spacelike hypersurface and the one with the timelike hypersurface: The point
     where the particle arrives on the timelike hypersurface {x = x0 } may depend on
     whether or not detectors are present on that hypersurface. A detector that does
     not click may still affect ψ and thus the future particle trajectory. That is why
     I avoid the expression “time of arrival” (which is often used in the literature) in
     favor of “time of detection.” In contrast, the point where the particle arrives at
     the spacelike hypersurface {t = t0 } does not depend on whether or not detectors
     are placed along {t = t0 }.
   • The exact POVM E is given by (18.40) (with tf some late time at which we read
     off the values of T and X recorded by the apparatus) and will depend on the exact
     wave function of the detectors, so different detectors will lead to slightly different
     POVMs. Of course, we expect that these differences are negligible. What we want
     is a simple rule defining the POVM for an ideal detector, Eideal . That, of course,
     involves making a definition of what counts as an ideal detector. So the formula
     for Eideal is in part a matter of definition, as long as it fits well with the POVMs
     E of real detectors.
19.2        The Absorbing Boundary Rule
The question of what Eideal is is not fully settled; I will describe the most plausible
proposal, the absorbing boundary rule.41 Such a rule was for a long time believed to be
impossible because of the quantum Zeno effect and Allcock’s paradox (see homework
exercises). Henceforth I will write E instead of Eideal . Let Σ = ∂Ω, let ψ0 be concentrated
in Ω with ‖ψ0 ‖ = 1, and let κ > 0 be a constant of dimension 1/length (it will be a parameter
of the detector). Here is the rule: Let ψt evolve according to the Schrödinger equation

                  i~ ∂ψ/∂t = −(~²/2m) ∇²ψ + V ψ                                               (19.2)

in Ω with potential V : Ω → R and boundary condition

                  ∂ψ/∂n (x) = iκψ(x)                                                          (19.3)

at every x ∈ Σ, with ∂/∂n the outward normal derivative on the surface, ∂ψ/∂n :=
n(x) · ∇ψ(x) with n(x) the outward unit normal vector to Σ at x ∈ Σ. Then, the rule
asserts,

      Pψ0 (t1 ≤ T < t2 , X ∈ B) = ∫_{t1}^{t2} dt ∫_B d²x n(x) · j^{ψt}(x)                     (19.4)

for any 0 ≤ t1 < t2 and any set B ⊆ Σ, with d²x the surface area element and j^ψ the
probability current vector field (2.16). In other words, the joint probability density of T
and X relative to dt d²x is the normal component of the current across the boundary,
j_n^{ψt}(x) = n(x) · j^{ψt}(x). Furthermore,

      Pψ0 (Z = ∞) = 1 − ∫_0^∞ dt ∫_Σ d²x n(x) · j^{ψt}(x) .                                   (19.5)
    Let us study the properties of the rule. To begin with, the boundary condition (19.3)
implies that the current vector j at the boundary is always outward-pointing: For every
x ∈ Σ,

   n(x) · j(x) = (~/m) Im[ ψ*(x) ∂ψ/∂n (x) ] = (~/m) Im[ ψ*(x) iκψ(x) ] = (~κ/m) |ψ(x)|² ≥ 0 .   (19.6)
  41
     R. Werner: Arrival time observables in quantum mechanics. Annales de l’Institut Henri Poincaré,
section A 47: 429–449 (1987)
  R. Tumulka: Distribution of the Time at Which an Ideal Detector Clicks. (2016) http://arxiv.
org/abs/1601.03715
For this reason, (19.3) is called an absorbing boundary condition: It implies that there
is never any current coming out of the boundary. In particular, the right-hand side of
(19.4) is non-negative.
    So the rule invokes a new kind of time evolution for a 1-particle wave function as
an effective treatment of the whole system formed by the 1 particle and the detec-
tors together. It is useful to picture the Bohmian trajectories for this time evolution.
Eq. (19.6) implies that the Bohmian velocity field v(x) is always outward-pointing at
the boundary, n(x) · v(x) > 0 for all x ∈ Σ; in fact, the normal velocity is prescribed,
n(x)·v(x) = ~κ/m. In particular, Bohmian trajectories can cross Σ only in the outward
direction; when they do, they end on Σ, as ψ is not defined behind Σ. Put differently,
no Bohmian trajectories begin on Σ, they all begin at t = 0 in Ω with |ψ0 |2 distribu-
tion. In fact, the right-hand side of (19.4) is exactly the probability distribution of the
space-time point at which the Bohmian trajectory reaches the boundary. That is not
surprising, as in a Bohmian world we would expect the detector to click when and where
the particle reaches the detecting surface. As a further consequence, the right-hand side
of (19.5) is exactly the probability that the Bohmian trajectory never reaches Σ. In
particular, (19.4) and (19.5) together define a probability distribution on Z . Had we
evolved ψ0 with the Schrödinger equation on R3 without boundary condition on Σ, then
some Bohmian trajectories may cross Σ several times in both directions; this illustrates
that the trajectory in the presence of detectors can be different from what it would have
been in the absence of detectors.
    Since probability can only be lost at the boundary, never gained,
    ‖ψ_t‖² = ∫_Ω d³x |ψ_t(x)|²                (19.7)
can only decrease with t, never increase. So here we are dealing with a new kind
of Schrödinger equation whose time evolution is not unitary as the norm of ψ is not
conserved. The time evolution operators Wt , defined by the property Wt ψ0 = ψt , have
the following properties: First, they are not unitary but satisfy kWt ψk ≤ kψk; such
operators are called contractions. Second, Ws Wt = Ws+t and W0 = I; a family (Wt )t≥0
with this property is called a semigroup. Thus, the Wt form a contraction semigroup.
    In fact, kψt k2 is the probability that the Bohmian particle is still somewhere in Ω
at time t, that is, has not reached the boundary yet. In particular, as an alternative to
(19.5) we can write
    P(Z = ∞) = lim_{t→∞} ‖ψ_t‖² .                (19.8)
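To get a feel for this non-unitary evolution, here is a small numerical sketch (in Python with NumPy/SciPy; it is not part of the rule itself) for a particle on an interval in units ℏ = m = 1, with a hard wall at x = 0 and the absorbing boundary condition ∂ψ/∂x = iκψ at x = L. The grid, the ghost-point treatment of the boundary condition, and the initial wave packet are all illustrative choices.

import numpy as np
from scipy.linalg import expm

# Units hbar = m = 1. Region Omega = [0, L], hard wall at x = 0,
# absorbing boundary condition psi'(L) = i*kappa*psi(L) at x = L.
L, kappa = 20.0, 2.0
N = 400
dx = L / N
x = (np.arange(N) + 1) * dx                      # grid x_1, ..., x_N = L

# Kinetic energy -(1/2) d^2/dx^2 with psi(0) = 0 and a ghost point
# psi(L + dx) = (1 + i*kappa*dx) * psi(L) implementing the boundary condition.
# The resulting H is NOT self-adjoint; its anti-Hermitian part sits at x = L.
H = np.zeros((N, N), dtype=complex)
for j in range(N):
    H[j, j] = 1.0 / dx**2
    if j > 0:
        H[j, j - 1] = -0.5 / dx**2
    if j < N - 1:
        H[j, j + 1] = -0.5 / dx**2
H[N - 1, N - 1] = (1.0 - 1j * kappa * dx) / (2 * dx**2)

# initial wave packet moving to the right with mean momentum k0
x0, sigma, k0 = 8.0, 1.0, 2.0
psi = np.exp(-(x - x0)**2 / (4 * sigma**2) + 1j * k0 * x)
psi /= np.sqrt(dx * np.sum(np.abs(psi)**2))

dt = 0.01
W = expm(-1j * H * dt)                           # one step of the contraction semigroup
flux = 0.0
for step in range(1, 1201):
    flux += kappa * np.abs(psi[-1])**2 * dt      # outgoing flux (hbar*kappa/m)|psi(L)|^2, cf. (19.6)
    psi = W @ psi
    if step % 300 == 0:
        norm2 = dx * np.sum(np.abs(psi)**2)
        print(f"t = {step*dt:5.1f}   ||psi_t||^2 = {norm2:.4f}   absorbed so far = {flux:.4f}")
# ||psi_t||^2 decreases monotonically, and 1 - ||psi_t||^2 approximately matches the accumulated flux,
# in accordance with (19.7)-(19.8).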
   The conclusions from our considerations about Bohmian trajectories can also be
obtained from the Ostrogradskii–Gauss integral theorem (divergence theorem) in 4 di-
mensions: The 4-vector field j = (ρ, j) has vanishing 4-divergence, as that is exactly
what the continuity equation (2.16) expresses. Integrating the divergence over [0, t] × Ω
yields
    0 = ∫_0^t dt′ ∫_Ω d³x div j(t′, x)                (19.9)
      = ∫_Ω d³x ρ(t, x) − ∫_Ω d³x ρ(0, x) + ∫_0^t dt′ ∫_Σ d²x n(x) · j(t′, x) .                (19.10)
Since the last integrand is non-negative, ‖ψ_t‖² is decreasing with time; indeed,
    ‖ψ_t‖² = 1 − ∫_0^t dt′ ∫_Σ d²x n(x) · j(t′, x) ,                (19.11)
i.e., ‖ψ_t‖² equals 1 minus the flux of j into the boundary during [0, t]. In particular,
    lim_{t→∞} ‖ψ_t‖² = 1 − ∫_0^∞ dt′ ∫_Σ d²x n(x) · j(t′, x) ,                (19.12)
so the right-hand side of (19.5) is non-negative, and (19.4) and (19.5) together define a probability distribution.
    So what is the POVM E? It is given by
    E(dt × d²x) = (ℏκ/m) W_t† |x⟩⟨x| W_t dt d²x ,                (19.13)
    E({∞}) = lim_{t→∞} W_t† W_t .                (19.14)
Since the E(dt) are not projections, there are in general no eigenstates of detection time.
   Variants of the absorbing boundary rule have been developed for moving surfaces,
systems of several detectable particles, and particles with spin.42
 42
   R. Tumulka: Detection Time Distribution for Several Quantum Particles. http://arxiv.org/
abs/1601.03871
20      Density Matrix and Mixed State
In this chapter we prove a limitation to knowledge in quantum mechanics that follows
from the main theorem about POVMs. Let
    S(H ) = {ψ ∈ H : ‖ψ‖ = 1}                (20.1)
denote the unit sphere in Hilbert space. Suppose that we have a mechanism that gener-
ates random wave functions Ψ ∈ S(H ) with probability distribution µ on S(H ). Then
it is impossible to determine µ empirically. In fact, there exist different distributions
µ1 ≠ µ2 that are empirically indistinguishable, i.e., they lead to the same distribution of
outcomes Z for any experiment. We call such distributions empirically equivalent (which
is an equivalence relation) and show that the equivalence classes are in one-to-one cor-
respondence with certain operators known as density matrices or density operators.
    To describe these matters, we need the mathematical concept of trace.
20.1      Trace
Definition 20.1. The trace of a matrix A = (Amn ) is the sum of its diagonal elements.
The trace of an operator T is defined to be the sum of the diagonal elements of its
matrix representation Tnm = hn|T |mi relative to an arbitrary ONB {|ni},
    tr T = Σ_{n=1}^∞ ⟨n|T |n⟩ .                (20.2)
   Every positive operator either has finite trace or has trace +∞, and the value of the
trace does not depend on the choice of ONB. The trace class is the set of those operators
T for which the positive operator √(T†T) has finite trace. For every operator from the
trace class, the trace is finite and does not depend on the ONB.
   The trace has the following properties for all operators A, B, . . . from the trace class:
  (i) The trace is linear:
          tr(A + B) = tr(A) + tr(B) ,    tr(λA) = λ tr(A)
      for all λ ∈ C.
        The trace is also invariant under cyclic permutations of products: in particular
        tr(AB) = tr(BA) and tr(ABC) = tr(CAB), which is, however, not always the same as tr(CBA).
(iv) The trace of the adjoint operator T † is the complex-conjugate of the trace of T :
     tr(T † ) = tr(T )∗ .
    The main theorem about POVMs now implies the following: if Ψ is random with distribution µ, then for every experiment with POVM E, the distribution of the outcome Z is given by
    P(Z ∈ B) = tr(ρµ E(B)) ,                (20.5)
where the operator
    ρµ = E|Ψ⟩⟨Ψ| = ∫_{S(H )} |ψ⟩⟨ψ| µ(dψ)                (20.6)
is called the density operator or density matrix (rarely: statistical operator) of the
distribution µ. Eq. (20.5) is called the trace formula. It was discovered by John von
Neumann in 1927,43 except that von Neumann did not know POVMs and considered
only PVMs. In case the distribution µ is concentrated on discrete points of S(H ),
(20.6) becomes
    ρµ = E|Ψ⟩⟨Ψ| = Σ_ψ µ(ψ) |ψ⟩⟨ψ| .                (20.7)
To see why the trace formula holds, note first that for every ψ ∈ S(H ) and every operator E,
    ⟨ψ|E|ψ⟩ = tr(|ψ⟩⟨ψ| E)                (20.8)
because, if we choose the basis {|n⟩} in (20.2) such that |1⟩ = ψ, then the summands
in (20.2) are ⟨n|ψ⟩⟨ψ|E|n⟩, which for n = 1 is ⟨ψ|E|ψ⟩ and for n > 1 is zero because
⟨n|1⟩ = 0. By linearity, we also have that
    tr( Σ_j µ(ψ_j) |ψ_j⟩⟨ψ_j| E ) = Σ_j µ(ψ_j) ⟨ψ_j|E|ψ_j⟩ ,                (20.9)
which yields (20.5) for any µ that is concentrated on finitely many points ψj on S(H ).
One can prove (20.5) for arbitrary probability distribution µ by considering limits.
 43
   J. von Neumann: Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik. Göttinger
Nachrichten 1(10): 245–272 (1927). Reprinted in John von Neumann: Collected Works Vol. I,
A.H. Taub (editor), Oxford: Pergamon Press (1961)
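As a simple illustration (not part of the proof), the following Python/NumPy sketch checks the trace formula (20.5) for a randomly generated discrete ensemble and a randomly generated effect 0 ≤ E ≤ I; the dimension, the ensemble size, and the way E is constructed are arbitrary choices.

import numpy as np
rng = np.random.default_rng(0)
d, n = 4, 5                                   # Hilbert space dimension, ensemble size

# a discrete ensemble: unit vectors psi_j with weights mu_j (summing to 1)
psis = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)
mu = rng.random(n)
mu /= mu.sum()

# density matrix rho_mu = sum_j mu_j |psi_j><psi_j|, Eq. (20.7)
rho = sum(m * np.outer(p, p.conj()) for m, p in zip(mu, psis))

# a random effect 0 <= E <= I (one element of some POVM)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
E = A.conj().T @ A
E /= np.linalg.eigvalsh(E).max() + 1.0

lhs = np.trace(rho @ E).real                                       # tr(rho_mu E)
rhs = sum(m * (p.conj() @ E @ p).real for m, p in zip(mu, psis))   # sum_j mu_j <psi_j|E|psi_j>
print(lhs, rhs)   # equal up to rounding: the statistics depend on mu only through rho_mu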
    Now let us draw conclusions from the formula (20.5). It implies that the distribution
of the outcome Z depends on µ only through ρµ . Different distributions µa , µb can
have the same ρ = ρµa = ρµb ; for example, if H = C2 then the uniform distribution
over S(H ) has ρ = ½ I, and for every orthonormal basis |φ_1⟩, |φ_2⟩ of C² the probability
distribution
    ½ δ_{φ_1} + ½ δ_{φ_2}                (20.10)
also has ρ = ½ I. Two such distributions µ_a , µ_b lead to the same distribution of
outcomes for any experiment and are therefore empirically indistinguishable.
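For instance, the following short sketch (an illustration only; the sample size is an arbitrary choice) compares the density matrix of the uniform distribution over S(C²), estimated by averaging over random unit vectors, with the density matrix of the discrete distribution (20.10); both come out as ½ I.

import numpy as np
rng = np.random.default_rng(1)

# ensemble (a): Psi uniformly distributed over the unit sphere of C^2
n = 200000
samples = rng.normal(size=(n, 2)) + 1j * rng.normal(size=(n, 2))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
rho_a = np.einsum('ni,nj->ij', samples, samples.conj()) / n   # average of |Psi><Psi|

# ensemble (b): Psi = phi_1 or phi_2 with probability 1/2 each, Eq. (20.10)
phi1 = np.array([1, 1], dtype=complex) / np.sqrt(2)
phi2 = np.array([1, -1], dtype=complex) / np.sqrt(2)
rho_b = 0.5 * np.outer(phi1, phi1.conj()) + 0.5 * np.outer(phi2, phi2.conj())

print(np.round(rho_a, 3))   # approximately I/2 (up to sampling error)
print(np.round(rho_b, 3))   # exactly I/2: the two ensembles are empirically equivalent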
If every wave function in the ensemble evolves according to the Schrödinger equation,
ψ_t = e^{−iHt/ℏ} ψ_0 , then the density matrix evolves according to
    ρ_t = e^{−iHt/ℏ} ρ_0 e^{+iHt/ℏ} ,                (20.11)
and therefore obeys
    iℏ dρ_t/dt = [H, ρ_t] = Hρ_t − ρ_t H ,                (20.12)
known as the von Neumann equation. The step from (20.11) to (20.12) is based on the
fact that
    (d/dt) e^{At} = A e^{At} = e^{At} A .                (20.13)
    A density matrix is also often called a quantum state. If ρ = |ψihψ| with kψk = 1,
then ρ is usually called a pure quantum state, otherwise a mixed quantum state. A
probability distribution µ has ρµ = |ψihψ| if and only if µ is concentrated on Cψ, i.e.,
Ψ = eiΘ ψ with a random global phase factor.
    As we have seen, a density matrix ρ is always a positive operator with tr ρ = 1.
Conversely, every positive operator ρ with tr ρ = 1 is a density matrix, i.e., ρ = ρµ for
some probability distribution µ on S(H ). Here is one such µ: find an orthonormal basis
{|φn i : n ∈ N} of eigenvectors of ρ with eigenvalues pn ∈ [0, ∞). Then
    Σ_n p_n = tr ρ = 1 .                (20.14)
Now let µ be the distribution that gives probability pn to φn ; its density matrix is just
the ρ we started with.
21         Reduced Density Matrix and Partial Trace
There is another way in which density matrices arise, leading to what is called the
reduced density matrix. Suppose that the system under consideration consists of two
parts, system a and system b, so that its Hilbert space is H = Ha ⊗ Hb . Suppose that
a and b together have wave function ψ ∈ S(H ) and that an experiment is performed on
system a alone, say with POVM Ea acting on Ha . Then the distribution of its outcome
Z is again given by a trace formula, P(Z ∈ B) = tra(ρa Ea(B)), where ρa = trb |ψ⟩⟨ψ| is
called the reduced density matrix of system a, and where trb means the partial trace
over Hb . The reduced density matrix and the trace formula for it were discovered by
Lev Landau in 1927.44 The partial trace of an operator T on Ha ⊗ Hb is the operator
trb T on Ha defined, relative to orthonormal bases {φan} of Ha and {φbm} of Hb , by its
matrix elements
    ⟨φan| trb T |φak⟩ = Σ_{m=1}^∞ ⟨φan ⊗ φbm| T |φak ⊗ φbm⟩ ,                (21.4)
where the inner products on the right-hand side are inner products in Ha ⊗ Hb . We
will sometimes write
    trb T = Σ_{m=1}^∞ ⟨φbm| T |φbm⟩ ,                (21.5)
where ⟨φbm| · |φbm⟩ denotes the partial inner product, taken in Hb alone. The partial
trace has the following properties for all operators S, T on Ha ⊗ Hb :
  (i) It is linear:
                         trb (S + T ) = trb (S) + trb (T ) ,   trb (λT ) = λ trb (T )   (21.6)
  44
    L. Landau: Das Dämpfungsproblem in der Wellenmechanik. Zeitschrift für Physik 45: 430–441
(1927)
 (ii) tr(trb (T )) = tr(T ). Here, the first tr symbol means the trace in Ha , the second
      one the partial trace, and the last one the trace in Ha ⊗ Hb . This property follows
      from (21.4) by setting k = n and summing over n.
(iii) trb (T † ) = (trb T )† . The adjoint of the partial trace is the partial trace of the
      adjoint. In particular, if T is self-adjoint then so is trb T .
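In finite dimensions the partial trace is easy to compute. Here is a minimal NumPy sketch (the dimensions are arbitrary illustrative choices) that implements formula (21.4) by reshaping and checks properties (ii) and (iii).

import numpy as np
rng = np.random.default_rng(2)
da, db = 3, 4                                 # dimensions of H_a and H_b

def partial_trace_b(T, da, db):
    """tr_b T for an operator T on H_a tensor H_b, via Eq. (21.4)."""
    T4 = T.reshape(da, db, da, db)            # indices (n, m, k, m')
    return np.einsum('nmkm->nk', T4)          # set m' = m and sum over m

T = rng.normal(size=(da * db, da * db)) + 1j * rng.normal(size=(da * db, da * db))
S = partial_trace_b(T, da, db)

print(np.trace(S), np.trace(T))                                        # property (ii)
print(np.allclose(partial_trace_b(T.conj().T, da, db), S.conj().T))    # property (iii)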
    Consider now
    Ψ = Σ_α Ψα ,                (21.10)
the wave function of an object and an apparatus after a quantum measurement of
the observable A = Σ_α α Pα . Suppose that Ψα , the contribution corresponding to the
outcome α, is of the form
    Ψα = cα ψα ⊗ φα ,                (21.11)
where cα = ‖Pα ψ‖, ψ is the initial object wave function, ψα = Pα ψ/‖Pα ψ‖, and φα
with ‖φα‖ = 1 is a wave function of the apparatus after having measured α. Since the
φα have disjoint supports in configuration space, they are mutually orthogonal; thus,
they are a subset of some orthonormal basis {φn}. The reduced density matrix of the
object is
    ρ^Ψ = trb |Ψ⟩⟨Ψ| = Σ_n ⟨φn|Ψ⟩⟨Ψ|φn⟩ = Σ_α |cα|² |ψα⟩⟨ψα| .                (21.12)
This is the same density matrix as the statistical density matrix associated with the
probability distribution µ of the collapsed wave function ψ′,
    µ = Σ_α |cα|² δ_{ψα} ,                (21.13)
since
    ρµ = Σ_α |cα|² |ψα⟩⟨ψα| .                (21.14)
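The equality of (21.12) and (21.14) can also be checked numerically. The following sketch builds a Ψ of the form (21.10)–(21.11) with randomly chosen object states ψ_α and orthonormal pointer states φ_α (all dimensions and states are illustrative assumptions) and compares the two density matrices.

import numpy as np
rng = np.random.default_rng(3)
d_obj, n_out = 3, 3                           # object dimension, number of outcomes

# coefficients c_alpha with sum |c_alpha|^2 = 1, normalized object states psi_alpha,
# and mutually orthogonal apparatus ("pointer") states phi_alpha, as in (21.11)
c = rng.normal(size=n_out) + 1j * rng.normal(size=n_out)
c /= np.linalg.norm(c)
psi = rng.normal(size=(n_out, d_obj)) + 1j * rng.normal(size=(n_out, d_obj))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
phi = np.eye(n_out, dtype=complex)

# Psi = sum_alpha c_alpha psi_alpha (x) phi_alpha
Psi = sum(c[a] * np.kron(psi[a], phi[a]) for a in range(n_out))

def partial_trace_b(T, da, db):
    return np.einsum('nmkm->nk', T.reshape(da, db, da, db))

rho_red = partial_trace_b(np.outer(Psi, Psi.conj()), d_obj, n_out)                      # (21.12)
rho_stat = sum(abs(c[a])**2 * np.outer(psi[a], psi[a].conj()) for a in range(n_out))    # (21.14)
print(np.allclose(rho_red, rho_stat))         # True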
    It is sometimes claimed that this fact solves the measurement problem. The argu-
ment is this: From (21.10) we obtain (21.12), which is the same as (21.14), which means
that the system’s wave function has distribution (21.13), so we have a random outcome
α. This argument is incorrect, as the mere fact that two situations—one with Ψ as
in (21.10), the other with random ψ 0 —define the same density matrix for the object
does not mean the two situations are physically equivalent. And obviously from (21.10),
the situation after a quantum measurement involves neither a random outcome nor a
random wave function. As John Bell once put it, “and is not or.”
    It is sometimes taken as the definition of decoherence that the reduced density matrix
is (approximately) diagonal in the eigenbasis of the relevant operator A. In a previous
lecture I had defined decoherence as the situation that two or more wave packets Ψα are
macroscopically disjoint in configuration space (and thus remain disjoint for the relevant
future). The connection between the two definitions is that the latter implies the former
if Ψα is of the form (21.11).
21.5     The No-Signaling Theorem
The no-signaling theorem is a consequence of the quantum formalism: If system a is
located in Alice’s lab and system b in Bob’s, and if the two labs do not interact, then the
statistical reduced density matrix of system a is (i) not affected by any measurement
Bob performs, and (ii) does not depend on the Hamiltonian of system b.
    To verify (i), suppose that systems a and b together have wave function ψ ∈ Ha ⊗Hb ,
and that Bob measures the observable B, which is a self-adjoint operator on Hb . Let
β denote the eigenvalues of B and Pβ the projection to the eigenspace of eigenvalue β.
The probability that Bob obtains the outcome β is
    P(Z = β) = ⟨ψ| Ia ⊗ Pβ |ψ⟩ .                (21.15)
If Bob obtains β then ψ collapses to ψ′/Z, where ψ′ = (Ia ⊗ Pβ)ψ and the normalization
factor is given by Z = ‖ψ′‖ = ⟨ψ|Ia ⊗ Pβ|ψ⟩^{1/2}. Thus, the statistical reduced density
matrix of system a is
    ρ′ = trb [ Σ_β P(Z = β) |ψ′⟩⟨ψ′| / Z² ]                (21.16)
       = Σ_β trb [ (Ia ⊗ Pβ)|ψ⟩⟨ψ|(Ia ⊗ Pβ) ]                (21.17)
       = Σ_β trb [ |ψ⟩⟨ψ|(Ia ⊗ Pβ) ]                (21.18)
       = trb [ |ψ⟩⟨ψ|(Ia ⊗ Σ_β Pβ) ]                (21.19)
       = trb |ψ⟩⟨ψ| = ρ ,
where the step from (21.17) to (21.18) uses Pβ² = Pβ together with property (vii) of the
partial trace, and the last step uses Σ_β Pβ = Ib . So Bob's measurement does not change
the reduced density matrix of system a.
    To verify (ii), note that in the absence of interaction the unitary time evolution
operator is Ut = Ua,t ⊗ Ub,t . Thus, the reduced density matrix evolves according to
    ρa(t) = trb [ Ut |ψ⟩⟨ψ| Ut† ] = Ua,t ( trb |ψ⟩⟨ψ| ) Ua,t† = Ua,t ρa(0) Ua,t† ,
which does not depend on Ub,t . The argument extends without difficulty to statistical
reduced density matrices.
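Here is a small numerical check of statement (i); the state, the dimensions, and Bob's observable are random illustrative choices. Bob's unrecorded measurement leaves the reduced density matrix of system a unchanged.

import numpy as np
rng = np.random.default_rng(4)
da, db = 2, 3

def partial_trace_b(T, da, db):
    return np.einsum('nmkm->nk', T.reshape(da, db, da, db))

# a random entangled state of a and b
psi = rng.normal(size=da * db) + 1j * rng.normal(size=da * db)
psi /= np.linalg.norm(psi)
rho_a = partial_trace_b(np.outer(psi, psi.conj()), da, db)

# Bob measures a random observable B on H_b with eigenprojections P_beta
B = rng.normal(size=(db, db)) + 1j * rng.normal(size=(db, db))
B = B + B.conj().T
_, V = np.linalg.eigh(B)
projs = [np.outer(V[:, k], V[:, k].conj()) for k in range(db)]

# statistical density matrix of the pair after Bob's measurement (outcome unknown to Alice)
rho_pair = sum(np.kron(np.eye(da), P) @ np.outer(psi, psi.conj()) @ np.kron(np.eye(da), P)
               for P in projs)
print(np.allclose(rho_a, partial_trace_b(rho_pair, da, db)))   # True: no signaling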
21.6     Canonical Typicality
This is an application of reduced density matrices in quantum statistical mechanics.
The main goal of quantum statistical mechanics is to derive facts of thermodynamics
from a quantum mechanical analysis of systems with a macroscopic number of particles
(say, N > 10^{20}). One of the rules of quantum statistical mechanics asserts that if a
quantum system S is in thermal equilibrium at absolute temperature T ≥ 0, then it has
density matrix
    ρcan = (1/Z) e^{−βHS} ,                (21.27)
where HS is the system's Hamiltonian, β = 1/kT with k = 1.38 · 10^{−23} J/K the
Boltzmann constant, and Z = tr e^{−βHS} the normalizing factor; ρcan is called the canonical
density matrix with inverse temperature β.
    While this rule has long been used, its justification is rather recent (2006) and goes
as follows. Suppose that S is coupled to another system B (the “heat bath”), and
suppose that S and B together have wave function ψ ∈ HS ⊗ HB and Hamiltonian H
with pure point spectrum (as is the case for systems confined to a finite volume). Let
Imc = [E, E + ∆E] be an energy interval whose length ∆E is small on the macroscopic
scale but large enough for Imc to contain very many eigenvalues of H; Imc is called a
micro-canonical energy shell. Let Hmc be the corresponding spectral subspace, i.e., the
range of 1Imc (H), and umc the uniform probability distribution over S(Hmc ).
If the interaction between S and B is negligible, so that
    H ≈ HS ⊗ IB + IS ⊗ HB ,                (21.28)
then for most ψ relative to umc , the reduced density matrix of S is approximately canon-
ical for some value of β, i.e.,
    trB |ψ⟩⟨ψ| ≈ ρcan .                (21.29)
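The following toy computation illustrates the statement; it is not the general proof, and the bath of non-interacting two-level systems as well as all parameters are illustrative assumptions. For a random wave function in a micro-canonical shell, the weights of the two system levels come out close to canonical.

import numpy as np
from math import comb, log, exp
rng = np.random.default_rng(5)

# Toy model: S = one two-level system with energies 0 and 1; B = nB two-level
# systems with gap 1; H = H_S (x) I_B + I_S (x) H_B as in (21.28), no interaction.
nB, E0 = 14, 4                                 # bath size, micro-canonical total energy

# degeneracies of the bath energies paired with E_S = 0 and E_S = 1
N0, N1 = comb(nB, E0), comb(nB, E0 - 1)
dim_mc = N0 + N1                               # dimension of the shell H_mc

# psi uniform on S(H_mc): since the shell basis consists of product states and the two
# blocks correspond to different bath energies, the reduced density matrix of S is
# diagonal with entries given by the total weight of each block.
c = rng.normal(size=dim_mc) + 1j * rng.normal(size=dim_mc)
c /= np.linalg.norm(c)
p0 = np.sum(np.abs(c[:N0])**2)
p1 = np.sum(np.abs(c[N0:])**2)

beta = log(N0 / N1)                            # inverse temperature of the bath near E0
Z = 1 + exp(-beta)
print("reduced density matrix of S:", round(p0, 3), round(p1, 3))
print("canonical prediction:       ", round(1 / Z, 3), round(exp(-beta) / Z, 3))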
22      Quantum Logic
The expression “quantum logic” is used in the literature for (at least) three different
things: a certain piece of mathematics (the lattice of closed subspaces of a Hilbert space),
a certain analogy between this lattice and the rules of logic, and a certain philosophical
idea (that the rules of logic need to be revised in the light of quantum mechanics).
    Logic is the collection of those statements and rules that are valid in every conceivable
universe and every conceivable situation. Some people have suggested that logic simply
consists of the rules for the connectives “and”, “or,” and “not”, with “∀x ∈ M ” an
extension of “and” and “∃x ∈ M ” an extension of “or” to (possibly infinite) ranges M .
I would say that viewpoint is not completely right (because of Gödel’s theorem45 ) and
not completely wrong. Be that as it may, let us focus for a moment on the operations
“and” (conjunction A ∧ B), “or” (disjunction A ∨ B), and “not” (negation ¬A), and let
us ignore infinite conjunctions or disjunctions.
    A Boolean algebra is a set A of elements A, B, C, . . . of which we can form A ∧ B,
A ∨ B, and ¬A, such that the following rules hold:
• ∧ and ∨ are associative, commutative, and idempotent (A∧A = A and A∨A = A).
    • There are elements 0 ∈ A (“false”) and 1 ∈ A (“true”) such that for all A ∈ A ,
      A ∧ 0 = 0, A ∧ 1 = A, A ∨ 0 = A, A ∨ 1 = 1.
• Distributive laws: A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C) and A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C).

• Complementation laws: A ∧ ¬A = 0, A ∨ ¬A = 1.
It follows from these axioms that ¬(¬A) = A, and that de Morgan’s laws hold, ¬A ∨
¬B = ¬(A ∧ B) and ¬A ∧ ¬B = ¬(A ∨ B).
    The laws of logic for “and,” “or,” and “not” are exactly the laws that hold in
every Boolean algebra, with A, B, C, . . . playing the role of statements or propositions
or conditions. Another case in which these axioms are satisfied is that A, B, C, . . . are
sets, more precisely subsets of some set Ω, A ∧ B means the intersection A ∩ B, A ∨ B
means the union A ∪ B, ¬A means the complement Ac = Ω \ A, 0 means the empty set
∅, and 1 means the full set Ω. That is, every family A of subsets of Ω that contains
Ω and is closed under complement and intersection (in particular, every σ-algebra) is
a Boolean algebra. (It turns out that also, conversely, every Boolean algebra can be
realized as a family of subsets of some set Ω.)
  45
Gödel provided an example of a statement about the natural numbers that is true (and in that sense a
consequence of the Peano axioms) but cannot be derived from the Peano axioms using the standard rules
of logic, thus showing that these rules are incomplete.
    Now let A, B, C, . . . be subspaces of a Hilbert space H (more precisely, closed sub-
spaces, which makes no difference in finite dimension where every subspace is closed); let
A ∧ B := A ∩ B, A ∨ B := span(A ∪ B) (the smallest closed subspace containing both A
and B), and ¬A := A⊥ = {ψ ∈ H : hψ|φi = 0 ∀φ ∈ A} be the orthogonal complement
of A; let 0 = {0} be the 0-dimensional subspace and 1 = H the full subspace. Then
all axioms except distributivity are satisfied. So this structure is not a Boolean algebra;
it is called an orthomodular lattice, or simply a lattice. (A lattice of this kind that is
also distributive is the same thing as a Boolean algebra.) The closed subspaces thus
form a non-distributive lattice L(H ).
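A quick way to see the failure of distributivity is to compute dimensions for three one-dimensional subspaces of C²; the particular vectors below are just a convenient choice for this sketch.

import numpy as np

def dim_join(*mats):
    """Dimension of the span (join) of the columns of the given matrices."""
    return np.linalg.matrix_rank(np.hstack(mats))

def dim_meet(U, V):
    """Dimension of the intersection (meet), via dim U + dim V - dim(U join V)."""
    return np.linalg.matrix_rank(U) + np.linalg.matrix_rank(V) - dim_join(U, V)

A = np.array([[1], [0]], dtype=complex)                    # C|up>
B = np.array([[1], [1]], dtype=complex) / np.sqrt(2)       # C|x-up>
C = np.array([[1], [-1]], dtype=complex) / np.sqrt(2)      # C|x-down>

BC = np.hstack([B, C])                                     # B join C = C^2
print(dim_meet(A, BC))     # 1:  A meet (B join C) = A
print(dim_meet(A, B))      # 0:  A meet B = {0}
print(dim_meet(A, C))      # 0:  A meet C = {0}, so (A meet B) join (A meet C) = {0}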
    That is nice mathematics, and we will see more of that in a moment. The analogy I
mentioned holds between L(H ) and Boolean algebras, often understood as representing
the rules of logic. The analogy is that both are lattices. In order to emphasize the
analogy, some authors call the elements of L(H ) “propositions” and the operations
∧, ∨, and ¬ “and,” “or,” and “not.” They call L(H ) the “quantum logic” and say
things like, A ∈ L(H ) is a yes-no question that you can ask about a quantum system,
as you can carry out a quantum measurement of the projection to A and get result 0
(no) or 1 (yes).
    Here is why the analogy is rather limited. Let me give two examples.
   • First, consider a spin-½ particle with spinor ψ ∈ C², and consider the words “ψ
     lies in C|upi.” These words sound very much like a proposition, let me call it P,
     and indeed they naturally correspond to a subspace of H = C2 , viz., C|upi. Now
     the negation of P is, of course, “ψ lies in H \ C|upi,” whereas the orthogonal
     complement of C|upi is C|downi. Let me say that again in different words: The
     negation of “spin is up” is not “spin is down,” but “spin is in any direction but
     up.”
   • Second, consider the delayed-choice experiment in the form discussed at the end of
     Section 18.4: forget about the interference region and consider just the two options
     of either putting detectors in the two slits or putting detectors far away. The first
      option corresponds to the PVM Pleft + Pright = I, the second to the PVM U † Pfar right U +
     U † Pfar left U = I, where U is the unitary time evolution from the slits to the
     far regions where the detectors are placed. The two PVMs are identical, as
     U † Pfar right U = Pleft (and likewise for the other projection); that is, we have two
     experiments associated with the same observable. If we think of subspaces as
     propositions, then it is natural to think of the particle passes through the left slit
     as a proposition and identify it with the subspace A that is the range of Pleft . But
     if we carry out the second option, detect the particle on the far right, and say that
     we have confirmed the proposition A and thus that the particle passed through
     the left slit, then we have committed Wheeler’s fallacy.
    The philosophical idea that I mentioned is that logic as we know it is false, that
it applies in classical physics but not in quantum physics, and that a different kind of
logic with different rules applies in quantum physics—a quantum logic. Why did I call
that a rather silly idea? Because logic is, by definition, what is true in every conceivable
situation. So logic cannot depend on physical laws and cannot be revised by empirical
science. As Tim Maudlin once nicely said:
There is no point in arguing with somebody who does not believe in logic.
Bell wrote in Against “measurement” (1989, page 216 in the 2nd edition of Speakable
and unspeakable in quantum mechanics):
         When one forgets the role of the apparatus, as the word “measurement”
       makes all too likely, one despairs of ordinary logic—hence “quantum logic.”
       When one remembers the role of the apparatus, ordinary logic is just fine.
    A theorem of Gleason46 asserts that if dim H ≥ 3, then every probability measure
on L(H ) (i.e., every mapping P : L(H ) → [0, 1] with P(H ) = 1 that is additive over
countable families of mutually orthogonal subspaces) is of the form P(A) = tr(ρ PA)
with PA the projection to A, for some density matrix ρ. This amazing parallel between
probability measures and density matrices has led some authors to call elements of
L(H ) “events” (as one would call subsets of Ω). Again, this is a rather limited analogy,
for the same reasons as above.
  46
   A.M. Gleason: Measures on the closed subspaces of a Hilbert space. Indiana University Mathe-
matics Journal 6: 885–893 (1957)
23     No-Hidden-Variables Theorems
This name refers to a collection of theorems that aim at proving the impossibility of
hidden variables. This aim may seem strange in view of the fact that Bohmian mechan-
ics is a hidden-variable theory, is consistent and makes predictions in agreement with
quantum mechanics. So how could hidden variables be impossible? A first observa-
tion concerns what is meant by “hidden variables.” Most no-hidden-variable theorems
(NHVTs) address the idea that every observable A (a self-adjoint operator) has a true
value vA in nature (the “hidden variable”), and that a quantum measurement of A yields
vA as its outcome. This idea should sound dubious to you because we have discussed
already that observables are really equivalence classes of experiments, not all of which
yield the same value. Moreover, we know that in Bohmian mechanics, a true value
is associated with position but not with every observable, in particular not with spin
observables. Hence, in this sense of “hidden variables,” Bohmian mechanics is really a
no-hidden-variables theory.
    But this is not the central reason why the NHVTs do not exclude Bohmian mechan-
ics. Suppose we choose, in Bohmian mechanics, one experiment from every equivalence
class. (The experiment could be specified by specifying the wave function and configu-
ration of the apparatus together with the joint Hamiltonian of object and apparatus as
well as the calibration function.) For example, for every spin observable n · σ we could
say we will measure it by a Stern-Gerlach experiment in the direction n and subsequent
detection of the object particle. Then the outcome Zn of the experiment is a function
of the object wave function ψ and the object configuration Q, so we have associated
with every observable n · σ a “true value” which comes out if we choose to carry out the
experiment associated with n · σ. And it is this situation that NHVTs claim to exclude!
So we are back at an apparent conflict between Bohmian mechanics and NHVTs.
    It may occur to you that even a much simpler example than Bohmian mechanics will
prove the possibility of hidden-variable theories. Suppose we choose, as a trivial model,
for every self-adjoint operator A a random value vA independently of all other vA0 with
the Born distribution,
    P(v_A = α) = ‖P_α ψ‖² .                (23.1)
Then we have not provided a serious physical theory in the way Bohmian mechanics
does, but we have provided a manifestly consistent possibility for which values the
variables vA could have, one that agrees with the probabilities observed in experiment. Therefore,
all NHVTs must make some further assumptions about the hidden variables vA that are
violated in the trivial model as well as in Bohmian mechanics. We now take a look at
several NHVTs and their assumptions.
more importantly for us now, they may change whenever ψ collapses. That is, when
a quantum measurement of A is carried out, we should expect the vA0 (A0 6= A) to
change. However, there is an exception if we believe in locality. Then we should expect
that Alice’s measurement of α · σ a (on her particle a) will not alter the value of any
spin observable β · σ b acting on Bob’s particle. But Bell’s analysis shows that this is
impossible. To sum up:
Theorem 23.1. (Bell’s NHVT, 1964) Consider a joint distribution of random variables
vA , where A runs through the collection of observables
    A ∪ B = {α · σ^a : α ∈ S(R³)} ∪ {β · σ^b : β ∈ S(R³)} .                (23.2)
Suppose that a quantum measurement of A ∈ A yields vA and does not alter the value
of vB for any B ∈ B, and that a subsequent quantum measurement of B ∈ B yields
vB . Then the joint distribution of the outcomes satisfies Bell’s inequality (16.32). In
particular, it disagrees with the distribution predicted by quantum mechanics.
       In short, local hidden variables are impossible.
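For comparison with the quantum prediction, the following sketch evaluates the singlet correlations E(a, b) = ⟨(a·σ)⊗(b·σ)⟩ and checks one standard form of Bell's inequality, |E(a,b) − E(a,c)| ≤ 1 + E(b,c); the specific directions, and the use of this particular form rather than (16.32) verbatim, are illustrative choices.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)   # (|ud> - |du>)/sqrt(2)

def corr(a, b):
    """Quantum correlation <singlet| (a.sigma) (x) (b.sigma) |singlet> = -a.b"""
    A = a[0] * sx + a[1] * sy + a[2] * sz
    B = b[0] * sx + b[1] * sy + b[2] * sz
    return (singlet.conj() @ np.kron(A, B) @ singlet).real

a = np.array([0.0, 0.0, 1.0])                                   # 0 degrees
b = np.array([np.sin(np.pi / 4), 0.0, np.cos(np.pi / 4)])       # 45 degrees
c = np.array([1.0, 0.0, 0.0])                                   # 90 degrees

lhs = abs(corr(a, b) - corr(a, c))
rhs = 1 + corr(b, c)
print(lhs, rhs, lhs <= rhs)    # 0.707 vs 0.293: violated by the quantum predictions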
have values that are eigenvalues of A, and its marginal distribution must be the Born
distribution. Von Neumann assumed in addition that whenever an observable C is a
linear combination of observables A and B,
    C = αA + βB ,    α, β ∈ R ,                (23.3)
then the corresponding values satisfy the same linear relation,
    v_C = α v_A + β v_B .                (23.4)
Theorem 23.3. (von Neumann’s NHVT, 1932) Suppose 2 ≤ dim H < ∞ and ψ ∈
S(H ), let A be the set of all self-adjoint operators on H , and consider a joint dis-
tribution of random variables vA for all A ∈ A . Suppose that (23.3) implies (23.4).
Then for some A the marginal distribution of vA disagrees with the Born distribution
associated with A and ψ.
  49
    J.S. Bell: On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics
38: 447–452 (1966)
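The following lines make the additivity assumption concrete in the simplest case; the choice A = σ_x, B = σ_z is just an example. Values obeying (23.4) would have to be eigenvalues of C = A + B, but no sum of eigenvalues of A and B is an eigenvalue of C.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=float)
sz = np.array([[1, 0], [0, -1]], dtype=float)

print(np.linalg.eigvalsh(sx))          # eigenvalues of A = sigma_x: [-1, 1]
print(np.linalg.eigvalsh(sz))          # eigenvalues of B = sigma_z: [-1, 1]
print(np.linalg.eigvalsh(sx + sz))     # eigenvalues of C = A + B: [-sqrt(2), sqrt(2)]

# possible values of v_A + v_B if v_A, v_B are eigenvalues of A and B:
print(sorted({va + vb for va in (-1, 1) for vb in (-1, 1)}))   # [-2, 0, 2]
# none of these is an eigenvalue of C, so v_C = v_A + v_B cannot hold for actual outcomes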
Contents

1 Course Overview . . . 2

4 Classical Mechanics . . . 14
  4.1 Definition of Newtonian Mechanics . . . 14
  4.2 Properties of Newtonian Mechanics . . . 15
  4.3 Hamiltonian Systems . . . 16

6 Bohmian Mechanics . . . 20
  6.1 Definition of Bohmian Mechanics . . . 20
  6.2 Historical Overview . . . 22
  6.3 Equivariance . . . 22
  6.4 The Double-Slit Experiment in Bohmian Mechanics . . . 24
  6.5 Delayed Choice Experiments . . . 25

9 Spin . . . 43
  9.1 Spinors and Pauli Matrices . . . 43
  9.2 The Pauli Equation . . . 44
  9.3 The Stern–Gerlach Experiment . . . 45
  9.4 Bohmian Mechanics with Spin . . . 46
  9.5 Is an Electron a Spinning Ball? . . . 47
  9.6 Many-Particle Systems . . . 47
  9.7 Representations of SO(3) . . . 48

14 Many Worlds . . . 77
  14.1 Schrödinger's Many-Worlds Theory . . . 77
  14.2 Everett's Many-Worlds Theory . . . 79
  14.3 Bell's First Many-Worlds Theory . . . 80
  14.4 Bell's Second Many-Worlds Theory . . . 80
  14.5 Probabilities in Many-World Theories . . . 80

16 Nonlocality . . . 86
  16.1 Bell's Experiment . . . 87
  16.2 Bell's 1964 Proof of Nonlocality . . . 90
  16.3 Bell's 1976 Proof of Nonlocality . . . 91
  16.4 Photons . . . 93