A RANDOM WALK IN REPRESENTATIONS

Shanshan Ding

A DISSERTATION
in
Mathematics

2014
Dissertation Committee:
Jonathan Block, Professor of Mathematics
Robin Pemantle, Merriam Term Professor of Mathematics
Martha Yip, Lecturer of Mathematics
A RANDOM WALK IN REPRESENTATIONS
© COPYRIGHT
2014
Shanshan Ding
Acknowledgments
First I must thank my advisor Robin Pemantle for, among many other things, the immensely satisfying topic of this dissertation and for keeping me on track while supporting whatever I wished to pursue. I am also indebted to Martha Yip for her indispensable technical assistance and general awesomeness; she always has answers. I am likewise grateful for Jim Haglund’s advice and encouragement in the later stages of this work.
The graduate chairs during my time at Penn have all been very sympathetic.
Tony Pantev and David Harbater helped me navigate the beginning and the end of
grad school, while Jonathan Block refused to let me give up on myself through all of it.

I would like to thank Persi Diaconis for pioneering a beautiful field of mathematics at the intersection of probability and representation theory. Just as importantly,
I thank Ehrhard Behrends, whom I have never met, but without whose wonderfully
clear introduction to this field I may still be searching for a thesis topic. My thanks extend to earlier mentors as well, among them John Morgan and Ken Ono. In fact, I am deeply grateful for all of the outstanding
teachers and mentors, in and out of math, that I have encountered from grade school
to grad school.
At the same time, I also thank my students, who have collectively taught me
much more than I can ever teach them. Their joy in moments of breakthrough
reminded me of why I chose to study math on those occasions when it was otherwise
difficult to remember.
I also thank the department staff for keeping everything and everyone together. They bring sense and warmth into a building that I think has been proved to be topologically baffling and bleak. I wish Janet the very best in all that is upcoming.
I am proud to have been part of the Penn math department’s graduate student community, which has consistently been comprised of friendly, talented, and all-around wonderful people. The contributions of this community to the completion of this thesis are too large to fully enumerate, but I would be remiss to not at least reference a subset.

In particular, I would like to thank Ying Zhao for being my officemate during an
unforgettable year. I am also much indebted to Hilaf Hasson for his wisdom and advice; it has been fun traveling half of the world with you, and here’s to hoping that we get to the other half at some point. Paul Levande, I already miss your insight and wit, and I would be so very sad to miss watching in person your reaction to Ben Affleck’s Batman. Additionally, I am grateful for the friendship and extensive mathematical assistance over the years from Adam Topaz, David Lonoff, Elaine So, Matthew Wright, Matti Astrand, Ryan and Better Ryan (Eberhart and Manion, not necessarily in that order), Taisong Jing, Torin Greenwood, Tyler Kelly, and Ying.
I would like to thank the Philadelphia Orchestra for affording students in the area the weekly opportunity to attend world-class concerts. These concerts have been reminders on many Saturdays that, high or low, light or heavy, all notes have their place.

Finally, it’s not always sunny in Philadelphia, and I extend my sincerest gratitude to all those whose optimism and support, in good times and bad, sustained me. This work was supported in part by grant 1209117.
ABSTRACT
Shanshan Ding
Robin Pemantle
The unifying objective of this thesis is to find the mixing time of the Markov chain on $S_n$ given by applying a uniformly random $m$-cycle followed by repeated uniformly random transpositions. This is a variant of the random transposition walk, whose induced chain on partitions starts at $(1^n)$. By considering the Fourier transforms of the step measure at the irreducible representations of $S_n$, Diaconis and Shahshahani showed in [DS81] that the order of mixing for the random transposition walk is $n \ln n$. We adapt this approach to find an upper bound for the mixing time of the $n$-cycle-to-transpositions chain, and we obtain a corresponding lower bound by deriving the limiting distribution of the number of fixed points for the chain using the method of moments. In the process, we find a formula for the decomposition of tensor powers of the defining representation of $S_n$. Along the way, we also obtain an upper bound for the mixing time of the $m = n - 1$ case as well as characterize the expected number of fixed points throughout the walk.
Contents
1 Introduction 1
2 Technical Preparations 9
3 Upper Bound 28
4.3 Lower bound for the m = n case . . . . . . . . . . . . . . . . . . . . 45
5 Further Considerations 50
Bibliography 56
List of Figures
Chapter 1
Introduction
How many times must one shuffle a deck of cards before it is mixed? Naturally, the answer depends on what we mean by “shuffle” and “mix”. Broadly
speaking, if each shuffle permutes the deck by a random element drawn from the same distribution, then this chain is a random walk on $S_n$. The distribution of the deck after a number of shuffles is a measure on $S_n$, and we say that the deck is mixed when the total variation distance between this measure and the uniform measure on $S_n$ is small. Intuitively, mixing means that one can no longer infer the positions of individual cards.
Since the early 1900s and especially during the past 30 years, mathematical
analyses of card shuffling have inspired significant progress in the theory of Markov
chain mixing times, particularly in revealing its rich connections with algebraic
combinatorics. Markov himself had cited card shuffling as a leading example of his
eponymous processes, and his 1906 proof in [Mar06] for the convergence of finite-
state Markov chains implies that shuffling eventually mixes the deck. Poincaré then gave his own treatment of card shuffling in his treatise on probability.

Of course, eventual mixing has always been the implicit premise of card shuffling, so the more pertinent question is how soon. The first significant breakthrough
in this topic came in 1981, when Diaconis (a former professional magician) and
Shahshahani showed in [DS81] that the order of mixing for the random transposition
shuffle, where one repeatedly chooses two random cards and exchanges them, is $n \ln n$. The engine of their proof is Fourier analysis: the Fourier transform takes convolutions in the spatial domain to pointwise products in the frequency domain, and the “frequency domain” for a non-abelian group is given by its group representations, so we can track the distance to uniformity through the representations of $S_n$. This spectral approach extends to random walks on other groups, and it was used by Hildebrand ([Hil92]: random transvections in $SL_n(\mathbb{F}_q)$), among others.
Meanwhile, further techniques arose from studies of Markov chains that more
realistically model human card shuffling. Aldous and Diaconis [AD86] introduced
the concept of strong stationary time to prove that the order of mixing for the top-
to-random shuffle, where the top card is removed and inserted into the deck at a uniformly random position, is $n \ln n$. Subsequent work established an upper bound of $O(n^2 \ln n)$ for the overhand shuffle, where one shaves off packets of
cards from the top of the deck and stacks each packet on top of the previous one until
all cards have been transferred to the new pile. This bound was ultimately shown to
be tight by Jonasson [Jon06] using a method for establishing sharp lower bounds due
to Wilson [Wil04]. As for the riffle shuffle, the most common shuffling technique
where one divides a deck into two piles and interlaces them together, Bayer and
3
Diaconis [BD92] concluded that seven shuffles are necessary and sufficient to mix a 52-card deck and in the process related the underlying Markov chain to Solomon’s descent algebra. More broadly, the techniques used to study card shuffling have led to substantial progress in the general theory of Markov chain mixing times (see [S-C01]). For comprehensive surveys of the works produced, refer to [Dia01] and [S-C04].
Research in card shuffling and related topics is active and ongoing. Recent areas of interest include the mixing of only selected features, such as card values but not suits [CV06] or the location of the original bottom card [ADS11]. Extensive effort has also been devoted to exploring and leveraging the symbiotic connections between card shuffling and various algebraic structures ([DF09]), including Hopf algebras [DPR12]. The pervasive theme in this line of research
since Diaconis and Shahshahani’s analysis of the random transposition shuffle has
been the marrying of spectral and probabilistic phenomena and techniques, a theme
that reverberates in the modern studies of expander graphs (see [HLW06]) and beyond. Moreover, since shuffling is a model for many kinds of real-world mixing, the field has built-in ramifications for potentially multiple areas of applied math.
Two standout applications derived specifically from the work on card shuffling concern randomized algorithms and the rearrangement of DNA segments. Of course, we should not overlook the implications of card shuffling
research for card playing itself ([Tho73], [CH10]). Vegas certainly paid attention and
even invited Diaconis, the renegade magician, for a homecoming of sorts to assess
some new automated shuffling machines. For the findings of the said investigation,
see [DFH13], though we take this opportunity to put forth the disclaimer that no part of this thesis should be construed as gambling advice.

Nearly the entirety of the card shuffling literature that we just surveyed deals
with time-homogeneous Markov chains, where the same method of shuffling is re-
peated until the deck is mixed. The present thesis, on the other hand, is motivated by a time-inhomogeneous question: after first applying an $m$-cycle to a deck of $n$ cards, how many transpositions are needed to mix the deck?

A transposition changes the cycle type of a permutation by either splitting a cycle in two (if the two objects transposed are in the same cycle) or joining two cycles as one (if the two objects are in different cycles), so random transpositions also induce a Markov chain on the partitions of $n$;
the time-homogeneous random transposition walk is one such chain that starts at
the partition $(1^n)$, whereas the process we propose is one that starts at the other
extreme, $(n)$. Markov chains formed on partitions under random transpositions are closely related to coagulation-fragmentation processes, applications of which are surveyed in [Ald99]; a related chain whose eigenfunctions have been computed is studied in [DR12].

Our primary interest, however, is in the mixing of the process on $S_n$ instead of on the partitions of $n$, though we do hope that our work can shed light on the partition-valued chains as well. The chain of interest starts with an $m$-cycle and proceeds by random transpositions. Formally:
Definition. For $2 \le m \le n$, define the $m$-cycle-to-transpositions chain $\{X_k\}$ on the symmetric group $S_n$ as follows: let $X_0$ be the identity,¹ set $X_1 = X_0 \sigma$ for a uniformly random $m$-cycle $\sigma$, and for $k \ge 2$ set $X_k = X_{k-1}\tau_k$, where the $\tau_k$ are i.i.d. uniformly random transpositions. Then $X_k \in A_n$ if $m$ and $k$ are of the same parity. Otherwise, $X_k \in S_n \backslash A_n$. Let $\mu_k$ be the law of $X_k$, and let $U_k$ be the uniform measure on the coset of $A_n$ that carries $\mu_k$.

¹Markov chains on finite groups are translation-invariant, so setting $X_0$ to some other element of $S_n$ may affect parity, but not mixing time.
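For a toy deck the chain is small enough to track exactly. The following sketch (not from the thesis; all helper names are ours) computes the exact law of the $m$-cycle-to-transpositions chain on $S_5$ and its total variation distance to the uniform measure on the coset of $A_5$ carrying the walk:

```python
from itertools import permutations, combinations
from math import factorial

# Exact law of the m-cycle-to-transpositions chain on S_n for a toy deck.
n, m = 5, 5  # the m = n case: one random n-cycle, then random transpositions

def compose(p, q):
    # (p * q)(i) = p(q(i)); permutations are tuples mapping i -> p[i]
    return tuple(p[q[i]] for i in range(n))

def from_cycle(cyc):
    # permutation sending cyc[i] to cyc[i+1] cyclically, fixing everything else
    p = list(range(n))
    for a, b in zip(cyc, cyc[1:] + cyc[:1]):
        p[a] = b
    return tuple(p)

m_cycles = [from_cycle((s[0],) + r)
            for s in combinations(range(n), m)
            for r in permutations(s[1:])]
transpositions = [from_cycle(t) for t in combinations(range(n), 2)]

def step(mu, moves):
    # convolve the current law with the uniform measure on `moves`
    new, w = {}, 1.0 / len(moves)
    for g, pg in mu.items():
        for s in moves:
            h = compose(s, g)
            new[h] = new.get(h, 0.0) + pg * w
    return new

def tv_to_uniform_coset(mu):
    # mu is supported on a single coset of A_n, which has n!/2 elements
    u = 2.0 / factorial(n)
    absent = factorial(n) // 2 - len(mu)
    return 0.5 * (sum(abs(p - u) for p in mu.values()) + absent * u)

mu = step({tuple(range(n)): 1.0}, m_cycles)   # X_1: one random m-cycle
tv = [tv_to_uniform_coset(mu)]
for _ in range(7):                            # then random transpositions
    mu = step(mu, transpositions)
    tv.append(tv_to_uniform_coset(mu))
print([round(t, 4) for t in tv])
```

The printed sequence decreases rapidly, in line with the exponential decay established in Chapter 3.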
In Chapters 2 and 3 we follow Diaconis and Shahshahani’s approach and adapt their analysis of the random transposition shuffle to obtain upper bounds for the mixing times of the $n$-cycle-to-transpositions and $(n-1)$-cycle-to-transpositions chains. The requisite tools from probability, Fourier analysis, and representation theory are introduced and modified as necessary in Chapter 2, while the computations are carried out in Chapter 3. The lower bound was much trickier, but we ultimately obtain one for the $m = n$ case in Chapter 4 by deriving the distribution of the number of fixed points using the method of moments. Our main results combine to give the following:
Theorem 3.1.1 and Corollary 4.3.5. For any c > 0, after one n-cycle and cn
transpositions,
    \frac{e^{-2c}}{e} - o(1) \;\le\; \|\mu_{cn+1} - U_{cn+1}\|_{TV} \;\le\; \frac{e^{-2c}}{2\sqrt{1 - e^{-4c}}} + o(1)
as n goes to infinity.
Our arguably most significant contribution is that, while trying to compute the moments of the fixed point distribution, we discovered a neat (in all senses of the word) formula for the decomposition of tensor powers of the defining representation of $S_n$: the multiplicity of the Specht module $S^\lambda$ in the $r$-th tensor power is given by

    a_{\lambda,r} = f^{\bar\lambda} \sum_{i=|\bar\lambda|}^{r} \binom{i}{|\bar\lambda|} \left\{ {r \atop i} \right\},

where $\bar\lambda = (\lambda_2, \ldots, \lambda_\ell)$, $f^{\bar\lambda}$ is the number of standard Young tableaux of shape $\bar\lambda$, and $\left\{ {r \atop i} \right\}$ is the Stirling number of the second kind.
We close Chapter 4 with two further observations: one about the $m$-cycle-to-transpositions chain for arbitrary $m$, and the other that if a random walk generated by conjugacy class measures starts with one fixed point, then it will always average exactly one fixed point.

We then conclude in Chapter 5 by reflecting on what could have been and what could still be, enumerating questions that seem just out of reach and suggesting related topics for future research.
Chapter 2
Technical Preparations
As the King asked of the White Rabbit, we begin at the beginning. Specifically, we
begin with a very brief introduction to the central objects of this thesis: Markov chains.
When a Markov chain is at state x, the next position is chosen according to a
fixed probability distribution P (x, ·). If every step is chosen according to the same
transition matrix $P$, then the chain is said to be time-homogeneous, and the $k$-step transition probabilities are given by the matrix power $P^k$. A random walk on a finite group $G$ driven by a measure $\mu$ is the time-homogeneous Markov chain with transition probabilities $P(x, yx) := \mu(y)$. This means that the chain moves via successive left multiplications by random elements distributed according to $\mu$.

Definition 2.1.3. Let $T(x)$ be the set of times when it is possible for a chain starting at state $x$ to return to $x$. The period of $x$ is the gcd of $T(x)$. The chain is aperiodic if every state has period 1.

A stationary distribution is one that is preserved by the transition matrix. For a random walk on a finite group $\Omega$, the uniform measure $U_\Omega$ is stationary, as
    \sum_{y\in\Omega} U_\Omega(y)P(y,x) = \frac{1}{|\Omega|}\sum_{y\in\Omega} P(y,x) = \frac{1}{|\Omega|}\sum_{y\in\Omega} \mu(xy^{-1}) = \frac{1}{|\Omega|} = U_\Omega(x)    (2.1.2)
for all $x \in \Omega$, as the second to last equality follows from the observation that $y \mapsto xy^{-1}$ is a bijection of $\Omega$, so the $\mu(xy^{-1})$ sum to 1. Hence the uniform measure is stationary for Markov chains on finite groups. Note that as a result, the distance to stationarity does not depend on the initial state: a chain that starts at $x$ is simply a translation of a chain that starts at the identity, since the walk is translation-invariant.
Most of the theory on finite-state Markov chains is concerned with the long-
term behavior of the chains. In particular, we would like to know whether a chain
converges to a stationary distribution and, if so, how quickly. To quantify the speed
of convergence, we need a notion of distance between probability distributions:

Definition 2.1.4. The total variation distance between probability measures $\mu$ and $\nu$ on $\Omega$ is

    \|\mu - \nu\|_{TV} = \frac{1}{2}\sum_{x\in\Omega} |\mu(x) - \nu(x)| = \max_{A\subseteq\Omega} |\mu(A) - \nu(A)|.    (2.1.3)
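The two expressions in (2.1.3) are easy to compare numerically. A minimal sketch (not from the thesis; the toy measures are ours) computes both sides for a three-point space:

```python
from itertools import chain, combinations

# Total variation distance computed both ways from (2.1.3):
# half the L1 distance, and the maximum discrepancy over all events A.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.25, "b": 0.25, "c": 0.5}

half_l1 = 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

omega = list(mu)
events = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
max_disc = max(abs(sum(mu[x] for x in A) - sum(nu[x] for x in A)) for A in events)

print(half_l1, max_disc)  # the two formulas agree
```

Both computations return 0.3 here (up to floating-point noise); the maximizing event is the set where $\mu > \nu$.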
The convergence theorem states that an irreducible and aperiodic finite-state chain converges to its stationary distribution $\pi$. Furthermore, there exist constants $0 < \alpha < 1$ and $C > 0$ such that

    \max_{x\in\Omega} \|P^k(x, \cdot) - \pi\|_{TV} \le C\alpha^k.

The convergence theorem thus gives a sufficient condition for mixing and even specifies that mixing is exponentially fast. However, it gives no information on how large the constants $C$ and $\alpha$ are, so actual rates of mixing must be worked out on a case-by-case basis.
Before moving on, we should note that the Markov chain defined in Chapter 1 is periodic with period 2. Using the parity considerations in the next section, it is not difficult to see that this chain alternates between $A_n$ and $S_n \backslash A_n$, and that each of the two subsequences converges to the uniform measure on its respective coset. Note also that the initial $m$-cycle does not affect whether a chain converges as long as the chain is time-homogeneous afterwards.

We now introduce the representation-theoretic machinery for analyzing Markov chain mixing times. A detailed and accessible treatment of this material is Chapter 3 of [CST08].
Definition 2.2.1. A $d$-dimensional representation $\rho$ of a finite group $G$ is a homomorphism from $G$ to the set of $d$-by-$d$ unitary matrices, that is, $\rho(gh) = \rho(g)\rho(h)$ for all $g, h \in G$.
Example. The 1-dimensional representation of $S_n$ which is 1 on $A_n$ and $-1$ on $S_n \backslash A_n$ is the sign representation $\rho_{sign}$.
Definition 2.2.2. (1) Representations $\rho_1$ and $\rho_2$ of the same dimension $d$ are equivalent if there exists a $d$-by-$d$ unitary matrix $M$ such that $\rho_2(g) = M\rho_1(g)M^{-1}$ for all $g \in G$.
Remark. By Maschke’s theorem (see, for instance, Theorem 1.5.3 of [Sag01]), every representation of a finite group decomposes as a direct sum of irreducible representations (irreps).
Remark. To go back and forth between Definitions 2.2.1 and 2.2.3, define the group
action g · v to be (ρ(g))(v).
²If $G$ is abelian, then all representations of $G$ are 1-dimensional and $\hat G$ is a group itself, commonly referred to as the Pontryagin dual of $G$.
Let $\hat G$ denote a set consisting of one representative from each equivalence class of irreps of $G$.
The key idea of Diaconis and Shahshahani’s approach is to translate the question “how close is the law of the walk to uniform?” into “how small are the Fourier transforms of the law at the non-trivial³ irreps of $G$?”. The following helps to start making this idea precise: for $f : G \to \mathbb{C}$ with Fourier transform $\hat f(\rho) := \sum_{g\in G} f(g)\rho(g)$,

    \sum_{g\in G} |f(g)|^2 = \frac{1}{|G|} \sum_{\rho\in\hat G} d_\rho \, \mathrm{tr}[\hat f(\rho)(\hat f(\rho))^\dagger],    (2.2.1)
Identity (2.2.1) is an instance of the Peter-Weyl theorem: the functions $\sqrt{d_\rho/|G|}\,\varphi^\rho_{ij}$, where $\varphi^\rho_{ij}$ is defined by letting $\varphi^\rho_{ij}(g)$ be the $ij$-th entry of $\rho(g)$, form an orthonormal basis for the space⁴ of $L^2$ functions on $G$. The Peter-Weyl theorem applies to all compact topological groups; a proof for the case of finite groups is given in Theorem 16.11 of [Beh00].
³As we will see, the sign representation is also excluded for the $m$-cycle-to-transpositions chain due to the parity of the chain.
⁴This is a Hilbert space with the inner product $\langle f_1, f_2\rangle_G = \sum_{g\in G} f_1(g)\overline{f_2(g)}$.
Let $\mu$ and $\nu$ be measures on $G$. Rewriting Theorem 2.2.5 with $f = \mu - \nu$ gives

    \sum_{g\in G} (\mu(g) - \nu(g))^2 = \frac{1}{|G|} \sum_{\rho\in\hat G} d_\rho \, \mathrm{tr}[(\hat\mu(\rho) - \hat\nu(\rho))(\hat\mu(\rho) - \hat\nu(\rho))^\dagger],    (2.2.3)
Definition. The convolution of measures $\upsilon$ and $\eta$ on $G$ is defined by $(\upsilon * \eta)(g) = \sum_{h\in G} \upsilon(gh^{-1})\eta(h)$; we write $\upsilon^{*k}$ for the $k$-fold convolution of $\upsilon$.

Proposition 2.2.7. For any $\upsilon$ and $\eta$ on $G$, $\widehat{\upsilon * \eta} = \hat\upsilon\,\hat\eta$. Thus $\widehat{\upsilon^{*k}} = \hat\upsilon^k$.
Proposition 2.2.9. If $\rho$ is any non-trivial irrep of $G$, then $\sum_{g\in G} \rho(g) = 0$, and hence $\hat U_G(\rho) = 0$ for the uniform measure $U_G$ on $G$.

Proof.⁵ Since $\rho$ is non-trivial, there exists $g_0 \in G$ such that $\rho(g_0) \ne \mathrm{Id}_\rho$, and

    \sum_{g\in G} \rho(g) = \sum_{g\in G} \rho(g_0 g) = \rho(g_0) \sum_{g\in G} \rho(g).    (2.2.4)
⁵Despite being widely used, we have not found a coherent proof of this proposition in any text. The proof given here is adapted from the proof of Lemma 15.3 of [Beh00], which is for the special case where $G$ is abelian.
Consider $V$, the $G$-module that carries the representation $\rho$. It is straightforward to verify that $W = \{(\sum_{g\in G} \rho(g))(v) : v \in V\}$ is a $G$-submodule of $V$. Since $\rho$ is irreducible, $W$ must be either trivial or $V$ itself. If $W$ is trivial, then $\sum_{g\in G} \rho(g) = 0$, and if $W = V$, then $\sum_{g\in G} \rho(g)$ is invertible. But $\sum_{g\in G} \rho(g)$ cannot be invertible because $\rho(g_0) \ne \mathrm{Id}_\rho$ in (2.2.4), so it must be that $\sum_{g\in G} \rho(g) = 0$.
Combining (2.2.3) with Propositions 2.2.7 and 2.2.9, for a measure $\upsilon$ that is symmetric ($\upsilon(g) = \upsilon(g^{-1})$, so that $\hat\upsilon(\rho)$ is Hermitian) we obtain

    \sum_{g\in G} \left( \upsilon^{*k}(g) - U_G(g) \right)^2 = \frac{1}{|G|} \sum_{\rho\in\hat G,\ \rho\ne\rho_{triv}} d_\rho \, \mathrm{tr}[(\hat\upsilon(\rho))^{2k}],    (2.2.5)

an exact expression for the $L^2$ distance between $\upsilon^{*k}$ and $U_G$. Together with the Cauchy-Schwarz inequality

    \left( \sum_{i=1}^{j} x_i \right)^2 \le j \sum_{i=1}^{j} x_i^2,    (2.2.6)

this yields the total variation bound

    4\|\upsilon^{*k} - U_G\|_{TV}^2 \le \sum_{\rho\in\hat G,\ \rho\ne\rho_{triv}} d_\rho \, \mathrm{tr}[(\hat\upsilon(\rho))^{2k}].    (2.2.8)
Before we work on the right hand sides of (2.2.5) and (2.2.8), let us note a couple
of things. First of all, strictly speaking our Markov chain is not time-homogeneous.
This is not a big deal: let $\upsilon_m$ be the uniform measure on the $m$-cycles of $S_n$ and $\upsilon_2$ be uniform on the transpositions; then the law $\mu_{k+1}$ of $X_{k+1}$ is given by $\mu_{k+1} = \upsilon_2^{*k} * \upsilon_m$, and the machinery above applies with minor modification.
Secondly, the limiting distribution of our Markov chain is not uniform on the
whole group Sn , but rather alternates between the uniform measure on An and
the uniform measure on Sn \An . This is slightly more problematic. Diaconis and
Shahshahani, as well as most of those who followed, avoided parity by making their
chain lazy. The trade-off is a small amount of precision in the ensuing computations. We instead handle the parity head-on; the starting point is the following proposition, which we will prove at the end of the chapter:
Lemma 2.2.10. Let $\mu$ be a measure on $S_n$ supported on $A_n$ or on $S_n \backslash A_n$, and let $U$ be the uniform measure on the same coset, so that $U$ is uniform on $A_n$ or on $S_n \backslash A_n$. Then

    \sum_{g\in S_n} (\mu(g) - U(g))^2 = \frac{1}{n!} \sum_{\rho\in\hat S_n,\ \rho\ne\rho_{triv},\rho_{sign}} d_\rho \, \mathrm{tr}[\hat\mu(\rho)(\hat\mu(\rho))^\dagger].    (2.2.9)
Applying Lemma 2.2.10 to our chain yields the following.

Corollary 2.2.11. For the $m$-cycle-to-transpositions chain,

    \sum_{g\in S_n} (\mu_{k+1}(g) - U_{k+1}(g))^2 = \frac{1}{n!} \sum_{\rho\in\hat S_n,\ \rho\ne\rho_{triv},\rho_{sign}} d_\rho \, \mathrm{tr}[((\hat\upsilon_2(\rho))^k \hat\upsilon_m(\rho))^2]    (2.2.10)
and the total variation bound

    4\|\mu_{k+1} - U_{k+1}\|_{TV}^2 \le \frac{1}{2} \sum_{\rho\in\hat S_n,\ \rho\ne\rho_{triv},\rho_{sign}} d_\rho \, \mathrm{tr}[((\hat\upsilon_2(\rho))^k \hat\upsilon_m(\rho))^2].    (2.2.11)
Proof. Equation (2.2.10) is clear from Propositions 2.2.7-2.2.9 and Lemma 2.2.10.
To see (2.2.11), observe that $\mu_{k+1}(g) - U_{k+1}(g) = 0$ for half of $S_n$, so

    \left( \sum_{g\in S_n} |\mu_{k+1}(g) - U_{k+1}(g)| \right)^2 \le \frac{n!}{2} \sum_{g\in S_n} (\mu_{k+1}(g) - U_{k+1}(g))^2    (2.2.12)

by Cauchy-Schwarz.
If $G$ were abelian, the Fourier transforms in (2.2.5) and (2.2.8) would all just be scalars. Fortunately, even for a non-abelian $G$, the Fourier transform of a class measure, i.e. a measure that is constant on conjugacy classes, is a scalar multiple of the identity:
Lemma 2.2.13. Let $\upsilon$ be a class measure. For every $\rho \in \hat G$, we have that

    \hat\upsilon(\rho) = \left( \frac{1}{d_\rho} \sum_{g} \upsilon(g)\chi_\rho(g) \right) \mathrm{Id}_\rho,    (2.2.13)

where $\chi_\rho(g) := \mathrm{tr}(\rho(g))$ is the character of $\rho$ at $g$.
Remark. Since traces are similarity-invariant, $\chi_\rho(g) = \chi_\rho(h)$ whenever $g$ and $h$ are in the same conjugacy class. For elements of the symmetric group, this happens exactly when $g$ and $h$ have the same cycle type.
Example (Diaconis and Shahshahani, [DS81]). Consider the (lazy) random transposition walk driven by the measure $\upsilon$ that assigns mass $\frac{1}{n}$ to the identity and $\frac{2}{n^2}$ to each of the $\frac{n(n-1)}{2}$ transpositions. By Lemma 2.2.13,

    \hat\upsilon(\rho) = \left( \frac{1}{n} + \frac{(n-1)\chi_\rho(\tau)}{n\,d_\rho} \right) \mathrm{Id}_\rho,    (2.2.15)

where $\tau$ is any transposition. The spectral interpretation of the right hand side of (2.2.15) is that the eigenvalues of the walk are

    \frac{1}{n} + \frac{(n-1)\chi_\rho(\tau)}{n\,d_\rho}, \qquad \rho \in \hat S_n,    (2.2.16)

each occurring with multiplicity $d_\rho^2$. For more on the spectral theory of Markov chains, see [S-C04].
Returning to our chain, Lemma 2.2.13 gives $\hat\upsilon_2(\rho) = \frac{\chi_\rho(\tau)}{d_\rho}\mathrm{Id}_\rho$ and $\hat\upsilon_m(\rho) = \frac{\chi_\rho(\pi)}{d_\rho}\mathrm{Id}_\rho$, where $\pi$ is any $m$-cycle and $\tau$ is any transposition. Corollary 2.2.11 then gives

    \sum_{g\in S_n} (\mu_{k+1}(g) - U_{k+1}(g))^2 = \frac{1}{n!} \sum_{\rho\in\hat S_n,\ \rho\ne\rho_{triv},\rho_{sign}} d_\rho^2 \left( \frac{\chi_\rho(\tau)}{d_\rho} \right)^{2k} \left( \frac{\chi_\rho(\pi)}{d_\rho} \right)^2    (2.2.18)
and

    4\|\mu_{k+1} - U_{k+1}\|_{TV}^2 \le \frac{1}{2} \sum_{\rho\in\hat S_n,\ \rho\ne\rho_{triv},\rho_{sign}} d_\rho^2 \left( \frac{\chi_\rho(\tau)}{d_\rho} \right)^{2k} \left( \frac{\chi_\rho(\pi)}{d_\rho} \right)^2.    (2.2.19)
Expressions of the form $\frac{\chi_\rho}{d_\rho}$ are called normalized characters. The next step is to compute them for the irreps of $S_n$. It is a general fact (see, for instance, [Sag01]) that the number of equivalence classes of irreps is equal to the number of conjugacy classes. While this correspondence is not canonical for arbitrary groups, for $S_n$ we can index both the conjugacy classes and the irreps by the partitions of $n$, obtaining a canonical $\hat S_n$.
A partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_\ell)$ of $n$, denoted $\lambda \vdash n$, can be drawn as a Young diagram: boxes arranged in left-justified rows, such that the row lengths are weakly decreasing, with $\lambda_i$ boxes in its $i$th row.
Example. Figure 2.1 displays the Young diagrams corresponding to the partitions
of 4.
Definition. A Young tableau of shape $\lambda$ is obtained from the Young diagram of shape $\lambda$ by filling its boxes with the numbers $1, 2, \ldots, n$ bijectively. A Young tableau is standard if the entries in each row and each column are increasing.
At this point we shall briefly describe the construction of Specht modules, which
are indexed by partitions of $n$ and form a complete set of irreps of $S_n$; see Chapter 2 of [Sag01] for full details.
Definition 2.3.3. Two Young tableaux t1 and t2 of the same shape are row equiv-
alent if corresponding rows of the two tableaux contain the same elements. For a
Young tableau t, the λ-tabloid {t} is the set of all Young tableaux that are row
equivalent to t.
Definition 2.3.4. Let $\lambda \vdash n$. The vector space over $\mathbb{C}$ whose basis is the list of $\lambda$-tabloids is denoted $M^\lambda$.

Definition 2.3.5. For a Young tableau $t$, let $C_t$ denote the column stabilizer of $t$, i.e. the subgroup of $S_n$ that permutes only the elements within each column of $t$.

Definition 2.3.6. For a Young tableau $t$, define $\kappa_t = \sum_{\sigma\in C_t} \mathrm{sign}(\sigma)\sigma$. Then the polytabloid associated with $t$ is $e_t := \kappa_t\{t\}$, and the Specht module $S^\lambda$ is the subspace of $M^\lambda$ spanned by the $e_t$ for tableaux $t$ of shape $\lambda$.
Theorem 2.3.8. The Specht modules $S^\lambda$ for $\lambda \vdash n$ form a complete set of irreps of $S_n$ over $\mathbb{C}$.
We note here that $S^{(n)}$ is the trivial representation of $S_n$ and that $S^{(1^n)}$ is the sign representation. The dimension of $S^\lambda$ equals the number of standard Young tableaux of shape $\lambda$, which can be computed with the elegant hook length formula:
Definition 2.3.9. Let $(i, j)$ denote the $j$th box in the $i$th row of a Young diagram. Its hook $H_{i,j}$ is the set of all boxes directly below and directly to the right (including $(i,j)$ itself), and the hook length $h_{i,j} = |H_{i,j}|$ is the number of boxes in the hook.

Theorem (hook length formula). For $\lambda \vdash n$,

    \dim S^\lambda = \frac{n!}{\prod_{(i,j)\in\lambda} h_{i,j}}.    (2.3.3)
Example. Consider the Young diagram of shape $(4, 4, 3)$. On the left of Figure 2.2, the dotted boxes constitute the hook $H_{1,2}$. On the right, the number in each box is the length of the hook of the box, from which we see that the dimension of $S^{(4,4,3)}$ is $\frac{11!}{6 \cdot 5^2 \cdot 4^2 \cdot 3^2 \cdot 2^2 \cdot 1^2} = 462$.

Figure 2.2: $H_{1,2}$ and the array of hook lengths for $(4, 4, 3)$:

    6 5 4 2
    5 4 3 1
    3 2 1
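The hook length computation mechanizes readily. A minimal sketch (not from the thesis; function names are ours) reproduces the array above and the dimension of $S^{(4,4,3)}$:

```python
from math import factorial

# A short check of the hook length formula (2.3.3); partitions are given as
# weakly decreasing tuples of row lengths.
def hook_lengths(shape):
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    return [[(shape[i] - j - 1) + (cols[j] - i - 1) + 1
             for j in range(shape[i])]
            for i in range(len(shape))]

def dim_specht(shape):
    n = sum(shape)
    prod = 1
    for row in hook_lengths(shape):
        for h in row:
            prod *= h
    return factorial(n) // prod

print(hook_lengths((4, 4, 3)))  # the rows of hook lengths from Figure 2.2
print(dim_specht((4, 4, 3)))    # 462
```

Running the same function on $(n-1, 1)$ returns $n - 1$, matching the dimension of the standard representation quoted later in the proof of Proposition 3.1.2.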
Definition. A rim hook $\xi$ of $\lambda$ is an edgewise-connected set of boxes, containing no subset of 2-by-2 blocks, that can be removed from $\lambda$ to leave a proper Young diagram with the same top left corner as $\lambda$. The leg length of $\xi$, denoted $ll(\xi)$, is one less than the number of rows that $\xi$ occupies.

Example. The top half of Figure 2.3 shows several rim hooks of $(4, 4, 3)$ along with their leg lengths, while the bottom half gives several non-examples of rim hooks.
[Figure 2.3: rim hooks of $(4, 4, 3)$ with their leg lengths (top) and non-examples of rim hooks (bottom)]
We use λ\ξ to denote the Young diagram obtained from λ by removing the
rim hook ξ. In the top right diagram of Figure 2.3, for instance, we have that
$(4, 4, 3)\backslash\xi = (3, 2, 1)$. Also, for cycle type $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_r)$, we use the notation $\gamma\backslash\gamma_1$ for the cycle type $(\gamma_2, \ldots, \gamma_r)$. With this notation, the Murnaghan-Nakayama rule states that

    \chi^\lambda_\gamma = \sum_{\xi} (-1)^{ll(\xi)} \chi^{\lambda\backslash\xi}_{\gamma\backslash\gamma_1},    (2.3.4)

where the sum is over all rim hooks $\xi$ of $\lambda$ with $\gamma_1$ boxes.
Remark. This is a recursive formula. The first iteration is to remove from λ a rim
hook with γ1 boxes in all possible ways, the next iteration is to remove from each
remaining diagram a rim hook with $\gamma_2$ boxes in all possible ways, and so on. The recursion terminates along a branch either when some diagram has no rim hook of the required size, so that the contribution of the corresponding character is zero, or when all boxes have been removed.
Example. Figure 2.4 illustrates how to compute the character $\chi^{(4,4,3)}_{(5,4,2)}$ using the Murnaghan-Nakayama rule. The sign of the rim hook being removed ($\pm 1$ depending on the parity of its leg length) is recorded beneath each step, with 0 recorded when a diagram admits no rim hook of the required size. We multiply together the signs along each path and add the products, so that $\chi^{(4,4,3)}_{(5,4,2)} = (-1)(-1)(-1) + (+1)(0) = -1$.

[Figure 2.4: Computing $\chi^{(4,4,3)}_{(5,4,2)}$ with the Murnaghan-Nakayama rule; one path removes rim hooks with signs $-1, -1, -1$, while the other terminates after signs $+1$ and $0$.]
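The recursion can be coded directly via first-column hook lengths (beta-numbers), where removing a rim hook of size $s$ corresponds to subtracting $s$ from a beta-number. This sketch is ours, not the thesis’s, but it reproduces the example above:

```python
def chi(shape, gamma):
    # Murnaghan-Nakayama rule via beta-numbers: shape and gamma are tuples,
    # gamma being the cycle type at which the character is evaluated.
    if not gamma:
        return 1 if not shape else 0
    s, rest = gamma[0], gamma[1:]
    beta = [shape[i] + len(shape) - 1 - i for i in range(len(shape))]
    total = 0
    for b in beta:
        if b - s < 0 or (b - s) in beta:
            continue  # no rim hook of size s starts at this row
        jumped = sum(1 for c in beta if b - s < c < b)  # leg length parity
        new_beta = sorted([c for c in beta if c != b] + [b - s], reverse=True)
        new_shape = tuple(nb - (len(new_beta) - 1 - i)
                          for i, nb in enumerate(new_beta))
        new_shape = tuple(r for r in new_shape if r > 0)
        total += (-1) ** jumped * chi(new_shape, rest)
    return total

print(chi((4, 4, 3), (5, 4, 2)))  # -1, agreeing with Figure 2.4
```

As a cross-check, evaluating at $(1^n)$ recovers dimensions: `chi((4, 1), (1, 1, 1, 1, 1))` returns 4, matching the hook length formula.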
The stage is now set. We have the tools we need to compute the dimensions and normalized characters that drive the bound (2.2.19). Before putting them to use in Chapter 3, we prove Lemma 2.2.10 as promised.
Definition. The conjugate $\lambda'$ of a partition $\lambda$ is the partition corresponding to the Young diagram obtained by switching the rows and columns of the Young diagram of $\lambda$; a partition is self-conjugate if $\lambda' = \lambda$.

Example. The partitions $(4, 4, 3)$ and $(3, 3, 3, 2)$ are conjugates, and the partition $(4, 3, 3, 1)$ is self-conjugate.
Remark. Note that by the hook length formula, $\dim S^{\lambda'} = \dim S^\lambda$. Furthermore, 6.6 of [Jam78] implies that $\chi^{\lambda'}_\gamma = \pm\chi^\lambda_\gamma$, depending on the sign of $\gamma$.
Recall that the conjugacy classes of $S_n$ that split in $A_n$ are exactly those whose cycle types have all odd, distinct parts. Each such cycle type $\gamma = (\gamma_1, \ldots, \gamma_r)$ corresponds to the self-conjugate Young diagram whose diagonal boxes have hook lengths $\gamma_1, \gamma_2, \ldots, \gamma_r$.
Proposition 2.3.14. (1) If $\lambda$ is not self-conjugate, then $S^\lambda|_{A_n} = S^{\lambda'}|_{A_n}$, and this is irreducible as a representation of $A_n$.
(2) If $\lambda$ is self-conjugate, then $S^\lambda|_{A_n}$ splits into two irreducible representations $\rho_1$ and $\rho_2$ of dimension $\frac{\dim S^\lambda}{2}$. For conjugacy classes $\gamma$ of $S_n$ that do not correspond to $\lambda$ as described above (even if $\gamma$ also splits in $A_n$),

    \chi_{\rho_1}(\gamma) = \chi_{\rho_2}(\gamma) = \frac{\chi^{S^\lambda}_\gamma}{2}.    (2.3.6)

If $\gamma$ is the class corresponding to $\lambda$ and $\xi$, $\xi'$ are the classes of $A_n$ that it splits into, then $\chi_{\rho_1}(\xi) = \chi_{\rho_2}(\xi')$ and $\chi_{\rho_1}(\xi') = \chi_{\rho_2}(\xi)$, and these values are

    \frac{1}{2}\left( (-1)^q \pm \sqrt{(-1)^q\,\gamma_1\gamma_2\cdots\gamma_r} \right), \qquad \text{where } q = \frac{n-r}{2}.    (2.3.7)
Proof of Lemma 2.2.10. First, observe that $\hat\mu(S^{(1^n)})$ and $\hat U(S^{(1^n)})$ are equal to 1 if $\mu$ is uniform on $A_n$ and to $-1$ if $\mu$ is uniform on $S_n\backslash A_n$, so the $\rho_{triv}$ and $\rho_{sign}$ terms cancel in $\hat\mu - \hat U$. Now let $\lambda$ be neither $(n)$ nor $(1^n)$. If $\lambda$ is not self-conjugate, then Propositions 2.2.9 and 2.3.14 give $\sum_{g\in A_n} S^\lambda(g) = 0$ and, analogously, $\sum_{g\in S_n\backslash A_n} S^\lambda(g) = 0$. If $\lambda$ is self-conjugate, then since $\sum_{g\in A_n} \rho_1(g) = 0$ and $\sum_{g\in A_n} \rho_2(g) = 0$, we again have that $\sum_{g\in A_n} S^\lambda(g) = 0$ and, analogously to above, that $\sum_{g\in S_n\backslash A_n} S^\lambda(g) = 0$. Hence $\hat U(S^\lambda) = 0$ for all such $\lambda$, and (2.2.9) follows from (2.2.3).
Chapter 3
Upper Bound
Einmal ist keinmal, says Tomas to himself. What happens but once,
says the German adage, might as well not have happened at all.
Theorem 3.1.1. For any c > 0, after one n-cycle and cn transpositions,
    4\|\mu_{cn+1} - U_{cn+1}\|_{TV}^2 \le \frac{e^{-4c}}{1 - e^{-4c}} + o(1)    (3.1.1)
as n goes to infinity.
The first and most critical step of the proof is the observation that, discounting $(n)$ and $(1^n)$, $\chi^\lambda_{(n)} = 0$ for all $\lambda$ except the L-shaped ones, for which $\lambda_2 = 1$. This is simply because it is impossible to remove a rim hook of size $n$ from a Young diagram of size $n$ unless the Young diagram itself is the rim hook; we will discuss later what this means probabilistically.
Moreover, for an L-shaped $\lambda$, it is clear that $\chi^\lambda_{(n)}$ is equal to 1 if $\lambda$ has an odd number of rows and to $-1$ otherwise. Let $\Lambda_n$ denote the set of L-shaped partitions $(n-j, 1^j)$, $1 \le j \le n-2$, of $n$ other than $(n)$ and $(1^n)$. The normalized characters $\frac{\chi^\lambda_{(2,1^{n-2})}}{\dim S^\lambda}$ have a simple description when $\lambda \in \Lambda_n$:
Proposition 3.1.2. Let $\lambda \in \Lambda_n$, and let $j$ be one less than the number of rows of $\lambda$. For $1 \le j \le \frac{n-1}{2}$,

    \frac{\chi^{(n-j,1^j)}_{(2,1^{n-2})}}{\dim S^{(n-j,1^j)}} = \frac{n-1-2j}{n-1}.    (3.1.4)
Proof. By the hook length formula, $\dim S^{(n-j,1^j)} = \binom{n-1}{j}$. To compute $\chi^{(n-j,1^j)}_{(2,1^{n-2})}$, remove a rim hook of size two from $(n-j, 1^j)$ in the two possible ways: from the end of the first row, leaving $(n-j-2, 1^j)$ with sign $+1$, or from the bottom of the first column, leaving $(n-j, 1^{j-2})$ with sign $-1$. Let $\tilde n$ be the number of remaining boxes, i.e. $n - 2$. Observe that, for any partition $\tilde\lambda$ of $\tilde n$, the character of $\tilde\lambda$ at $(1^{\tilde n})$ is exactly the number of standard Young tableaux of shape $\tilde\lambda$, or the dimension of $\tilde\lambda$, which again can be computed with the hook length formula:

    \chi^{(n-j-2,1^j)}_{(1^{n-2})} = \frac{(n-2)!}{(n-2)\cdot j!\,(n-j-3)!} = \binom{n-3}{j}    (3.1.7)

and

    \chi^{(n-j,1^{j-2})}_{(1^{n-2})} = \frac{(n-2)!}{(n-2)\cdot (j-2)!\,(n-j-1)!} = \binom{n-3}{j-2}    (3.1.8)

for $j > 1$, so that $\chi^{(n-j,1^j)}_{(2,1^{n-2})} = \binom{n-3}{j} - \binom{n-3}{j-2}$, and dividing by $\binom{n-1}{j}$ yields (3.1.4).
For $j = 1$, $\dim S^{(n-1,1)} = n - 1$, and since there is only one way to remove a rim hook of size two from $(n-1, 1)$, we see that $\chi^{(n-1,1)}_{(2,1^{n-2})} = \chi^{(n-3,1)}_{(1^{n-2})} = n - 3$, which again agrees with (3.1.4).
Remark. When $\tilde\lambda$ is an L-shaped partition of $\tilde n$, we can actually skip the hook length formula and derive $\chi^{\tilde\lambda}_{(1^{\tilde n})}$ with the following simple combinatorial argument: let $\tilde j$ be one less than the number of boxes in the first column of $\tilde\lambda$. Removing one box at a time according to the Murnaghan-Nakayama rule, $\tilde j$ boxes in the first column are removed before we are left with a single row of boxes, at which point there is only one way to remove the remaining boxes. The number of ways to get to that point is the number of ways to pace the removal of the $\tilde j$ boxes throughout the removal of an overall $\tilde n - 1$ boxes (the upper left box must be removed last), that is, $\binom{\tilde n - 1}{\tilde j}$.
Thus Proposition 3.1.2 and the fact that $\chi^{\lambda'}_\gamma = \pm\chi^\lambda_\gamma$ imply that, for large $n$,

    \sum_{\lambda\in\Lambda_n} \left( \frac{\chi^\lambda_{(2,1^{n-2})}}{\dim S^\lambda} \right)^{2cn} \sim \begin{cases} 2\sum_{j=1}^{(n-2)/2} e^{-4cj} & n \text{ even} \\ 2\sum_{j=1}^{(n-3)/2} e^{-4cj} & n \text{ odd,} \end{cases}    (3.1.11)

and combining with (2.2.19) and summing the geometric series gives

    4\|\mu_{cn+1} - U_{cn+1}\|_{TV}^2 \le \frac{1}{2} \sum_{\lambda\in\Lambda_n} \left( \frac{\chi^\lambda_{(2,1^{n-2})}}{\dim S^\lambda} \right)^{2cn} \le \frac{e^{-4c}}{1 - e^{-4c}} + o(1),

as was to be shown.
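The final geometric-series step is a one-liner to verify numerically; the following check (ours) confirms that $\frac{1}{2}\cdot 2\sum_{j\ge 1} e^{-4cj}$ matches the closed form appearing in (3.1.1):

```python
from math import exp

# Sanity check of the geometric-series step behind Theorem 3.1.1:
# (1/2) * 2 * sum_{j >= 1} e^{-4cj} equals e^{-4c} / (1 - e^{-4c}).
for c in (0.25, 0.5, 1.0):
    partial = sum(exp(-4 * c * j) for j in range(1, 200))
    closed = exp(-4 * c) / (1 - exp(-4 * c))
    assert abs(partial - closed) < 1e-12
print("geometric series matches the closed form in (3.1.1)")
```

The same computation with the sum started at $j = 0$ shifted by the extra factor $e^{-8c}$ gives the bound of Theorem 3.2.1 below.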
Theorem 3.2.1. For any c > 0, after one (n − 1)-cycle and cn transpositions,
    4\|\mu_{cn+1} - U_{cn+1}\|_{TV}^2 \le \frac{e^{-8c}}{1 - e^{-4c}} + o(1)    (3.2.1)
as n goes to infinity.
The proof is similar to the $m = n$ case. We start with the observation that $\chi^\lambda_{(n-1,1)} = 0$ for all $\lambda$ except the ones with a 2-by-2 block of boxes in the upper left and no other boxes outside the first row and first column, i.e. the partitions of the form $(n-2-j, 2, 1^j)$; we denote the set of these by $\Lambda_{n-1}$.
Proposition 3.2.2. Let $\lambda \in \Lambda_{n-1}$, and let $j$ be two less than the number of rows of $\lambda$. For $0 \le j \le \frac{n-4}{2}$,

    \frac{\chi^{(n-2-j,2,1^j)}_{(2,1^{n-2})}}{\dim S^{(n-2-j,2,1^j)}} = \frac{n-4-2j}{n}.    (3.2.4)
Proof. By the hook length formula,

    \dim S^{(n-2-j,2,1^j)} = \frac{n!}{(n-1)(2+j)(n-2-j)\cdot j!\,(n-4-j)!}.    (3.2.5)
For $j = 0$, e.g. the leftmost diagram in Figure 3.1, there are two ways to remove a rim hook of size two: from the first row, or from the second. The latter leaves a single row and therefore contributes $+1$ to the value of $\chi^{(n-2,2)}_{(2,1^{n-2})}$, whereas the former contributes $\chi^{(n-4,2)}_{(1^{n-2})} = \frac{(n-2)(n-5)}{2}$. Thus

    \frac{\chi^{(n-2,2)}_{(2,1^{n-2})}}{\dim S^{(n-2,2)}} = \frac{(n-2)(n-5)+2}{2} \cdot \frac{2(n-1)(n-2)\cdot(n-4)!}{n!} = \frac{n^2-7n+12}{n(n-3)} = \frac{(n-4)(n-3)}{n(n-3)} = \frac{n-4}{n}.    (3.2.7)
For $j = 1$, e.g. the middle diagram in Figure 3.1, there is only one way to remove a rim hook of size two, namely from the first row, so that $\chi^{(n-3,2,1)}_{(2,1^{n-2})} = \chi^{(n-5,2,1)}_{(1^{n-2})} = \dim S^{(n-5,2,1)}$, and a computation analogous to the $j = 0$ case confirms (3.2.4).
For $j > 1$, there are two ways to remove a rim hook of size two: from the first row, or from the first column. This implies that

    \chi^{(n-2-j,2,1^j)}_{(2,1^{n-2})} = \chi^{(n-4-j,2,1^j)}_{(1^{n-2})} - \chi^{(n-2-j,2,1^{j-2})}_{(1^{n-2})},

with

    \chi^{(n-4-j,2,1^j)}_{(1^{n-2})} = \frac{(n-2)!}{(n-3)(2+j)(n-4-j)\cdot j!\,(n-6-j)!}    (3.2.11)

and

    \chi^{(n-2-j,2,1^{j-2})}_{(1^{n-2})} = \frac{(n-2)!}{j(n-3)(n-2-j)\cdot(j-2)!\,(n-4-j)!}.    (3.2.12)

Taking the difference over a common denominator, the factor $(n-3)$ cancels, and

    \frac{\chi^{(n-2-j,2,1^j)}_{(2,1^{n-2})}}{\dim S^{(n-2-j,2,1^j)}} = \frac{(n-2)!\,(n-4-2j)}{(2+j)(n-2-j)\cdot j!\,(n-4-j)!} \cdot \frac{(n-1)(2+j)(n-2-j)\cdot j!\,(n-4-j)!}{n!} = \frac{n-4-2j}{n},    (3.2.14)

as promised.
and thus for large $n$,

    \sum_{\lambda\in\Lambda_{n-1}} \left( \frac{\chi^\lambda_{(2,1^{n-2})}}{\dim S^\lambda} \right)^{2cn} \sim \begin{cases} 2\sum_{j=0}^{(n-6)/2} e^{-2c(4+2j)} & n \text{ even} \\ 2\sum_{j=0}^{(n-5)/2} e^{-2c(4+2j)} & n \text{ odd.} \end{cases}    (3.2.16)

This gives

    4\|\mu_{cn+1} - U_{cn+1}\|_{TV}^2 \le \frac{1}{2} \sum_{\lambda\in\Lambda_{n-1}} \left( \frac{\chi^\lambda_{(2,1^{n-2})}}{\dim S^\lambda} \right)^{2cn} \sim \frac{e^{-8c}}{1 - e^{-4c}}    (3.2.17)

as an upper bound.
We pause here for a few remarks. First, it is worth pointing out just how good Theorems 3.1.1 and 3.2.1 are, in the sense that the only source of inequality comes from the single application of Cauchy-Schwarz in (2.2.12).
Secondly, the proofs of Propositions 3.1.2 and 3.2.2, while messy, are satisfying
in that only the hook length formula and the Murnaghan-Nakayama rule are used.
On the other hand, the results turned out to be essentially special cases of the classical identity

    \frac{\chi^\lambda_{(2,1^{n-2})}}{\dim S^\lambda} = \frac{\sum_i \left( \lambda_i^2 - (2i-1)\lambda_i \right)}{n(n-1)},    (3.2.18)

which goes back to Frobenius.
Thirdly, representation theory confirms what seems intuitive, that moving a lot of cards in the beginning leads to the cards being mixed sooner. In particular, the initial $m$-cycle speeds up mixing by zeroing out the contributions of most representations and lessening the contributions of the rest. However, we have also uncovered something counterintuitive, that moving $n - 1$ cards in the beginning seems to lead to even faster mixing than moving all $n$ cards! We will verify this and propose an explanation for it in the next chapter.
Chapter 4
A question is like a knife that slices through the stage backdrop and
gives us a look at what lies hidden behind it.
For measures $\mu$ and $\nu$ on a set $G$, one of the classic approaches to finding a lower bound on $\|\mu - \nu\|_{TV}$ is to exhibit a statistic on which the two measures differ substantially. For us a natural candidate is the number of fixed points, since it is well-known (e.g. to Montmort three centuries ago in [Mon08]) that the distribution of the number of fixed points with respect to the uniform measure on $S_n$ is asymptotically Poisson with mean 1, denoted $\mathcal{P}(1)$.
Slightly less well-known⁶, though unsurprising, is that the distribution of fixed points with respect to the uniform measure on either $A_n$ or $S_n\backslash A_n$ is also asymptotically $\mathcal{P}(1)$. We will give an original proof for all of the Poisson limit laws mentioned here in Section 4.3. For a brute-force combinatorial proof of the weaker result that the mass, with respect to the uniform measure on $S_n$, $A_n$, as well as $S_n\backslash A_n$, of fixed-point-free permutations approaches $\frac{1}{e}$ as $n$ approaches infinity, consult [AU08].
For the random transposition walk, let $A$ be the set of permutations with one or more fixed points; finding $\mu_k(A)$ then boils down to a coupon collector’s problem. Let $B$ be the event that, after $k$ transpositions, at least one card is untouched. It is not difficult to see that $\mu_k(A) \ge \mathbb{P}(B)$, where $\mathbb{P}(B)$ is equal to the probability that at least one of $n$ coupons is still missing after $2k$ trials. The coupon collector’s problem is well-studied (see, for instance, Section IV.2 of [Fel68]), so this immediately gives a lower bound for $\mu_k(A)$, which in turn⁷ gives a lower bound on the distance to uniformity.
The above argument is so short and simple that it was tagged on to the end of [DS81] almost as an afterthought. Unfortunately, it is inapplicable to our problem, since the initial (large) $m$-cycle obliterates the core of the argument. To delve more deeply into the behavior of fixed points, we again turn to representation theory.
⁶We have not actually found this documented anywhere but presume that it is known.
⁷As $\mu_k(A) - U(A) \ge 0$, the inequality is in the desired direction.
4.2 The defining representation

The defining representation $\%$ of $S_n$ assigns to each permutation $\sigma$ the $n$-by-$n$ permutation matrix with $\%(\sigma)_{ij} = 1$ if $\sigma(j) = i$ and 0 otherwise.

Example. For $S_3$,

    \%(e) = \begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}, \quad \%(1,2) = \begin{pmatrix} 0&1&0 \\ 1&0&0 \\ 0&0&1 \end{pmatrix}, \quad \%(1,3) = \begin{pmatrix} 0&0&1 \\ 0&1&0 \\ 1&0&0 \end{pmatrix},

    \%(2,3) = \begin{pmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{pmatrix}, \quad \%(1,2,3) = \begin{pmatrix} 0&0&1 \\ 1&0&0 \\ 0&1&0 \end{pmatrix}, \quad \%(1,3,2) = \begin{pmatrix} 0&1&0 \\ 0&0&1 \\ 1&0&0 \end{pmatrix}.    (4.2.1)
The significance of % should be apparent: the fixed points can be read off of
the matrix diagonal, so that χ% (σ) is precisely the number of fixed points of σ. We
should also point out that % is reducible and decomposes as S (n−1,1) ⊕ S (n) (see
Examples 1.4.3, 1.9.5, and 2.3.8 of [Sag01]), so that the character of S (n−1,1) at σ is
one less than the number of fixed points of $\sigma$. The representation $S^{(n-1,1)}$ is often called the standard representation.
Heuristically, the connection between S (n−1,1) and fixed points vouches for the
quality of the lower bound obtained via fixed points, since S (n−1,1) is in some sense
the representation closest to the trivial representation and usually contributes the
largest normalized character to the sum in (2.2.19). Moreover, this connection sheds
light on why the m = n − 1 case seems to converge faster: it is an atypical case
where the contribution from S (n−1,1) is zero! Informally, this is the representation-
theoretic analogue of the probabilistic intuition that, since the expected number of
fixed points is one under the uniform distribution, a chain that starts with exactly
one fixed point is closer to uniformity than a chain that starts with none.
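Both facts in this discussion, that $\chi_\%$ counts fixed points and that its uniform expectation is exactly 1, can be checked directly. A minimal sketch (ours) does so over all of $S_4$:

```python
from itertools import permutations

# The character of the defining representation at sigma is the trace of the
# permutation matrix, i.e. the number of fixed points of sigma; under the
# uniform measure the expected number of fixed points is exactly 1.
def perm_matrix(sigma):
    n = len(sigma)
    return [[1 if sigma[j] == i else 0 for j in range(n)] for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

n = 4
chars = [trace(perm_matrix(s)) for s in permutations(range(n))]
assert all(c == sum(s[i] == i for i in range(n))
           for c, s in zip(chars, permutations(range(n))))
print(sum(chars) / len(chars))  # 1.0
```

The nine permutations with character 0 are the derangements of four cards, consistent with the $1/e$ limit cited above.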
Now, heuristics aside, we would like to find the mass of fixed-point-free permutations under $\mu_{k+1}$. Since this cannot be done directly, we will in fact prove something more general: we will fully characterize the distribution of $\chi_\%$ with respect to $\mu_{k+1}$ by deriving all moments of $\chi_\%$ with respect to $\mu_{k+1}$. The pivotal observation is the following.

Proposition 4.2.2. Let $E_\mu$ denote expectation with respect to $\mu$, and let $a_{\lambda,r}$ be the multiplicity of $S^\lambda$ in the decomposition of the $r$-th tensor power of $\%$, i.e.

    \%^{\otimes r} = \bigoplus_{\lambda\vdash n} a_{\lambda,r} S^\lambda := \bigoplus_{\lambda\vdash n} (S^\lambda)^{\oplus a_{\lambda,r}}.    (4.2.2)

Then

    E_\mu((\chi_\%)^r) = \sum_{\lambda\vdash n} a_{\lambda,r}\,\mathrm{tr}(\hat\mu(S^\lambda))    (4.2.3)

for any measure $\mu$ on $S_n$.
Proof. Since the tensor product has the property that the trace of the product is equal to the product of the traces,
\begin{align*}
E_\mu((\chi_\varrho)^r) &= \sum_{\sigma \in S_n} \mu(\sigma) [\mathrm{tr}(\varrho(\sigma))]^r = \sum_{\sigma \in S_n} \mu(\sigma)\, \mathrm{tr}(\varrho^{\otimes r}(\sigma)) \\
&= \sum_{\sigma \in S_n} \mu(\sigma)\, \mathrm{tr}\!\left( \bigoplus_{\lambda \vdash n} a_{\lambda,r} S^{\lambda}(\sigma) \right) = \sum_{\sigma \in S_n} \mu(\sigma) \sum_{\lambda \vdash n} a_{\lambda,r}\, \mathrm{tr}(S^{\lambda}(\sigma)) \tag{4.2.4} \\
&= \sum_{\lambda \vdash n} a_{\lambda,r} \sum_{\sigma \in S_n} \mu(\sigma)\, \mathrm{tr}(S^{\lambda}(\sigma)) = \sum_{\lambda \vdash n} a_{\lambda,r}\, \mathrm{tr}(\hat\mu(S^{\lambda})).
\end{align*}
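The multiplicativity of traces under tensor powers, used in the first line of (4.2.4), can be spot-checked with explicit Kronecker products; the sketch below (illustrative only) uses numpy's `kron` in the role of $\otimes$.

```python
import numpy as np
from itertools import permutations

def perm_matrix(p):
    n = len(p)
    M = np.zeros((n, n), dtype=int)
    for j, i in enumerate(p):
        M[i, j] = 1  # column j has its 1 in row p[j]
    return M

def tensor_power(M, r):
    """r-fold Kronecker product M (x) M (x) ... (x) M."""
    T = np.array([[1]])
    for _ in range(r):
        T = np.kron(T, M)
    return T

# tr(rho^{(x) r}(sigma)) = [tr(rho(sigma))]^r for every sigma in S_4
for p in permutations(range(4)):
    M = perm_matrix(p)
    for r in range(1, 4):
        assert np.trace(tensor_power(M, r)) == np.trace(M) ** r
```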
Remark. The first line of (4.2.4) is clearly true for any representation $\rho$ of $S_n$, and the quantities $\mathrm{tr}(\hat\mu(S^{\lambda}))$ are exactly the ones we have already computed for all $\lambda$ in the m = n and m = n − 1 cases while working on the upper bound! Thus in light of Proposition 4.2.2, if we find $a_{\lambda,r}$ for all $\lambda$ and $r$, then we would know all moments of $\chi_\varrho$ with respect to $\mu_{k+1}$.
Theorem 4.2.3. For $\lambda \vdash n$ with $n \ge r + \lambda_2$,
\[
a_{\lambda,r} = f^{\bar\lambda} \sum_{i=|\bar\lambda|}^{r} \binom{i}{|\bar\lambda|} \left\{ {r \atop i} \right\}, \tag{4.2.6}
\]
where $\bar\lambda := (\lambda_2, \dots, \lambda_l)$ denotes $\lambda$ with its first part removed and $\left\{ {r \atop i} \right\}$ is the Stirling number of the second kind, i.e. the number of ways to partition $r$ objects into $i$ non-empty subsets.
Remark. Since $f^{\bar\lambda} = \dim S^{(\lambda_2,\dots,\lambda_l)}$ can be computed with the hook length formula, (4.2.6) can be made fully explicit:
\[
a_{\lambda,r} = \frac{|\bar\lambda|!}{\prod_{(x,y) \in \bar\lambda} h_{x,y}} \sum_{i=|\bar\lambda|}^{r} \binom{i}{|\bar\lambda|} \frac{1}{i!} \sum_{j=1}^{i} (-1)^{i-j} \binom{i}{j} j^{r}. \tag{4.2.8}
\]
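As a concrete check of (4.2.6), take $\lambda = (n-1,1)$, so that $\bar\lambda = (1)$, $|\bar\lambda| = 1$, and $f^{\bar\lambda} = 1$; the multiplicity predicted by the formula can be compared with a brute-force character inner product. This is a sketch under the theorem's condition $n \ge r + \lambda_2$, illustrative only.

```python
from itertools import permutations
from math import comb, factorial

def stirling2(r, i):
    """Stirling number of the second kind {r brace i}."""
    if r == i:
        return 1
    if i <= 0 or i > r:
        return 0
    return stirling2(r - 1, i - 1) + i * stirling2(r - 1, i)

def a_std(r):
    # Theorem 4.2.3 for lambda = (n-1,1): f^{(1)} = 1, |lambda-bar| = 1
    return sum(comb(i, 1) * stirling2(r, i) for i in range(1, r + 1))

def a_std_direct(n, r):
    """<(chi_rho)^r, chi^{(n-1,1)}> computed by enumeration,
    using chi^{(n-1,1)}(sigma) = fp(sigma) - 1."""
    total = 0
    for p in permutations(range(n)):
        fp = sum(1 for i, pi in enumerate(p) if pi == i)
        total += fp ** r * (fp - 1)
    return total // factorial(n)

# valid whenever n >= r + 2, the theorem's n >= r + lambda_2 condition
assert all(a_std(r) == a_std_direct(6, r) for r in range(1, 5))
```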
Proof. Theorem 4.2.3 owes its existence to the recent work of Goupil and Chauve, who derived the generating function
\[
\sum_{r \ge |\bar\lambda|} a_{\lambda,r} \frac{x^r}{r!} = \frac{f^{\bar\lambda}}{|\bar\lambda|!}\, e^{e^x - 1} (e^x - 1)^{|\bar\lambda|} \tag{4.2.9}
\]
for $\lambda \vdash n$ and $n \ge r + \lambda_2$.
Recall that
\[
\sum_{s \ge j} \left\{ {s \atop j} \right\} \frac{x^s}{s!} = \frac{(e^x - 1)^j}{j!} \tag{4.2.10}
\]
and
\[
\sum_{t \ge 0} B_t \frac{x^t}{t!} = e^{e^x - 1}, \tag{4.2.11}
\]
where $B_0 := 1$ and $B_t = \sum_{q=1}^{t} \left\{ {t \atop q} \right\}$ is the $t$-th Bell number, so we obtain from (4.2.9)
that
\[
\frac{a_{\lambda,r}}{r!} = f^{\bar\lambda} \sum_{s+t=r} \frac{B_t}{s!\,t!} \left\{ {s \atop |\bar\lambda|} \right\}, \tag{4.2.12}
\]
and thus
\begin{align*}
\frac{a_{\lambda,r}}{f^{\bar\lambda}} &= \sum_{t=0}^{r-|\bar\lambda|} \binom{r}{t} B_t \left\{ {r-t \atop |\bar\lambda|} \right\} \\
&= \left\{ {r \atop |\bar\lambda|} \right\} + \sum_{t=1}^{r-|\bar\lambda|} \sum_{q=1}^{t} \binom{r}{t} \left\{ {t \atop q} \right\} \left\{ {r-t \atop |\bar\lambda|} \right\} \tag{4.2.13} \\
&= \left\{ {r \atop |\bar\lambda|} \right\} + \sum_{q=1}^{r-|\bar\lambda|} \sum_{t=q}^{r-|\bar\lambda|} \binom{r}{t} \left\{ {t \atop q} \right\} \left\{ {r-t \atop |\bar\lambda|} \right\}.
\end{align*}
The inner sum can be evaluated with the convolution identity
\[
\sum_{t=q}^{r-|\bar\lambda|} \binom{r}{t} \left\{ {t \atop q} \right\} \left\{ {r-t \atop |\bar\lambda|} \right\} = \binom{q+|\bar\lambda|}{|\bar\lambda|} \left\{ {r \atop q+|\bar\lambda|} \right\}, \tag{4.2.14}
\]
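(4.2.14) is an instance of a standard convolution identity for Stirling numbers (partition $r$ objects into $q + |\bar\lambda|$ blocks, then mark $q$ of the blocks); it is easy to verify exhaustively for small parameters, as in this illustrative snippet.

```python
from math import comb

def stirling2(r, i):
    if r == i:
        return 1
    if i <= 0 or i > r:
        return 0
    return stirling2(r - 1, i - 1) + i * stirling2(r - 1, i)

def lhs(r, q, m):
    # sum_t C(r,t) {t brace q} {r-t brace m}
    return sum(comb(r, t) * stirling2(t, q) * stirling2(r - t, m)
               for t in range(r + 1))

def rhs(r, q, m):
    # C(q+m, m) {r brace q+m}
    return comb(q + m, m) * stirling2(r, q + m)

# exhaustive check of the identity behind (4.2.14)
assert all(lhs(r, q, m) == rhs(r, q, m)
           for r in range(9) for q in range(r + 1) for m in range(r + 1))
```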
so that
\begin{align*}
\frac{a_{\lambda,r}}{f^{\bar\lambda}} &= \left\{ {r \atop |\bar\lambda|} \right\} + \sum_{q=1}^{r-|\bar\lambda|} \binom{q+|\bar\lambda|}{|\bar\lambda|} \left\{ {r \atop q+|\bar\lambda|} \right\} \tag{4.2.15} \\
&= \left\{ {r \atop |\bar\lambda|} \right\} + \sum_{i=|\bar\lambda|+1}^{r} \binom{i}{|\bar\lambda|} \left\{ {r \atop i} \right\} = \sum_{i=|\bar\lambda|}^{r} \binom{i}{|\bar\lambda|} \left\{ {r \atop i} \right\},
\end{align*}
as we rejoice.
Now let $b_{\lambda,r}$ be defined by
\[
(S^{(n-1,1)})^{\otimes r} = \bigoplus_{\lambda \vdash n} b_{\lambda,r}\, S^{\lambda} := \bigoplus_{\lambda \vdash n} (S^{\lambda})^{\oplus b_{\lambda,r}}. \tag{4.2.16}
\]
Goupil and Chauve also derived the generating function
\[
\sum_{r \ge |\bar\lambda|} b_{\lambda,r} \frac{x^r}{r!} = \frac{f^{\bar\lambda}}{|\bar\lambda|!}\, e^{e^x - x - 1} (e^x - 1)^{|\bar\lambda|}, \tag{4.2.17}
\]
so from Theorem 4.2.3 we can obtain a decent formula for the irrep decomposition of tensor powers of $S^{(n-1,1)}$ as well (Corollary 4.2.4). Since $e^{e^x - x - 1} = e^{-x} e^{e^x - 1}$, the coefficients of (4.2.17) are obtained by convolving those of (4.2.9) with those of $e^{-x}$, so that
\[
\frac{b_{\lambda,r}}{r!} = \sum_{s+t=r} \frac{(-1)^t a_{\lambda,s}}{s!\,t!} = \sum_{s=|\bar\lambda|}^{r} \frac{(-1)^{r-s} f^{\bar\lambda}}{s!\,(r-s)!} \sum_{i=|\bar\lambda|}^{s} \binom{i}{|\bar\lambda|} \left\{ {s \atop i} \right\}. \tag{4.2.20}
\]
Remark. Corollary 4.2.4 is very similar to Proposition 2 of [GC06], but our result is
a bit cleaner, as it does not involve associated Stirling numbers of the second kind.
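To see (4.2.20) in action, take $\lambda = (n)$: then $b_{(n),r} = \sum_s \binom{r}{s} (-1)^{r-s} B_s$, and this must equal the multiplicity of the trivial representation in $(S^{(n-1,1)})^{\otimes r}$, i.e. the average of $(\mathrm{fp}(\sigma) - 1)^r$ over $S_n$. A brute-force comparison for small parameters (illustrative only):

```python
from itertools import permutations
from math import comb, factorial

def bell(t):
    """Bell numbers via the Bell triangle."""
    if t == 0:
        return 1
    row = [1]
    for _ in range(t - 1):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]

def b_trivial(r):
    # (4.2.20) with lambda = (n): b_{(n),r} = sum_s C(r,s) (-1)^{r-s} B_s
    return sum(comb(r, s) * (-1) ** (r - s) * bell(s) for s in range(r + 1))

def b_trivial_direct(n, r):
    """Multiplicity of the trivial rep in (S^{(n-1,1)})^{tensor r}:
    the average of (fp - 1)^r over S_n."""
    total = sum((sum(1 for i, pi in enumerate(p) if pi == i) - 1) ** r
                for p in permutations(range(n)))
    return total // factorial(n)

assert all(b_trivial(r) == b_trivial_direct(6, r) for r in range(1, 5))
```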
4.3 Lower bound for the m = n case
Not all probability distributions are uniquely determined by their moments; fortunately, there is a classical sufficient condition for uniqueness.
Theorem 4.3.1. Let $m_r$ denote the $r$-th moment of the distribution of a random variable $Y$. If the moment-generating function
\[
E(e^{tY}) = \sum_{r \ge 0} m_r \frac{t^r}{r!}
\]
has a positive radius of convergence, then there is no other distribution with the same moments.
The moment-generating function of a Poisson distribution of mean $\nu$ is
\[
E(e^{tY}) = e^{\nu(e^t - 1)}, \tag{4.3.1}
\]
which satisfies the uniqueness condition, so Poisson distributions are indeed determined by their moments. The second ingredient is the method of moments.

Theorem 4.3.2. Suppose that the distribution of $Y$ is determined by its moments, that the $Y_i$ have moments of all orders, and that $E(Y_i^r)$ tends to $E(Y^r)$ for all $r$. Then $Y_i$ converges in distribution to $Y$.
Proof. See Theorem 30.2 of [Bil95].
We are now ready to prove several Poisson limit laws. First, as promised, we treat the uniform measures.

Theorem 4.3.3. (1) Let $U_{S_n}$ denote the uniform measure on $S_n$. As $n$ approaches infinity, the distribution of the number of fixed points under $U_{S_n}$ converges to P(1). (2) The same conclusion holds for the uniform measures on $A_n$ and on $S_n \setminus A_n$.

Proof. For (1), recall Proposition 2.2.9, which implies that $\widehat{U}_{S_n}(S^{\lambda})$ is 1 if $\lambda = (n)$ and 0 otherwise. Thus the combination of Proposition 4.2.2 and Theorem 4.2.3 yields
\[
E_{U_{S_n}}((\chi_\varrho)^r) = a_{(n),r} = \sum_{i=0}^{r} \left\{ {r \atop i} \right\} = B_r, \tag{4.3.2}
\]
which by (4.2.11) and (4.3.1) is exactly the $r$-th moment of P(1). This means that the first $n$ moments of $\chi_\varrho$ with respect to $U_{S_n}$ match those of P(1), and therefore the distribution converges to P(1) by Theorem 4.3.2.

For (2), recall from the proof of 2.2.10 that $\widehat{U}_{A_n}(S^{\lambda})$ is 1 if $\lambda$ is $(n)$ or $(1^n)$ and 0 otherwise. Moreover, $\widehat{U}_{S_n \setminus A_n}(S^{\lambda})$ is 1 if $\lambda = (n)$, $-1$ if $\lambda = (1^n)$, and 0 otherwise. Hence
\[
E_{U_{A_n}}((\chi_\varrho)^r) = a_{(n),r} + a_{(1^n),r} \quad \text{and} \quad E_{U_{S_n \setminus A_n}}((\chi_\varrho)^r) = a_{(n),r} - a_{(1^n),r}.
\]
As before, $a_{(n),r} = B_r$ for $1 \le r \le n$. Meanwhile, for $1 \le r \le n-1$,
\[
a_{(1^n),r} = \sum_{i=n-1}^{r} \binom{i}{n-1} \left\{ {r \atop i} \right\}, \tag{4.3.4}
\]
which is 0 for $r \le n-2$ and 1 for $r = n-1$.
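The vanishing of $a_{(1^n),r}$ below the $(n-1)$-st moment can be seen directly: $a_{(1^n),r}$ is the inner product of $(\chi_\varrho)^r$ with the sign character, and a short enumeration (illustration for $n = 5$) confirms that it is 0 for $r \le n-2$ and 1 for $r = n-1$.

```python
from itertools import permutations
from math import factorial

def sign(p):
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
              if p[i] > p[j])
    return -1 if inv % 2 else 1

def a_sign_direct(n, r):
    """a_{(1^n),r} = <(chi_rho)^r, sign>, computed by enumeration."""
    total = 0
    for p in permutations(range(n)):
        fp = sum(1 for i, pi in enumerate(p) if pi == i)
        total += sign(p) * fp ** r
    return total // factorial(n)

n = 5
assert [a_sign_direct(n, r) for r in range(1, n)] == [0, 0, 0, 1]
```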
Returning to the Markov chain mixing rate problem, the next Poisson limit law will finally give a satisfactory lower bound for the mixing rate of the n-cycle-to-transpositions shuffle.

Theorem 4.3.4. For any $c > 0$, as $n$ goes to infinity, the distribution of the number of fixed points after one $n$-cycle and $cn$ transpositions converges to P$(1 - e^{-2c})$.
Proof. One can deduce from the moment-generating function, or just look up in [Rio37], that the $r$-th moment of P$(\nu)$ is $\sum_{i=1}^{r} \left\{ {r \atop i} \right\} \nu^i$. As it went with the proof of Theorem 4.3.3, $\hat\mu_{cn+1}(S^{(n)}) = 1$, and we will ignore the alternating representation because it suffices to consider the first $n-2$ moments. For the non-trivial and non-alternating representations, synthesize Proposition 3.1.2, (3.1.10) with $n$ instead of $2n$, and (4.2.5) to arrive at the asymptotic estimate (4.3.5).
By Theorem 4.2.3 (second line below) and (4.3.5) (fourth line), for $1 \le r \le n-2$,
\begin{align*}
E_{\mu_{cn+1}}((\chi_\varrho)^r) &= a_{(n),r} + \sum_{\lambda \in \Lambda_n} a_{\lambda,r}\, \hat\mu_{cn+1}(S^{\lambda}) \\
&= \sum_{i=1}^{r} \left\{ {r \atop i} \right\} + \sum_{|\bar\lambda|=1}^{n-2} \sum_{i=|\bar\lambda|}^{r} \left\{ {r \atop i} \right\} \binom{i}{|\bar\lambda|}\, \hat\mu_{cn+1}(S^{\lambda}) \\
&= \sum_{i=1}^{r} \left\{ {r \atop i} \right\} + \sum_{i=1}^{r} \left\{ {r \atop i} \right\} \sum_{|\bar\lambda|=1}^{i} \binom{i}{|\bar\lambda|}\, \hat\mu_{cn+1}(S^{\lambda}) \\
&\sim \sum_{i=1}^{r} \left\{ {r \atop i} \right\} + \sum_{i=1}^{r} \left\{ {r \atop i} \right\} \sum_{|\bar\lambda|=1}^{i} \binom{i}{|\bar\lambda|} (-e^{-2c})^{|\bar\lambda|} \tag{4.3.6} \\
&= \sum_{i=1}^{r} \left\{ {r \atop i} \right\} \left( 1 + \sum_{|\bar\lambda|=1}^{i} \binom{i}{|\bar\lambda|} (-e^{-2c})^{|\bar\lambda|} \right) \\
&= \sum_{i=1}^{r} \left\{ {r \atop i} \right\} (1 - e^{-2c})^i.
\end{align*}
This shows that the first n − 2 moments of χ% with respect to µcn+1 approach those
of P(1 − e−2c ), and once again convergence follows from Theorem 4.3.2.
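The limit law can also be watched numerically. The following Monte Carlo sketch (parameters chosen arbitrarily for a quick run) applies one uniform n-cycle and then cn lazy random transpositions, and compares the empirical mean number of fixed points with $1 - e^{-2c}$:

```python
import random
from math import exp

def fixed_points(p):
    return sum(1 for i, pi in enumerate(p) if pi == i)

def one_cycle_then_transpositions(n, k, rng):
    """A uniform n-cycle followed by k random transpositions
    (each step picks two positions uniformly, possibly equal)."""
    order = list(range(n))
    rng.shuffle(order)
    perm = [0] * n
    for idx in range(n):
        perm[order[idx]] = order[(idx + 1) % n]  # an n-cycle: no fixed points
    for _ in range(k):
        i, j = rng.randrange(n), rng.randrange(n)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def mean_fixed_points(n, c, trials, seed=0):
    rng = random.Random(seed)
    k = int(c * n)
    return sum(fixed_points(one_cycle_then_transpositions(n, k, rng))
               for _ in range(trials)) / trials

# Theorem 4.3.4 predicts a limiting mean of 1 - e^{-2c}
m = mean_fixed_points(n=60, c=1.0, trials=4000, seed=1)
assert abs(m - (1 - exp(-2.0))) < 0.1
```

With the lazy walk the one-step eigenvalue on $S^{(n-1,1)}$ is $(n-2)/n$ rather than $(n-3)/(n-1)$, but both tend to the same $e^{-2c}$ factor after $cn$ steps, so the comparison is fair asymptotically.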
Corollary 4.3.5. For any $c > 0$, after one $n$-cycle and $cn$ transpositions,
\[
\| \mu_{cn+1} - U_{cn+1} \|_{TV} \ge \frac{e^{-2c}}{e} - o(1) \tag{4.3.7}
\]
as $n$ goes to infinity.
Proof. By Theorem 4.3.4, the mass that $\mu_{cn+1}$ assigns to the fixed-point-free permutations tends to $e^{-(1-e^{-2c})}$, while the corresponding mass under the uniform measure tends to $e^{-1}$. The total variation distance is therefore at least $e^{-(1-e^{-2c})} - e^{-1} - o(1) = e^{-1}(e^{e^{-2c}} - 1) - o(1) \ge e^{-2c}/e - o(1)$, as was to be shown.
Remark. Together with Theorem 3.1.1, we have that
\[
\frac{e^{-2c}}{e} - o(1) \le \| \mu_{cn+1} - U_{cn+1} \|_{TV} \le \frac{e^{-2c}}{2\sqrt{1 - e^{-4c}}} + o(1) \tag{4.3.9}
\]
as $n$ goes to infinity. The gap is especially respectable if $e^{-4c}$ is small. Also, recall that in the m = n − 1 case we obtained
\[
\| \mu_{cn+1} - U_{cn+1} \|_{TV} \le \frac{e^{-4c}}{2\sqrt{1 - e^{-4c}}} + o(1), \tag{4.3.10}
\]
which is smaller than even the lower bound for the m = n case so long as c is
at least approximately 0.262. Hence we can reasonably say that starting with an
(n − 1)-cycle is indeed more beneficial for mixing than starting with an n-cycle.
Chapter 5
Further Considerations
In this section we use $S^{(n-1,1)}$ to derive two more results about expected numbers of fixed points. Recall that the character of $S^{(n-1,1)}$ at $\sigma$ is one less than the number of fixed points of $\sigma$.
We first establish an invariance property of Markov chains on $S_n$ whose increment distributions are class measures: if a chain starts with one fixed point, the expected number of fixed points stays at one forever after.

Proposition 5.1.1. Let $X_1$ be selected according to any class measure supported on the set of permutations with one fixed point. For $k \ge 2$, set $X_k = \tau_k X_{k-1}$, where $\tau_k$ is selected according to any class measure on $S_n$ (the measure can be different for each $k$). Then the expected number of fixed points of $X_k$ is one for every $k$.
Proof. Let $\nu_1$ be a class measure supported on the set of permutations with one fixed point, and let $\nu_2, \dots, \nu_k$ be the subsequent increment measures. Then
\[
E_{\mu_k}(\chi_{S^{(n-1,1)}}) = \mathrm{tr}[\hat\mu_k(S^{(n-1,1)})] = \mathrm{tr}[\hat\nu_1(S^{(n-1,1)})\, \hat\nu_2(S^{(n-1,1)}) \cdots \hat\nu_k(S^{(n-1,1)})], \tag{5.1.1}
\]
where
\[
\hat\nu_1(S^{(n-1,1)}) = \left( \frac{1}{n-1} \sum_{\sigma \in S_n} \nu_1(\sigma)\, \chi_{S^{(n-1,1)}}(\sigma) \right) I_{n-1} \tag{5.1.2}
\]
by Lemma 2.2.13. Consider the anatomy of the partition $(n-1,1)$: under the
Murnaghan-Nakayama rule, the only way for a single box to remain at the end is
for the box in the second row to have been removed as a singleton, which requires
a cycle type with at least two fixed points. This means that χS (n−1,1) (σ) = 0 if σ
has exactly one fixed point. On the other hand, if σ does not have exactly one
fixed point, then ν1 (σ) = 0. Thus νb1 (S (n−1,1) ) = 0, which in turn implies that
$E_{\mu_k}(\chi_{S^{(n-1,1)}}) = 0$, and hence the expected number of fixed points with respect to $\mu_k$ is one.

The next result computes the expected number of fixed points for the general case where $m$ is defined by an arbitrary function $m(n)$ of $n$.

Theorem 5.1.2. Consider the chain that starts with an $m(n)$-cycle and proceeds by random transpositions. For any $c > 0$, with $k = cn + \frac{n}{2} \ln(n - m(n) - 1)$, the expected number of fixed points of $X_k$ converges to $1 + e^{-2c}$ as $n$ goes to infinity.
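Proposition 5.1.1 can be verified exactly by enumeration for small $n$. In the illustrative sketch below, in $S_5$, the initial step is the uniform (class) measure on the 45 permutations with exactly one fixed point, followed by a uniform transposition and then a uniform 3-cycle; the average number of fixed points is exactly one after each step.

```python
from itertools import permutations

def fixed_points(p):
    return sum(1 for i, pi in enumerate(p) if pi == i)

def compose(p, q):
    """(p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

n = 5
all_perms = list(permutations(range(n)))
start = [p for p in all_perms if fixed_points(p) == 1]  # 4-cycles and (2,2,1)'s
transpositions = [p for p in all_perms if fixed_points(p) == n - 2]
three_cycles = [p for p in all_perms if fixed_points(p) == n - 3]

# after one class-measure step, the average number of fixed points is exactly 1
total1 = sum(fixed_points(compose(t, s)) for s in start for t in transpositions)
assert total1 == len(start) * len(transpositions)

# and it stays exactly 1 after a second (different) class-measure step
total2 = sum(fixed_points(compose(u, compose(t, s)))
             for s in start for t in transpositions for u in three_cycles)
assert total2 == len(start) * len(transpositions) * len(three_cycles)
```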
Proof. The $m(n) = n$ (Theorem 4.3.4) and $m(n) = n-1$ (special case of Proposition 5.1.1) cases have already been settled, so assume $m(n) \le n-2$. We have
\[
\chi^{(n-1,1)}_{(m(n),1^{n-m(n)})} = n - m(n) - 1, \tag{5.1.4}
\]
so by (4.2.5),
\[
E_{\mu_k}(\chi_{S^{(n-1,1)}}) = \chi^{(n-1,1)}_{(m(n),1^{n-m(n)})} \left( \frac{\chi^{(n-1,1)}_{(2,1^{n-2})}}{\dim S^{(n-1,1)}} \right)^{k} = (n - m(n) - 1) \left( \frac{n-3}{n-1} \right)^{k}. \tag{5.1.5}
\]
Setting $k = cn + \frac{n}{2} \ln(n - m(n) - 1)$, we have that
\[
\lim_{n \to \infty} E_{\mu_k}(\chi_{S^{(n-1,1)}}) = \lim_{n \to \infty} (n - m(n) - 1) \left( \frac{n-3}{n-1} \right)^{cn + \frac{n}{2} \ln(n - m(n) - 1)} = \lim_{n \to \infty} (n - m(n) - 1)\, e^{-2c}\, e^{-\ln(n - m(n) - 1)} = e^{-2c}, \tag{5.1.6}
\]
so the expected number of fixed points converges to $1 + e^{-2c}$.
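(5.1.5) can be checked exactly for small parameters. With $n = 5$, $m = 3$, and $k = 1$, it predicts $E[\chi_{S^{(4,1)}}] = (n - m - 1)\left(\frac{n-3}{n-1}\right)^k = 1 \cdot \frac{1}{2}$; enumeration over a uniform 3-cycle followed by a uniform transposition agrees. (Illustration only; the uniform measure on $m$-cycles is the class-measure version of "starting with an $m$-cycle".)

```python
from itertools import permutations
from fractions import Fraction

def fixed_points(p):
    return sum(1 for i, pi in enumerate(p) if pi == i)

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(p)))

n = 5
all_perms = list(permutations(range(n)))
three_cycles = [p for p in all_perms if fixed_points(p) == n - 3]  # type (3,1,1)
transpositions = [p for p in all_perms if fixed_points(p) == n - 2]

# E[chi_{S^{(n-1,1)}}] = E[fp - 1] after a uniform 3-cycle and one transposition
total = sum(fixed_points(compose(t, s)) - 1
            for s in three_cycles for t in transpositions)
avg = Fraction(total, len(three_cycles) * len(transpositions))

assert avg == Fraction(1, 2)  # matches (5.1.5): (5 - 3 - 1) * (2/4)^1
```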
5.2 Open questions
Question (1). What is the lower bound for the mixing time of the m = n − 1 case? Unlike in the m = n case, the distribution of the number of fixed points is not quite Poisson. Indeed, after an initial $(n-1)$-cycle, the expected number of fixed points is always one. On the other hand, we can compute from either Corollary 4.3.5 or, as a more fun exercise, a direct calculation, the second moment, which along with Proposition 3.2.2 implies that the variance of the number of fixed points does not match the mean asymptotically. Since the mean does not match the variance, the distribution is not Poisson. Nevertheless, it must be very close to Poisson, and one may be able to compute a few more moments and use brute force to bound the mass of fixed-point-free permutations, which in turn would yield a lower bound.
In general, when finding a lower bound for the mixing time of a Markov chain, the hard part is often identifying the limiting distribution of a suitable statistic. We got lucky with Theorems 4.3.3 and 4.3.4 in the sense that we could match the moments to those of Poisson distributions; when the moments do not match up with those of any well-known distribution, there is the additional task of extracting information about the distribution from its moments.⁸
Question (2). For general $m(n)$ (excluding $n$ and $n-1$), is $n \ln(n - m(n) - 1)$ the correct order of mixing time? To see why this is plausible, consider that $O(n \ln(n - m(n) - 1))$ is the right mixing time for $m(n) = 2$, i.e. the random transposition shuffle. Proving a general upper bound with only the techniques from this thesis is likely difficult, but the order of a lower bound may be within reach if we can compute the variance of the number of fixed points for arbitrary $m(n)$ like Theorem 5.1.2 did for the expected value. If we have both the first and the second moments, then we may be able to derive the order of a lower bound using Chebyshev's inequality or related techniques from statistics.
Question (3). For a Markov chain on $S_n$ whose increment distributions are class measures, what conditions are sufficient for its fixed point distribution to be asymptotically Poisson?

⁸The classical moment problem is oft-studied, but predominantly for determinacy conditions, and most of the work on reconstruction has been for continuous distributions. See the introduction of [MH09] for a survey of results.
A necessary condition appears to be that the initial step does not create exactly
one fixed point. Is it also sufficient? By simply playing around with (5.2.1), we may be able to identify a class of Markov chains whose fixed point distributions have asymptotically the same mean and variance, which would be a small step toward an answer.
Question (4). What is the contribution, if any, of Theorem 4.2.3 and Corollary 4.2.4 to representation theory itself?

In particular, can Theorem 4.2.3 and Corollary 4.2.4 shed any insight on the Kronecker problem of determining the multiplicities in the tensor product decomposition of two irreps? See [BI08] for background. Decompositions of higher tensor powers are related to plethysms of symmetric functions, and plethysms in turn arise in the study of Macdonald polynomials. See Appendix 2 and Exercise 7.74 following Chapter 7 of [Sta99] for an introduction to plethysms and [LR11] for their connections to the Macdonald polynomials, connections that delve into some of the deepest and most active parts of algebraic combinatorics.

We have now ventured into a field of intricately connected ideas with much potential for further exploration. Any of these topics is sure to lead us down a wondrous rabbit hole. However, to quote Dostoevsky, that might be the subject of a new story, but our present story is ended.
Bibliography

[AD86] D. J. Aldous and P. Diaconis, Shuffling cards and stopping times, Amer. Math. Monthly 93 (1986), 333-348.

[BD92] D. Bayer and P. Diaconis, Trailing the dovetail shuffle to its lair, Ann. Appl. Probab. 2 (1992), no. 2, 294-313.

[Beh00] E. Behrends, Introduction to Markov Chains (with Special Emphasis on Rapid Mixing), Vieweg, Braunschweig, 2000.

[Bil95] P. Billingsley, Probability and Measure, 3rd ed., Wiley, New York, 1995.

Chains, Cambridge Stud. Adv. Math. 108, Cambridge Univ. Press, Cambridge, 2008.

[CH10] M. A. Conger and J. A. Howald, A better way to deal the cards, Amer. Math. Monthly 117 (2010).

Lecture Notes Monogr. Ser. 11, Inst. Math. Statist., Hayward, CA, 1988.

[DF09] P. Diaconis and J. Fulman, Carries, shuffling, and symmetric functions, 725-750.

[Ful00] J. Fulman, Semisimple orbits of Lie algebras and card shuffling measures, 96-122.

[FH91] W. Fulton and J. Harris, Representation Theory: A First Course, GTM 129, Springer, New York, 1991.

Berlin/Heidelberg, 2012.

[HLW06] S. Hoory, N. Linial, and A. Wigderson, Expander graphs and their applications, Bull. Amer. Math. Soc. (N.S.) 43 (2006), 439-561.

[Jon06] J. Jonasson, The overhand shuffle mixes in Θ(n² ln n) steps, Ann. Appl. Probab. 16 (2006).

Advances in Cryptology – CRYPTO 2002 (M. Yung, ed.), LNCS 2442, 304-319, Springer, Berlin/Heidelberg, 2002.

moments, IMS Lecture Notes Monogr. Ser. 57, 252-265, Inst. Math. Statist.

[MNP12] B. Morris, W. Ning, and Y. Peres, Mixing time of the card-cyclic-to-random shuffle.

59 (1937), 739-753.

411-423.

[Pem94] R. Pemantle, A shuffle that mixes sets of any fixed size much faster than it mixes the whole deck, Rand. Struct. Alg. 5 (1994), no. 5, 609-625.

1912.

[Rio37] J. Riordan, Moment recurrence relations for binomial, Poisson and hypergeometric frequency distributions, Ann. Math. Statist. 8 (1937), 103-111.

[Sag01] B. E. Sagan, The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions, 2nd ed., GTM 203, Springer, New York, 2001.

Discrete Structures (H. Kesten, ed.), Encyclopaedia Math. Sci. 110, 263-346.

CA, 1986, Cambridge Stud. Adv. Math. 49, reprinted by Cambridge Univ. Press.

[Tho73] E. O. Thorp, Nonrandom shuffling with applications to the game of Faro, J. Amer. Statist. Assoc. 68 (1973), 842-847.

[Wil04] D. B. Wilson, Mixing times of lozenge tiling and card shuffling Markov chains, Ann. Appl. Probab. 14 (2004), no. 1, 274-325.