4 Second Moment
Question 4.1.1
For which 𝑝 = 𝑝 𝑛 does 𝐺 (𝑛, 𝑝) contain a triangle with probability 1 − 𝑜(1)?
Proposition 4.1.2
If 𝑛𝑝 → 0, then 𝐺 (𝑛, 𝑝) is triangle-free with probability 1 − 𝑜(1).
Proof. Let 𝑋 be the number of triangles in 𝐺 (𝑛, 𝑝). We know from linearity of
expectations that
\[ E X = \binom{n}{3} p^3 \asymp n^3 p^3 = o(1). \]
Thus, by Markov’s inequality,
P(𝑋 ≥ 1) ≤ E𝑋 = 𝑜(1).
In other words, when 𝑝 ≪ 1/𝑛, 𝐺(𝑛, 𝑝) is triangle-free with high probability (recall that 𝑝 ≪ 1/𝑛 means 𝑝 = 𝑜(1/𝑛); see the asymptotic notation guide at the beginning of these notes).
What about when 𝑝 ≫ 1/𝑛? Can we conclude that 𝐺 (𝑛, 𝑝) contains a triangle with
high probability? In this case E𝑋 → ∞, but this is not enough to conclude that
P(𝑋 ≥ 1) = 1 − 𝑜(1), since we have not ruled out the possibility that 𝑋 is almost always zero but extremely large with some tiny probability.
An important technique in probabilistic combinatorics is to show that some random
variable is concentrated around its mean. This would then imply that outliers are
unlikely.
We will see many methods in this course for proving concentration of random variables. In this chapter, we begin with the simplest method. It is usually the easiest to execute, and it requires few hypotheses. The downside is that it only produces relatively weak (though still useful) concentration bounds.
Second moment method: show that a random variable is concentrated near its mean
by bounding its variance.
Definition 4.1.3 (Variance and covariance)
For random variables 𝑋 and 𝑌 with means 𝜇𝑋 and 𝜇𝑌,
\[ \operatorname{Var} X := E[(X - \mu_X)^2] = E[X^2] - (E X)^2 \qquad\text{and}\qquad \operatorname{Cov}[X, Y] := E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - (E X)(E Y). \]
(Exercise: check the second equality in each of the definitions of variance and covariance above.)
Remark 4.1.4 (Notation convention). It is standard to use the Greek letter 𝜇 for the
mean, and 𝜎 2 for the variance. Here 𝜎 ≥ 0 is the standard deviation.
The following basic result provides a concentration bound based on the variance.
Theorem 4.1.5 (Chebyshev’s inequality)
Let 𝑋 be a random variable with mean 𝜇 and standard deviation 𝜎. Then for every 𝜆 > 0,
\[ P(|X - \mu| \ge \lambda\sigma) \le \frac{1}{\lambda^2}. \]
Proof. By Markov’s inequality applied to (𝑋 − 𝜇)²,
\[ P(|X - \mu| \ge \lambda\sigma) = P(|X - \mu|^2 \ge \lambda^2\sigma^2) \le \frac{E[(X-\mu)^2]}{\lambda^2\sigma^2} = \frac{1}{\lambda^2}. \qquad\square \]
Remark 4.1.6. Concentration bounds that show small probability of deviating from the mean are called tail bounds (more precisely: upper tail for P(𝑋 ≥ 𝜇 + 𝑎) and lower tail for P(𝑋 ≤ 𝜇 − 𝑎)). Chebyshev’s inequality gives tail bounds that decay quadratically. Later on we will see tools that give much better decay (usually exponential) under additional assumptions on the random variable (e.g., independence).
We are often interested in upper bounding the probability of non-existence, i.e., P(𝑋 = 0). Chebyshev’s inequality yields the following bound.
Lemma 4.1.7
For every random variable 𝑋 with E𝑋 ≠ 0,
\[ P(X = 0) \le \frac{\operatorname{Var} X}{(E X)^2}. \]
Proof. Write 𝜇 = E𝑋. If 𝑋 = 0, then |𝑋 − 𝜇| = |𝜇|, so by Chebyshev’s inequality,
\[ P(X = 0) \le P(|X - \mu| \ge |\mu|) \le \frac{\operatorname{Var} X}{\mu^2}. \qquad\square \]
Corollary 4.1.8
If E𝑋 > 0 and Var 𝑋 = 𝑜((E𝑋)²), then 𝑋 > 0 and 𝑋 ∼ E𝑋 with probability 1 − 𝑜(1).
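To see Corollary 4.1.8 in action numerically, here is a minimal simulation sketch (ours, not from the notes, assuming numpy is available; the binomial random variable is just a stand-in example) where Var 𝑋 = 𝑛/4 = 𝑜((E𝑋)²):

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Binomial(n, 1/2): E[X] = n/2 and Var[X] = n/4 = o((E[X])^2),
# so Corollary 4.1.8 predicts X > 0 and X ~ E[X] with probability 1 - o(1).
for n in (10, 100, 10_000, 1_000_000):
    X = rng.binomial(n, 0.5, size=1000)
    dev = np.mean(np.abs(X / (n / 2) - 1) > 0.05)
    print(f"n = {n:>9}: empirical P(|X/EX - 1| > 0.05) = {dev:.3f}")
```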
In many situations, it is not too hard to compute the second moment. We have
Var[𝑋] = Cov[𝑋, 𝑋]. Also, covariance is bilinear; i.e., for random variables 𝑋1, . . . and 𝑌1, . . . (no assumptions needed on their independence, etc.) and constants 𝑎1, . . . and 𝑏1, . . . , one has
\[ \operatorname{Cov}\Bigl[ \sum_i a_i X_i,\ \sum_j b_j Y_j \Bigr] = \sum_{i,j} a_i b_j \operatorname{Cov}[X_i, Y_j]. \]
We are often dealing with 𝑋 being the cardinality of some random set. We can usually
write this as a sum of indicator functions, such as 𝑋 = 𝑋1 + · · · + 𝑋𝑛 , so that
\[ \operatorname{Var} X = \operatorname{Cov}[X, X] = \sum_{i,j=1}^{n} \operatorname{Cov}[X_i, X_j] = \sum_{i=1}^{n} \operatorname{Var} X_i + 2 \sum_{i < j} \operatorname{Cov}[X_i, X_j]. \]
We have Cov[𝑋, 𝑌 ] = 0 if 𝑋 and 𝑌 are independent. Thus in the sum we only need to
consider dependent pairs (𝑖, 𝑗).
Let us now return to the problem of determining when 𝐺 (𝑛, 𝑝) contains a triangle
whp.
Theorem 4.1.11
If 𝑛𝑝 → ∞, then 𝐺 (𝑛, 𝑝) contains a triangle with probability 1 − 𝑜(1).
Proof. Label the vertices by [𝑛]. Let 𝑋𝑖 𝑗 be the indicator random variable of the edge
𝑖 𝑗, so that 𝑋𝑖 𝑗 = 1 if the edge is present, and 𝑋𝑖 𝑗 = 0 if the edge is not present in the
random graph. Let us write
𝑋𝑖 𝑗 𝑘 := 𝑋𝑖 𝑗 𝑋𝑖𝑘 𝑋 𝑗 𝑘 .
Then the number of triangles in 𝐺 (𝑛, 𝑝) is given by
\[ X = \sum_{i < j < k} X_{ij} X_{ik} X_{jk}. \]
Now we compute Var 𝑋. Note that the summands of 𝑋 are not all independent.
If 𝑇1 and 𝑇2 are each 3-vertex subsets, then
\[ \operatorname{Cov}[X_{T_1}, X_{T_2}] = E[X_{T_1} X_{T_2}] - E[X_{T_1}]\,E[X_{T_2}] = p^{e(T_1 \cup T_2)} - p^{e(T_1) + e(T_2)} = \begin{cases} 0 & \text{if } |T_1 \cap T_2| \le 1, \\ p^5 - p^6 & \text{if } |T_1 \cap T_2| = 2, \\ p^3 - p^6 & \text{if } T_1 = T_2. \end{cases} \]
The number of pairs (𝑇1, 𝑇2) of triangles sharing exactly one edge is O(n^4). Thus
\[ \operatorname{Var} X = \sum_{T_1, T_2} \operatorname{Cov}[X_{T_1}, X_{T_2}] = O(n^3)(p^3 - p^6) + O(n^4)(p^5 - p^6) \lesssim n^3 p^3 + n^4 p^5 = o(n^6 p^6) \quad \text{as } np \to \infty. \]
Since E𝑋 ≍ n³p³, this says Var 𝑋 = 𝑜((E𝑋)²), so Corollary 4.1.8 gives 𝑋 > 0 (in fact 𝑋 ∼ E𝑋) with probability 1 − 𝑜(1). □
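The threshold at 1/𝑛 is easy to observe experimentally. A small simulation sketch (ours, assuming numpy; trace(𝐴³) counts each triangle six times):

```python
import numpy as np

rng = np.random.default_rng(1)

def has_triangle(n: int, p: float) -> bool:
    """Sample G(n, p) and test for a triangle via trace(A^3) = 6 * (# triangles)."""
    A = np.triu(rng.random((n, n)) < p, k=1)
    A = (A | A.T).astype(np.int64)
    return bool(np.trace(A @ A @ A) > 0)

n, trials = 300, 200
for c in (0.1, 1.0, 10.0):  # p = c/n, on both sides of the threshold 1/n
    hits = sum(has_triangle(n, c / n) for _ in range(trials))
    print(f"p = {c}/n: empirical P(contains a triangle) = {hits / trials:.2f}")
```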
We say that 1/𝑛 is a threshold for containing a triangle, in the sense that if 𝑝 grows
asymptotically faster than this threshold, i.e., 𝑝 ≫ 1/𝑛, then the event occurs with
probability 1 − 𝑜(1), while if 𝑝 ≪ 1/𝑛, then the event occurs with probability 𝑜(1).
Note that the definition of a threshold ignores leading constant factors (so it is equally correct to say that 2/𝑛 is a threshold for containing a triangle). Determining the
thresholds of various properties in random graphs (as well as other random settings)
is a central topic in probabilistic combinatorics. We will discuss thresholds in more
depth later in this chapter.
What else might you want to know about the probability that 𝐺 (𝑛, 𝑝) contains a
triangle?
Remark 4.1.12 (Poisson limit). What if 𝑛𝑝 → 𝑐 for some constant 𝑐 > 0? It
turns out in this case that the number of triangles of 𝐺 (𝑛, 𝑝) approaches a Poisson
distribution with constant mean. You will show this in the homework. It will be
done via the method of moments: if 𝑍 is some random variable with sufficiently
nice properties (known as “determined by moments”, which holds for many common
distributions such as the Poisson distribution and the normal distribution), and 𝑋𝑛 is
a sequence of random variables such that $E[X_n^k] \to E[Z^k]$ for all nonnegative integers 𝑘,
then 𝑋𝑛 converges in distribution to 𝑍.
Remark 4.1.13 (Asymptotic normality). In the homework, you will prove the asymptotic normality of 𝑋 using the method of projections. The idea is to show that 𝑋 is close to another random variable that is already known to be asymptotically normal, by checking that their difference has negligible variance. For triangle counts, when 𝑝 ≫ n^{−1/2}, we can compare the number of triangles to the number of edges after a normalization. The method can be further modified for greater generality. See §6.4 in the book Random Graphs by Janson, Łuczak, and Ruciński (2000).
Remark 4.1.14 (Better tail bounds). Later on we will use more powerful tools (including martingale methods, the Azuma–Hoeffding inequality, and Janson inequalities) to prove better tail bounds on triangle (and other subgraph) counts.
Question 4.2.1
What is the threshold for containing a fixed 𝐻 as a subgraph?
Setup 4.2.2
Let 𝐴1, . . . , 𝐴𝑚 be events, and let 𝑋 = 𝑋1 + · · · + 𝑋𝑚, where 𝑋𝑖 is the indicator random variable for 𝐴𝑖. For 𝑖 ≠ 𝑗, write 𝑖 ∼ 𝑗 if the events 𝐴𝑖 and 𝐴𝑗 are not independent, and define
\[ \Delta^* := \max_i \sum_{j : j \sim i} P(A_j \mid A_i). \]
Remark 4.2.3. (a) For many applications with an underlying symmetry between the events, the sum in the definition of Δ∗ does not actually depend on 𝑖.
(b) In the definition of the dependency graph (𝑖 ∼ 𝑗) above, we are only considering pairwise dependence. Later on, when we study the Lovász Local Lemma, we will need a stronger notion of a dependency graph.
(c) This method is appropriate for a collection of events with few dependencies. It is not appropriate when there are many weak dependencies (e.g., Section 4.5 on the Hardy–Ramanujan theorem on the number of distinct prime divisors).
For 𝑖 ∼ 𝑗, we have Cov[𝑋𝑖, 𝑋𝑗] ≤ E[𝑋𝑖𝑋𝑗] = P(𝐴𝑖𝐴𝑗) = P(𝐴𝑖)P(𝐴𝑗 | 𝐴𝑖). (Here 𝐴𝑖𝐴𝑗 is the shorthand for 𝐴𝑖 ∧ 𝐴𝑗, meaning that both events occur.) Also Cov[𝑋𝑖, 𝑋𝑖] = Var 𝑋𝑖 ≤ E[𝑋𝑖²] = P(𝐴𝑖), and
Cov[𝑋𝑖, 𝑋𝑗] = 0 if 𝑖 ≠ 𝑗 and 𝑖 ≁ 𝑗.
Thus
\[ \operatorname{Var} X = \sum_{i,j=1}^{m} \operatorname{Cov}[X_i, X_j] \le \sum_{i=1}^{m} P(A_i) + \sum_{i=1}^{m} P(A_i) \sum_{j : j \sim i} P(A_j \mid A_i) \le E X + (E X)\Delta^*. \]
Recall from Corollary 4.1.8 that E𝑋 > 0 and Var 𝑋 = 𝑜((E𝑋)²) imply 𝑋 > 0 and 𝑋 ∼ E𝑋 whp. So we have the following.
Lemma 4.2.4
In the above setup, if E𝑋 → ∞ and Δ∗ = 𝑜(E𝑋), then 𝑋 > 0 and 𝑋 ∼ E𝑋 whp.
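As a sanity check (this short re-derivation is ours), Lemma 4.2.4 recovers Theorem 4.1.11:

```latex
% For each 3-element vertex set T, let A_T be the event that T spans a triangle,
% so X counts triangles and E X = \binom{n}{3} p^3. Two triples are dependent
% iff they share an edge, and each T shares an edge with 3(n-3) other triples,
% each of which needs 2 further edges to form its triangle. Hence
\[
  \Delta^* = \sum_{T' \,:\, |T' \cap T| = 2} P(A_{T'} \mid A_T) = 3(n-3)\,p^2
  \ll \binom{n}{3} p^3 \asymp E X
  \qquad \text{whenever } n^2 p \to \infty .
\]
% So if np -> infinity, then E X -> infinity and \Delta^* = o(E X),
% and Lemma 4.2.4 gives X > 0 and X ~ E X whp.
```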
Theorem 4.2.5
The threshold for containing 𝐾4 is 𝑛−2/3 .
Now suppose 𝑝 ≫ n^{−2/3}, so that $E X = \binom{n}{4} p^6 \asymp n^4 p^6 \to \infty$. For each 4-vertex subset 𝑆, let 𝐴𝑆 be the event that 𝑆 is a clique in 𝐺(𝑛, 𝑝).
For each fixed 𝑆, one has 𝐴𝑆 ∼ 𝐴𝑆′ if and only if |𝑆 ∩ 𝑆′ | ≥ 2.
• The number of 𝑆′ that share exactly 2 vertices with 𝑆 is $6\binom{n-4}{2} = O(n^2)$, and for each such 𝑆′ one has P(𝐴𝑆′ | 𝐴𝑆) = p^5 (as there are 5 additional edges, not in the 𝑆-clique, that need to appear to form the 𝑆′-clique).
• The number of 𝑆′ that share exactly 3 vertices with 𝑆 is 4(𝑛 − 4) = 𝑂 (𝑛), and
for each such 𝑆′ one has P( 𝐴𝑆′ | 𝐴𝑆 ) = 𝑝 3 .
Summing over all such 𝑆′, we find
\[ \Delta^* = \sum_{S' : |S' \cap S| \in \{2,3\}} P(A_{S'} \mid A_S) \lesssim n^2 p^5 + n p^3 \ll n^4 p^6 \asymp E X, \]
so Lemma 4.2.4 gives 𝑋 > 0 whp. □
For both 𝐾3 and 𝐾4, we saw that for any choice of 𝑝 = 𝑝𝑛 with E𝑋 → ∞, one has 𝑋 > 0 whp. Is this generally true?
Thus the threshold for 𝐻 = 𝐾4 with a pendant edge (the 5-vertex, 7-edge graph; see Example 4.2.8 below) is actually n^{−2/3}, and not n^{−5/7} as one might have naively predicted from the first moment alone.
Why didn’t E𝑋𝐻 → ∞ give 𝑋𝐻 > 0 whp in our proof strategy? In the calculation of Δ∗, one of the terms is ≍ 𝑛𝑝 (from two copies of 𝐻 overlapping in a 𝐾4), and np ≫ n^5 p^7 ≍ E𝑋𝐻 if 𝑝 ≪ n^{−2/3}.
The above example shows that the threshold is not always determined by the expectation. For the property of containing 𝐻, the example suggests that we should look at the “densest” subgraph of 𝐻 rather than 𝐻 itself.
Definition 4.2.7
Define the edge-vertex ratio of a graph 𝐻 by
\[ \rho(H) := \frac{e_H}{v_H}, \qquad\text{and set}\qquad m(H) := \max_{H' \subseteq H} \rho(H'). \]
Example 4.2.8. Let 𝐻 be 𝐾4 with a pendant edge (5 vertices, 7 edges). We have 𝜌(𝐻) = 7/5, whereas 𝜌(𝐾4) = 3/2 > 7/5. It is not hard to check that 𝑚(𝐻) = 𝜌(𝐾4) = 3/2, as 𝐾4 is the subgraph of 𝐻 with the maximum edge-vertex ratio.
Remark 4.2.9 (Algorithm). Goldberg (1984) found a polynomial time algorithm for
computing 𝑚(𝐻) via network flow algorithms.
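Goldberg’s flow-based algorithm is beyond our scope, but for small graphs 𝑚(𝐻) is easy to compute by brute force over vertex subsets. A sketch (ours; here 𝐻 is taken to be 𝐾4 with a pendant edge, matching Example 4.2.8):

```python
from itertools import combinations

def m(vertices, edges) -> float:
    """Brute-force m(H): max over nonempty vertex subsets S of e(S)/|S|.

    It suffices to consider induced subgraphs, since deleting edges
    only lowers the edge-vertex ratio.
    """
    best = 0.0
    for k in range(1, len(vertices) + 1):
        for S in combinations(vertices, k):
            e = sum(1 for (u, v) in edges if u in S and v in S)
            best = max(best, e / k)
    return best

# H from Example 4.2.8: K4 on {0, 1, 2, 3} plus a pendant vertex 4 attached to 3.
H_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(m(range(5), H_edges))  # 1.5 = rho(K4), attained by the K4 subgraph
```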
The next theorem completely determines the threshold for containing some fixed graph 𝐻. Basically, it is determined by the expected number of copies of 𝐻′, where 𝐻′ is the “densest” subgraph of 𝐻 (i.e., the one with the maximum edge-vertex ratio).
since
\[ p \gg n^{-1/m(H)} \ge n^{-1/\rho(J \cap J')} = n^{-|V(J) \cap V(J')| / |E(J) \cap E(J')|}. \]
It then follows, after considering all the possible ways that 𝐽 ′ can overlap with 𝐽, that
Δ∗ ≪ 𝑛 |𝑉 (𝐽)| 𝑝 |𝐸 (𝐽)| ≍ E𝑋𝐻 . So Lemma 4.2.4 yields the result. □
Remark 4.2.11. The proof also gives that if 𝑝 ≫ 𝑛−1/𝑚(𝐻) , then the number 𝑋𝐻 of
copies of 𝐻 is concentrated near its mean, i.e., with probability 1 − 𝑜(1),
\[ X_H \sim E X_H = \binom{n}{v_H} \frac{v_H!}{\operatorname{aut}(H)}\, p^{e_H} \sim \frac{n^{v_H} p^{e_H}}{\operatorname{aut}(H)}. \]
4.3 Thresholds
Previously, we computed the threshold for containing a fixed 𝐻 as a subgraph. In
this section, we take a detour from the discussion of the second moment method and
discuss thresholds in more detail.
We begin by discussing the concept more abstractly by first defining the threshold of
any monotone property on subsets. Then we show that thresholds always exist.
Thresholds form a central topic in probabilistic combinatorics. For any given property,
it is natural to ask the following questions:
1. Where is the threshold?
2. Is the transition sharp? (And more precisely, what is the width of the transition window?)
We understand thresholds well for many basic graph properties, but for many others,
it can be a difficult problem. Also, one might think that one must first understand
the location of the threshold before determining the nature of the phase transition, but
surprisingly this is actually not always the case. There are powerful results that can
sometimes show a sharp threshold without identifying the location of the threshold.
Here are some examples of increasing properties for subgraphs of a given set of
vertices:
• Contains some given subgraph
• Connected
• Has perfect matching
• Hamiltonian
• non-3-colorable
A family F ⊆ P(Ω) of subsets of Ω is called an up-set if whenever 𝐴 ∈ F and 𝐴 ⊆ 𝐵, then 𝐵 ∈ F. Having an increasing property is the same as being an element of an up-set, and we will use these two terms interchangeably.
Remark 4.3.2. The above definition is only for increasing properties. We can similarly define the threshold for decreasing properties by an obvious modification. An example of a non-monotone property is containing some 𝐻 as an induced subgraph. Some (but not all) non-monotone properties also have thresholds, though we will not discuss this here.
Remark 4.3.3. From the definition, we see that if 𝑟 𝑛 and 𝑟 𝑛′ are both thresholds of
the same property, then they must be within a constant factor of each other (exercise:
check this). Thus it makes sense to say “the threshold” rather than “a threshold.”
Existence of threshold
How would a monotone property fail to have a threshold? Perhaps one could have P(Ω_{1/𝑛} ∈ F) and P(Ω_{(log 𝑛)/𝑛} ∈ F) both lying in [1/10, 9/10] for all sufficiently large 𝑛? Before answering this question, let us consider an even more elementary claim.
Let us give two related proofs of this basic fact. Both are quite instructive. Both are
based on coupling of random processes.
Proof 1. Let 0 ≤ 𝑝 < 𝑞 ≤ 1. Consider the following process to generate two random
subsets of Ω. For each 𝑥, generate uniform 𝑡𝑥 ∈ [0, 1] independently at random. Let
𝐴 = {𝑥 ∈ Ω : 𝑡𝑥 ≤ 𝑝} and 𝐵 = {𝑥 ∈ Ω : 𝑡𝑥 ≤ 𝑞} .
Then 𝐴 has the same distribution as Ω 𝑝 and 𝐵 has the same distribution as Ω𝑞 .
Furthermore, since 𝑝 < 𝑞, we always have 𝐴 ⊆ 𝐵. Since F is monotone, 𝐴 ∈ F implies 𝐵 ∈ F. Thus
\[ P(\Omega_p \in \mathcal{F}) = P(A \in \mathcal{F}) \le P(B \in \mathcal{F}) = P(\Omega_q \in \mathcal{F}). \]
To see that the inequality is strict, we simply have to observe that with positive probability, one has 𝐴 ∉ F and 𝐵 ∈ F (e.g., if all 𝑡𝑥 ∈ (𝑝, 𝑞], then 𝐴 = ∅ and 𝐵 = Ω). □
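The coupling is straightforward to implement. Here is a minimal simulation sketch (ours; the up-set “the subset meets {0, 1, 2}” is just a stand-in monotone property):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, q, trials = 50, 0.03, 0.10, 100_000

def in_F(subset: np.ndarray) -> bool:
    """An up-set: subsets of {0, ..., N-1} containing 0, 1, or 2."""
    return bool(subset[:3].any())

count_A = count_B = violations = 0
for _ in range(trials):
    t = rng.random(N)      # one uniform label t_x per element
    A, B = t <= p, t <= q  # coupled copies of Omega_p, Omega_q with A ⊆ B
    count_A += in_F(A)
    count_B += in_F(B)
    violations += in_F(A) and not in_F(B)  # impossible, by monotonicity

print(count_A / trials, count_B / trials)  # P(Omega_p in F) < P(Omega_q in F)
print(violations)                          # always 0
```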
In the second proof, the idea is to reveal a random subset of Ω in independent random
stages.
Proof 2 (by two-round exposure). Let 0 ≤ 𝑝 < 𝑞 ≤ 1. Note that 𝐵 = Ω𝑞 has the same distribution as the union of two independent copies 𝐴 = Ω𝑝 and 𝐴′ = Ω𝑝′, where 𝑝′ is chosen to satisfy 1 − 𝑞 = (1 − 𝑝)(1 − 𝑝′), i.e., 𝑝′ = (𝑞 − 𝑝)/(1 − 𝑝) (check that the probability that each element occurs is the same in the two processes). Thus
P( 𝐴 ∈ F ) ≤ P( 𝐴 ∪ 𝐴′ ∈ F ) = P(𝐵 ∈ F ).
As before, to see that the inequality is strict, one observes that with positive probability, one has 𝐴 ∉ F and 𝐴 ∪ 𝐴′ ∈ F. □
The above technique (generalized from two round exposure to multiple round ex-
posures) gives a nice proof of the following theorem (originally proved using the
Kruskal–Katona theorem).1
Proof. Consider 𝑚 independent copies of Ω𝑝/𝑚, and let 𝑌 be their union. Since F is monotone increasing, if 𝑌 ∉ F, then none of the 𝑚 copies lie in F. Hence
\[ P(\Omega_p \notin \mathcal{F}) \le P(Y \notin \mathcal{F}) \le P(\Omega_{p/m} \notin \mathcal{F})^m. \qquad\square \]
¹ (Thresholds for random subspaces of $\mathbb{F}_q^n$) The proof of the Bollobás–Thomason theorem using the Kruskal–Katona theorem is still relevant. For example, there is an interesting analogue of this concept for properties of subspaces of $\mathbb{F}_q^n$, i.e., random linear codes instead of random graphs. The analogue of the Bollobás–Thomason theorem was proved by Rossman (2020) via the Kruskal–Katona approach. The multiple-round exposure proof does not seem to work in the random subspace setting, as one cannot write a subspace as a union of independent copies of smaller subspaces.
As an aside, I disagree with the use of the term “sharp threshold” in Rossman’s paper for describing
all thresholds for subspaces—one really should be looking at the cardinality of the subspaces rather
than their dimensions. In a related work by Guruswami, Mosheiff, Resch, Silas, and Wootters
(2022), they determine thresholds for random linear codes for properties that seem to be analogous
to the property that a random graph contains a given fixed subgraph. Here again I disagree with
them calling it a “sharp threshold.” It is much more like a coarse threshold once you parameterize
by the cardinality of the subspace, which gives you a much better analogy to the random graph
setting.
Thresholds for random linear codes seem to be an interesting topic that has only recently been studied. I think there is more to be done here.
(Here we write Ω𝑡 = Ω if 𝑡 > 1.) Indeed, applying Lemma 4.3.7 with 𝑝 = 𝑝𝑐, we have
\[ \frac{1}{2} = P(\Omega_{p_c} \notin \mathcal{F}) \le P(\Omega_{p_c/m} \notin \mathcal{F})^m, \]
so that P(Ω𝑝𝑐/𝑚 ∈ F) ≤ 1 − 2^{−1/𝑚} ≤ 1/𝑚; similarly, applying it with 𝑝 = 𝑚𝑝𝑐 gives P(Ω𝑚𝑝𝑐 ∉ F) ≤ P(Ω𝑝𝑐 ∉ F)^𝑚 = 2^{−𝑚}.
Examples
We will primarily be studying monotone graph properties. In the previous notation, $\Omega = \binom{[n]}{2}$ (the set of potential edges), and we are only considering properties that depend on the isomorphism class of the graph.
Example 4.3.8 (Containing a triangle). We saw earlier in the chapter that the threshold
for containing a triangle is 1/𝑛:
\[ P(G(n,p) \text{ contains a triangle}) \to \begin{cases} 0 & \text{if } np \to 0, \\ 1 - e^{-c^3/6} & \text{if } np \to c \in (0, \infty), \\ 1 & \text{if } np \to \infty. \end{cases} \]
In this case, the threshold is determined by the expected number of triangles Θ(𝑛3 𝑝 3 ),
and whether this quantity goes to zero or infinity (in the latter case, we used a second
moment method to show that the number of triangles is positive with high probability).
What if 𝑝 = Θ(1/𝑛)? If 𝑛𝑝 → 𝑐 for some constant 𝑐 > 0, then (as you will show in the homework via the method of moments) the number of triangles is asymptotically Poisson distributed, and in particular the limit probability of containing a triangle increases from 0 to 1 as a continuous function of 𝑐 ∈ (0, ∞). So, in particular, as
𝑝 increases, it goes through a “window of transition” of width Θ(1/𝑛) in order for
P(𝐺 (𝑛, 𝑝) contains a triangle) to increase from 0.01 to 0.99. The width of this window
is on the same order as the threshold. In this case, we call it a coarse transition.
Example 4.3.9 (Containing a subgraph). Theorem 4.2.10 tells us that the threshold
for containing a fixed subgraph 𝐻 is 𝑛−1/𝑚(𝐻) . Here the threshold is not always
determined by the expected number of copies of 𝐻. Instead, we need to look at the “densest” subgraph 𝐻′ ⊆ 𝐻, i.e., the one with the largest edge-vertex ratio (equivalently, the largest average degree). The threshold is determined by whether the expected number of copies of 𝐻′ goes to zero or infinity.
Similar to the triangle case, we have a coarse threshold.
The analysis can also be generalized to containing one of several fixed subgraphs
𝐻1 , . . . , 𝐻 𝑘 .
The original Erdős–Rényi (1959) paper on random graphs already studied several
thresholds, such as the next two examples.
Example 4.3.11 (No isolated vertices). With 𝑝 = (log 𝑛 + 𝑐𝑛)/𝑛,
\[ P(G(n,p) \text{ has no isolated vertices}) \to \begin{cases} 0 & \text{if } c_n \to -\infty, \\ e^{-e^{-c}} & \text{if } c_n \to c, \\ 1 & \text{if } c_n \to \infty. \end{cases} \]
It is a good exercise (and included in the problem set) to check the above claims. The cases 𝑐𝑛 → −∞ and 𝑐𝑛 → ∞ can be shown using the second moment method. Meanwhile, when 𝑐𝑛 → 𝑐, by comparing moments one can show that the number of isolated vertices is asymptotically Poisson.
In this example, the threshold is at (log 𝑛)/𝑛. As we see above, the transition window is Θ(1/𝑛), much narrower than the magnitude of the threshold. In particular, the event probability goes from 𝑜(1) to 1 − 𝑜(1) when 𝑝 increases from (1 − 𝛿)(log 𝑛)/𝑛 to (1 + 𝛿)(log 𝑛)/𝑛 for any fixed 𝛿 > 0. In this case, we say that the property has a sharp threshold at (log 𝑛)/𝑛 (here the leading constant factor is relevant, unlike in the earlier example of a coarse threshold).
Example 4.3.12 (Connectivity). With 𝑝 = (log 𝑛 + 𝑐𝑛)/𝑛,
\[ P(G(n,p) \text{ is connected}) \to \begin{cases} 0 & \text{if } c_n \to -\infty, \\ e^{-e^{-c}} & \text{if } c_n \to c, \\ 1 & \text{if } c_n \to \infty. \end{cases} \]
In fact, a much stronger statement is true, connecting the above two examples: consider a process where one adds random edges one at a time; then with probability 1 − 𝑜(1), the graph becomes connected as soon as there are no more isolated vertices. Such a stronger characterization is called a hitting time result.
A similar statement is true if we replace “is connected” by “has a perfect matching” (assuming 𝑛 is even).
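The following simulation sketch (ours, assuming numpy) illustrates Examples 4.3.11 and 4.3.12: at 𝑝 = (log 𝑛 + 𝑐)/𝑛, both empirical probabilities track e^{−e^{−c}}, and connectivity fails essentially only because of isolated vertices:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n: int, p: float) -> tuple[bool, bool]:
    """Sample G(n, p); return (has no isolated vertices, is connected)."""
    A = np.triu(rng.random((n, n)) < p, k=1)
    A = A | A.T
    no_isolated = bool(A.any(axis=1).all())
    seen = np.zeros(n, dtype=bool)  # depth-first search from vertex 0
    stack, seen[0] = [0], True
    while stack:
        for u in np.flatnonzero(A[stack.pop()]):
            if not seen[u]:
                seen[u] = True
                stack.append(u)
    return no_isolated, bool(seen.all())

n, trials = 500, 200
for c in (-2.0, 0.0, 2.0):
    res = [sample(n, (np.log(n) + c) / n) for _ in range(trials)]
    print(f"c = {c:+.0f}: no isolated vertex: {sum(a for a, _ in res) / trials:.2f}, "
          f"connected: {sum(b for _, b in res) / trials:.2f}, "
          f"exp(-exp(-c)) = {np.exp(-np.exp(-c)):.2f}")
```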
Sharp transition
In some of the examples, the probability that 𝐺(𝑛, 𝑝) satisfies the property changes quickly and dramatically as 𝑝 crosses the threshold (physical analogy: similar to how the structure of water changes dramatically as the temperature drops below freezing). For example, for connectivity, while 𝑝 = (log 𝑛)/𝑛 is a threshold, we see that 𝐺(𝑛, 0.99 log 𝑛/𝑛) is whp not connected and 𝐺(𝑛, 1.01 log 𝑛/𝑛) is whp connected, unlike the situation for containing a triangle earlier. We call this the sharp threshold phenomenon.
[Figure 4.1: Examples of coarse and sharp thresholds. The vertical axis is the probability that 𝐺(𝑛, 𝑝) satisfies the property.]
We need “close to” since the property could be “contains a triangle and has at least log 𝑛 edges”, which is not exactly local but is essentially the same as “contains a triangle.”
There is some subtlety here since we can allow very different properties depending on
the value of 𝑛. E.g., P could be the set of all 𝑛-vertex graphs that contain a 𝐾3 if 𝑛 is
odd and 𝐾4 if 𝑛 is even. Friedgut’s theorem tells us that if there is a threshold, then
there is a partition N = N1 ∪ · · · ∪ N𝑘 such that on each N𝑖, P is approximately of the form described in the previous paragraph.
In the last section, we derived that the property of containing some fixed 𝐻 has
threshold 𝑛−1/𝑚(𝐻) for some rational number 𝑚(𝐻). It follows as a corollary of
Friedgut’s theorem that every coarse threshold must have this form.
In particular, if (log 𝑛)/𝑛 is a threshold of some monotone graph property (e.g., this is
the case for connectivity), then we automatically know that it must be a sharp threshold,
even without knowing anything else about the property. Likewise if the threshold has
the form 𝑛−𝛼 for some irrational 𝛼.
The exact statement of Friedgut’s theorem is more cumbersome. We refer those who
are interested to Friedgut’s original 1999 paper and his later survey for details and
applications. This topic is connected more generally to an area known as the analysis
of boolean functions.
Also, it is known that the transition window of every monotone graph property is (log 𝑛)^{−2+𝑜(1)} (Friedgut–Kalai (1996), Bourgain–Kalai (1997)).
Curiously, tools such as Friedgut’s theorem sometimes allow us to prove the existence
of a sharp threshold without being able to identify its exact location. For example, it is
an important open problem to understand where exactly the transition occurs for a random graph to be 𝑘-colorable.
On the other hand, it is not known whether lim𝑛→∞ 𝑑𝑘(𝑛) exists, which would imply Conjecture 4.3.16. Further bounds on 𝑑𝑘(𝑛) are known; e.g., the landmark paper of Achlioptas and Naor (2006) shows that for each fixed 𝑑 > 0, whp 𝜒(𝐺(𝑛, 𝑑/𝑛)) ∈ {𝑘𝑑, 𝑘𝑑 + 1}, where 𝑘𝑑 = min{𝑘 ∈ N : 2𝑘 log 𝑘 > 𝑑}. Also see the later work of Coja-Oghlan and Vilenchik (2013).
4.4 Clique Number
Question 4.4.1
What is the clique number of 𝐺(𝑛, 1/2)?
Let 𝑋 denote the number of 𝑘-vertex cliques in 𝐺(𝑛, 1/2), and let
\[ f(n, k) := E X = \binom{n}{k} 2^{-\binom{k}{2}}. \]
Let us first do a rough estimate to see what is the critical 𝑘 to get 𝑓(𝑛, 𝑘) large or small.
Recall that $(n/k)^k \le \binom{n}{k} \le (en/k)^k$. We have
\[ \log_2 f(n, k) = k \left( \log_2 n - \log_2 k - \frac{k}{2} + O(1) \right). \]
And so the transition point is at some 𝑘 ∼ 2 log2 𝑛 in the sense that if 𝑘 ≥ (2+ 𝛿) log2 𝑛,
then 𝑓 (𝑛, 𝑘) → 0 while if 𝑘 ≤ (2 − 𝛿) log2 𝑛, then 𝑓 (𝑛, 𝑘) → ∞.
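As a quick numerical illustration (ours, not from the notes), one can tabulate log₂ 𝑓(𝑛, 𝑘) and locate the largest 𝑘 with 𝑓(𝑛, 𝑘) ≥ 1; at these sizes, lower-order corrections keep it noticeably below 2 log₂ 𝑛:

```python
from math import comb, log2

def log2_f(n: int, k: int) -> float:
    """log_2 of f(n, k) = C(n, k) * 2^(-C(k, 2)), the expected number of
    k-vertex cliques in G(n, 1/2)."""
    return log2(comb(n, k)) - comb(k, 2)

for n in (10**3, 10**6, 10**9):
    # Largest k with f(n, k) >= 1; f(n, k) then collapses very quickly.
    k0 = max(k for k in range(2, 200) if log2_f(n, k) >= 0)
    print(f"n = {n:>10}: k0 = {k0}, 2*log2(n) = {2 * log2(n):.1f}")
```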
The next result gives us a lower bound on the typical clique number.
Theorem 4.4.2
Let 𝑘 = 𝑘(𝑛). If 𝑓(𝑛, 𝑘) → 0, then whp 𝜔(𝐺(𝑛, 1/2)) < 𝑘, while if 𝑓(𝑛, 𝑘) → ∞, then whp 𝜔(𝐺(𝑛, 1/2)) ≥ 𝑘.
Proof sketch. The first claim follows from Markov’s inequality, as P(𝑋 ≥ 1) ≤ E𝑋. For the second claim, we bound the variance using Setup 4.2.2. For each 𝑘-element subset 𝑆 of vertices, let 𝐴𝑆 be the event that 𝑆 is a clique, and let 𝑋𝑆 be the indicator random variable for 𝐴𝑆. Recall
\[ \Delta^* := \max_i \sum_{j : j \sim i} P(A_j \mid A_i). \]
It then follows from Lemma 4.2.4 that 𝑋 > 0 (i.e., 𝜔(𝐺) ≥ 𝑘) whp. □
We can now deduce a two-point concentration result for the clique number of 𝐺(𝑛, 1/2). This result is due to Bollobás–Erdős (1976) and Matula (1976).
Proof. Note that, for 𝑘 ∼ 2 log₂ 𝑛,
\[ \frac{f(n, k+1)}{f(n, k)} = \frac{n - k}{k + 1}\, 2^{-k} = n^{-1 + o(1)}. \]
Then 𝑓 (𝑛, 𝑘 0 − 1) → ∞ and 𝑓 (𝑛, 𝑘 0 + 1) = 𝑜(1). By Theorem 4.4.2, the clique number
of 𝐺 (𝑛, 1/2) is whp in {𝑘 0 − 1, 𝑘 0 }. □
Remark 4.4.4. By a more careful analysis, one can show that outside a very sparse
subset of integers, one has 𝑓 (𝑛, 𝑘 0 ) → ∞, in which case one has one-point concentra-
tion 𝜔(𝐺 (𝑛, 1/2)) = 𝑘 0 whp.
By taking the complement of the graph, we also get a two-point concentration result
about the independence number of 𝐺 (𝑛, 1/2). Bohman and Hofstad (2024) extended
the two-point concentration result for the independence number of 𝐺 (𝑛, 𝑝) to all
𝑝 ≥ 𝑛−2/3+𝜀 .
Remark 4.4.5 (Chromatic number). Since the chromatic number satisfies 𝜒(𝐺) ≥
𝑛/𝛼(𝐺), we have
\[ \chi(G(n, 1/2)) \ge (1 + o(1)) \frac{n}{2 \log_2 n} \qquad\text{whp}. \]
In Theorem 8.3.2, using more advanced methods, we will prove 𝜒(𝐺 (𝑛, 1/2)) ∼
𝑛/(2 log2 𝑛) whp, a theorem due to Bollobás (1987).
In Section 9.3, using martingale concentration, we will show that 𝜒(𝐺 (𝑛, 𝑝)) is tightly
concentrated around its mean without a priori needing to know where the mean is
located.
4.5 Hardy–Ramanujan Theorem on the Number of Distinct Prime Divisors
The original proof of Hardy and Ramanujan was quite involved. Here we show a “probabilistic” proof due to Turán (1934), which played a key role in the development of probabilistic methods in number theory.
Proof. Let 𝑥 be a uniformly random element of {1, . . . , 𝑛}. Set 𝑀 := 𝑛^{1/10}, let 𝑋𝑝 := 1_{𝑝 | 𝑥} for each prime 𝑝, and let $X := \sum_{p \le M} X_p$, where the sum ranges over primes. Writing 𝜈(𝑥) for the number of distinct prime divisors of 𝑥, we have
𝜈(𝑥) − 10 ≤ 𝑋(𝑥) ≤ 𝜈(𝑥),
since 𝑥 ≤ 𝑛 cannot have more than 10 prime factors > 𝑛^{1/10}. So it suffices to analyze 𝑋.
Since exactly ⌊𝑛/𝑝⌋ positive integers ≤ 𝑛 are divisible by 𝑝, we have
\[ E X_p = \frac{\lfloor n/p \rfloor}{n} = \frac{1}{p} + O\left( \frac{1}{n} \right). \]
We quote Mertens’ theorem from analytic number theory:
\[ \sum_{p \le n} \frac{1}{p} = \log \log n + O(1). \]
(A more precise result says that the 𝑂(1) error term converges to the Meissel–Mertens constant.) So
\[ E X = \sum_{p \le M} \left( \frac{1}{p} + O\left( \frac{1}{n} \right) \right) = \log \log n + O(1). \]
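Mertens’ theorem is easy to check numerically; here is a small sketch (ours), comparing against the Meissel–Mertens constant 𝑀 ≈ 0.2615:

```python
from math import log

def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [p for p in range(2, n + 1) if sieve[p]]

M = 0.26149  # Meissel-Mertens constant
for n in (10**3, 10**5, 10**7):
    s = sum(1 / p for p in primes_up_to(n))
    print(f"n = {n:>8}: sum of 1/p = {s:.4f}, loglog n + M = {log(log(n)) + M:.4f}")
```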
Next we compute the variance. The intuition is that divisibility by distinct primes should behave somewhat independently. Indeed, if 𝑝𝑞 divides 𝑛, then 𝑋𝑝 and 𝑋𝑞 are independent (e.g., by the Chinese Remainder Theorem). If 𝑝𝑞 does not divide 𝑛, but 𝑛 is large enough, then there is some small covariance contribution. (In contrast to the earlier variance calculations in random graphs, here we have many weak dependencies.)
If 𝑝 ≠ 𝑞, then 𝑋𝑝𝑋𝑞 = 1 if and only if 𝑝𝑞 | 𝑥. Thus
\[ \operatorname{Cov}[X_p, X_q] = \frac{\lfloor n/(pq) \rfloor}{n} - \frac{\lfloor n/p \rfloor}{n} \cdot \frac{\lfloor n/q \rfloor}{n} \le \frac{1}{pq} - \left( \frac{1}{p} - \frac{1}{n} \right)\left( \frac{1}{q} - \frac{1}{n} \right) \lesssim \frac{1}{n}. \]
Thus
\[ \sum_{p \ne q \le M} \operatorname{Cov}[X_p, X_q] \lesssim \frac{M^2}{n} \lesssim n^{-4/5}, \]
and hence
\[ \operatorname{Var} X = \sum_{p \le M} \operatorname{Var} X_p + \sum_{p \ne q \le M} \operatorname{Cov}[X_p, X_q] \le E X + O(n^{-4/5}) = \log\log n + O(1). \]
By Chebyshev’s inequality, for any 𝜆 > 0, we have |𝑋 − log log 𝑛| ≤ 𝜆√(log log 𝑛) with probability 1 − 𝑂(1/𝜆²).
Finally, recall that |𝑋 − 𝜈| ≤ 10. So the same asymptotic bound holds with 𝑋 replaced
by 𝜈. □
The main idea in the above proof is that the number of prime divisors $X = \sum_p X_p$ (from the previous proof) behaves like a sum of independent random variables.
A sum of independent random variables often satisfies a central limit theorem (i.e., asymptotic normality: convergence to a Gaussian), assuming some mild regularity hypotheses. In particular, we have the following corollary of the Lindeberg–Feller central limit theorem (see Durrett, Theorem 3.4.10):
The original proof of Erdős and Kac verifies the above intuition using some more
involved results in analytic number theory. Simpler proofs have been subsequently
given, and we outline one such proof below, which is based on computing the moments
of the distribution. The idea of computing moments for this problem was first used by
Delange (1953), who was apparently not aware of the Erdős–Kac paper. Also see a
more modern account by Granville and Soundararajan (2007).
The following tool from probability theory allows us to verify asymptotic normality
from convergence of moments.
Remark 4.5.5. The same conclusion holds for any probability distribution that is
“determined by its moments,” i.e., there are no other distributions sharing the same
moments. Many common distributions that arise in practice, e.g., the Poisson distri-
bution, satisfy this property. There are various sufficient conditions for guaranteeing
this moments property, e.g., Carleman’s condition tells us that any probability distri-
bution whose moments do not increase too quickly is determined by its moments. (See
Durrett §3.3.5).
𝜇 := E𝑌 ∼ E𝑋 ∼ log log 𝑛
and
𝜎 2 := Var 𝑌 ∼ Var 𝑋 ∼ log log 𝑛.
Comparing the expansions of $\widetilde{X}^k$ in terms of the 𝑋𝑝’s ($n^{o(1)}$ terms), we get that every moment of $\widetilde{X}$ converges to the corresponding moment of the standard normal. It follows that $\widetilde{X}$ is asymptotically normal. □
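A quick simulation sketch (ours) of the Erdős–Kac phenomenon; since everything happens at scale log log 𝑛, convergence is very slow, and the match with the normal distribution is only rough at 𝑛 = 10⁸:

```python
import random
from math import log, sqrt

def nu(x: int) -> int:
    """Number of distinct prime divisors of x, by trial division."""
    count, d = 0, 2
    while d * d <= x:
        if x % d == 0:
            count += 1
            while x % d == 0:
                x //= d
        d += 1
    return count + (x > 1)

random.seed(0)
n = 10**8
mu = log(log(n))  # both the centering and the variance scale, about 2.91
z = [(nu(random.randrange(1, n + 1)) - mu) / sqrt(mu) for _ in range(2000)]
# A standard normal puts about 68% of its mass within one sigma of 0.
print(sum(abs(t) <= 1 for t in z) / len(z))
```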
4.6 Distinct Subset Sums
Question 4.6.1
For each 𝑘, what is the smallest 𝑛 so that there exists 𝑆 ⊆ [𝑛] with |𝑆| = 𝑘 and all 2^𝑘 subset sums of 𝑆 are distinct?
Let us use the second moment to give a modest improvement on the earlier pigeonhole argument. The main idea is that, by the second moment, most of the subset sums lie within an 𝑂(𝜎)-interval, so we can improve on the pigeonhole estimate by ignoring outlier subset sums.
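For comparison, here is the pigeonhole baseline (the earlier argument referred to above does not appear in this excerpt; this is the standard count):

```latex
% All 2^k subset sums of a k-element set S \subseteq [n] lie in {0, 1, ..., nk},
% so if they are all distinct, then
\[
  2^k \le nk + 1, \qquad \text{i.e.,} \qquad n \ge \frac{2^k - 1}{k}.
\]
% Theorem 4.6.3 below improves this bound by a factor on the order of \sqrt{k}.
```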
Theorem 4.6.3
If there is a 𝑘-element subset of [𝑛] with distinct subset sums, then 𝑛 ≳ 2^𝑘/√𝑘.
Proof. Suppose 𝑆 = {𝑥1, . . . , 𝑥𝑘} ⊆ [𝑛] has distinct subset sums. Pick 𝜀1, . . . , 𝜀𝑘 ∈ {0, 1} independently and uniformly at random, and let 𝑋 := 𝜀1𝑥1 + · · · + 𝜀𝑘𝑥𝑘, with mean 𝜇 and variance
\[ \sigma^2 = \frac{x_1^2 + \cdots + x_k^2}{4} \le \frac{n^2 k}{4}, \qquad\text{so}\qquad 2\sigma \le n\sqrt{k}. \]
By Chebyshev’s inequality,
\[ P(|X - \mu| \ge 2\sigma) \le \frac{1}{4}, \]
and thus
\[ P(|X - \mu| < n\sqrt{k}) \ge P(|X - \mu| < 2\sigma) \ge \frac{3}{4}. \]
Since 𝑋 takes distinct values for the 2^𝑘 choices of (𝜀1, . . . , 𝜀𝑘) ∈ {0, 1}^𝑘, we have P(𝑋 = 𝑥) ≤ 2^{−𝑘} for all 𝑥. Since there are at most 2𝑛√𝑘 integers in the interval (𝜇 − 𝑛√𝑘, 𝜇 + 𝑛√𝑘), we have
\[ P(|X - \mu| < n\sqrt{k}) \le 2n\sqrt{k} \cdot 2^{-k}. \]
Putting the upper and lower bounds on P(|𝑋 − 𝜇| < 𝑛√𝑘) together, we get
\[ \frac{3}{4} \le 2n\sqrt{k} \cdot 2^{-k}. \]
So 𝑛 ≳ 2^𝑘/√𝑘. □
Dubroff, Fox, and Xu (2021) gave another short proof of this result by applying Harper’s
vertex-isoperimetric inequality on the cube (this is an example of “concentration of
measure”, which we will explore more later in this course).
Consider the graph on the 𝑛-dimensional boolean cube, with vertex set {0, 1}^𝑛 and with an edge between every pair of 𝑛-tuples that differ in exactly one coordinate. Given 𝐴 ⊆ {0, 1}^𝑛, write 𝜕𝐴 for the set of all vertices outside 𝐴 that are adjacent to some vertex of 𝐴.
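A tiny sketch (ours) computing the vertex boundary in the cube, for the Hamming ball of radius 1:

```python
n = 10
# A = Hamming ball of radius 1 around the all-zeros vertex of {0,1}^n.
A = {(0,) * n} | {tuple(int(j == i) for j in range(n)) for i in range(n)}

def neighbors(v):
    """Vertices of the boolean cube differing from v in exactly one coordinate."""
    for i in range(len(v)):
        yield v[:i] + (1 - v[i],) + v[i + 1:]

# Vertex boundary: vertices outside A adjacent to some vertex of A.
boundary = {w for v in A for w in neighbors(v)} - A
print(len(A), len(boundary))  # 11 and C(10, 2) = 45 (all weight-2 vertices)
```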
Remark 4.6.5. A stronger form of Harper’s theorem gives the precise value of
\[ \min_{A \subseteq \{0,1\}^n : |A| = m} |\partial A| \]
for every (𝑛, 𝑚). Basically, the minimum is achieved when 𝐴 is a Hamming ball, or, if 𝑚 is not exactly the size of some Hamming ball, when 𝐴 consists of the lexicographically first 𝑚 elements of {0, 1}^𝑛.
Remark 4.6.7. The above bound has the best currently known leading constant factor, matching an earlier result by Aliev (2009).
4.7 Weierstrass Approximation Theorem
Theorem 4.7.1 (Weierstrass approximation theorem)
For every continuous 𝑓 : [0, 1] → R and every 𝜀 > 0, there is a polynomial 𝑃 with |𝑃(𝑥) − 𝑓(𝑥)| ≤ 𝜀 for all 𝑥 ∈ [0, 1].
Proof (following Bernstein). Given 𝑛, define
\[ E_i(x) := \binom{n}{i} x^i (1-x)^{n-i} = P(\mathrm{Binomial}(n, x) = i), \qquad P_n(x) := \sum_{i=0}^{n} f(i/n)\, E_i(x). \]
Each 𝐸𝑖(𝑥) is a polynomial that peaks at 𝑥 = 𝑖/𝑛 and then decays as 𝑥 moves away from 𝑖/𝑛.
For each 𝑥 ∈ [0, 1], the binomial distribution Binomial(𝑛, 𝑥) has mean 𝑛𝑥 and variance
𝑛𝑥(1 − 𝑥) ≤ 𝑛. By Chebyshev’s inequality,
\[ \sum_{i : |i - nx| > n^{2/3}} E_i(x) = P\bigl( |\mathrm{Binomial}(n, x) - nx| > n^{2/3} \bigr) \le n^{-1/3}. \]
(In the next chapter, we will see a much better tail bound.)
Since [0, 1] is compact, 𝑓 is uniformly continuous and bounded. By multiplying by
a constant, we assume that | 𝑓 (𝑥)| ≤ 1 for all 𝑥 ∈ [0, 1]. Also there exists 𝛿 > 0 such
that | 𝑓 (𝑥) − 𝑓 (𝑦)| ≤ 𝜀/2 for all 𝑥, 𝑦 ∈ [0, 1] with |𝑥 − 𝑦| ≤ 𝛿.
Take 𝑛 > max{64𝜀^{−3}, 𝛿^{−3}}. Then for every 𝑥 ∈ [0, 1] (note that $\sum_{j=0}^{n} E_j(x) = 1$),
\[ |P_n(x) - f(x)| \le \sum_{i=0}^{n} E_i(x)\, |f(i/n) - f(x)| \le \sum_{i : |i/n - x| \le n^{-1/3} < \delta} E_i(x)\, |f(i/n) - f(x)| + \sum_{i : |i - nx| > n^{2/3}} 2 E_i(x) \le \frac{\varepsilon}{2} + 2 n^{-1/3} \le \varepsilon. \qquad\square \]
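The construction is easy to try out numerically. A small sketch (ours; the test function 𝑓(𝑥) = |𝑥 − 1/2| is an arbitrary continuous, non-differentiable choice):

```python
from math import comb

def bernstein(f, n: int, x: float) -> float:
    """Evaluate P_n(x) = sum_{i=0}^n f(i/n) * C(n, i) * x^i * (1-x)^(n-i)."""
    return sum(f(i / n) * comb(n, i) * x**i * (1 - x) ** (n - i)
               for i in range(n + 1))

f = lambda x: abs(x - 0.5)  # continuous on [0, 1], not differentiable at 1/2
for n in (10, 100, 1000):
    err = max(abs(bernstein(f, n, j / 200) - f(j / 200)) for j in range(201))
    print(n, round(err, 4))  # sup-norm error shrinks as n grows
```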
Exercises
1. Let 𝑋 be a nonnegative real-valued random variable. Suppose P(𝑋 = 0) < 1.
Prove that
\[ P(X = 0) \le \frac{\operatorname{Var} X}{E[X^2]}. \]
2. Let 𝑋 be a random variable with mean 𝜇 and variance 𝜎 2 . Prove that for all
𝜆 > 0,
\[ P(X \ge \mu + \lambda) \le \frac{\sigma^2}{\sigma^2 + \lambda^2}. \]
3. Threshold for 𝑘-APs. Let [𝑛] 𝑝 denote the random subset of {1, . . . , 𝑛} where
every element is included with probability 𝑝 independently. For each fixed
integer 𝑘 ≥ 3, determine the threshold for [𝑛] 𝑝 to contain a 𝑘-term arithmetic
progression.
4. What is the threshold function for 𝐺 (𝑛, 𝑝) to contain a cycle?
5. Show that, for each fixed positive integer 𝑘, there is a sequence 𝑝 𝑛 such that
Hint: Make the random graph contain some specific subgraph but not some others.
6. Poisson limit. Let 𝑋 be the number of triangles in 𝐺 (𝑛, 𝑐/𝑛) for some fixed
𝑐 > 0.
a) For every nonnegative integer 𝑘, determine the limit of $E\binom{X}{k}$ as 𝑛 → ∞.
b) Let 𝑌 ∼ Binomial(𝑛, 𝜆/𝑛) for some fixed 𝜆 > 0. For every nonnegative integer 𝑘, determine the limit of $E\binom{Y}{k}$ as 𝑛 → ∞, and show that it agrees with the limit in (a) for some 𝜆 = 𝜆(𝑐).
We know that 𝑌 converges to the Poisson distribution with mean 𝜆. Also,
the Poisson distribution is determined by its moments.
c) Compute, for every fixed nonnegative integer 𝑡, the limit of P(𝑋 = 𝑡) as 𝑛 → ∞.
(In particular, this gives the limit probability that 𝐺 (𝑛, 𝑐/𝑛) contains a
triangle, i.e., lim𝑛→∞ P(𝑋 > 0). This limit increases from 0 to 1 continu-
ously when 𝑐 ranges from 0 to +∞, thereby showing that the property of
containing a triangle has a coarse threshold.)
7. Central limit theorem for triangle counts. Find a real (non-random) sequence 𝑎 𝑛
so that, letting 𝑋 be the number of triangles and 𝑌 be the number of edges in the
random graph 𝐺 (𝑛, 1/2), one has
or first write the edge indicator variables as 𝑋𝑖 𝑗 = (1 + 𝑌𝑖 𝑗 )/2 and then expand. The latter
approach likely yields a cleaner computation.)
11. Nearly perfect triangle factor. Prove that, with probability approaching 1 as
𝑛 → ∞,
a) 𝐺 (𝑛, 𝑛−2/3 ) has at least 𝑛/100 vertex-disjoint triangles.
b) Simple nibble. 𝐺 (𝑛, 𝐶𝑛−2/3 ) has at least 0.33𝑛 vertex-disjoint triangles,
for some constant 𝐶.
Hint: view a random graph as the union of several independent random graphs, and iterate (a).
13. Let 𝑣1 = (𝑥1, 𝑦1), . . . , 𝑣𝑛 = (𝑥𝑛, 𝑦𝑛) ∈ Z² with |𝑥𝑖|, |𝑦𝑖| ≤ 2^{𝑛/2}/(100√𝑛) for all 𝑖 ∈ [𝑛]. Show that there are two disjoint sets 𝐼, 𝐽 ⊆ [𝑛], not both empty, such that $\sum_{i \in I} v_i = \sum_{j \in J} v_j$.
14. ★ Prove that there is an absolute constant 𝐶 > 0 so that the following holds. For every prime 𝑝 and every 𝐴 ⊆ Z/𝑝Z with |𝐴| = 𝑘, there exists an integer 𝑥 so that {𝑥𝑎 : 𝑎 ∈ 𝐴} intersects every interval of length at least 𝐶𝑝/√𝑘 in Z/𝑝Z.
15. ★ Prove that there is a constant 𝑐 > 0 so that every hyperplane containing the
origin in R𝑛 intersects at least 𝑐-fraction of the 2𝑛 closed unit balls centered at
{−1, 1}𝑛 .
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.