
MIT OCW: Probabilistic Methods in Combinatorics — Yufei Zhao

4 Second Moment

4.1 Does a typical random graph contain a triangle?


We begin with the following motivating question. Recall that the Erdős–Rényi random
graph 𝐺 (𝑛, 𝑝) is the 𝑛-vertex graph with edge probability 𝑝.

Question 4.1.1
For which 𝑝 = 𝑝 𝑛 does 𝐺 (𝑛, 𝑝) contain a triangle with probability 1 − 𝑜(1)?

(We sometimes abbreviate “with probability 1 − 𝑜(1)” by “with high probability” or simply “whp”. In some literature, this is also called “asymptotically almost surely” or “a.a.s.”)
By computing the first moment E𝑋 of the triangle count 𝑋, we deduce the following.

Proposition 4.1.2
If 𝑛𝑝 → 0, then 𝐺 (𝑛, 𝑝) is triangle-free with probability 1 − 𝑜(1).

Proof. Let 𝑋 be the number of triangles in 𝐺 (𝑛, 𝑝). By linearity of expectation,

E𝑋 = (𝑛 choose 3) 𝑝³ ≍ 𝑛³𝑝³ = 𝑜(1).
Thus, by Markov’s inequality,

P(𝑋 ≥ 1) ≤ E𝑋 = 𝑜(1).

In other words, 𝑋 = 0 with probability 1 − 𝑜(1). □

In other words, when 𝑝 ≪ 1/𝑛, 𝐺 (𝑛, 𝑝) is triangle-free with high probability (recall that 𝑝 ≪ 1/𝑛 means 𝑝 = 𝑜(1/𝑛); see the asymptotic notation guide at the beginning of these notes).
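The two regimes can be checked empirically. Below is a small Monte Carlo sketch (not from the notes; 𝑛 = 40 and the trial count are arbitrary illustrative choices) comparing the frequency of triangles in 𝐺 (𝑛, 𝑝) for 𝑝 well below and well above 1/𝑛.

```python
import random
from itertools import combinations

def has_triangle(n, p, rng):
    """Sample G(n, p) once and check whether it contains a triangle."""
    adj = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i][j] = adj[j][i] = True
    return any(adj[i][j] and adj[i][k] and adj[j][k]
               for i, j, k in combinations(range(n), 3))

def triangle_frequency(n, p, trials=150, seed=0):
    """Fraction of sampled graphs that contain a triangle."""
    rng = random.Random(seed)
    return sum(has_triangle(n, p, rng) for _ in range(trials)) / trials

n = 40
low = triangle_frequency(n, 0.1 / n)   # p << 1/n: triangles should be rare
high = triangle_frequency(n, 10 / n)   # p >> 1/n: triangles should be typical
```

With these (illustrative) parameters one expects `low` to come out near 0 and `high` near 1, matching the two regimes above.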
What about when 𝑝 ≫ 1/𝑛? Can we conclude that 𝐺 (𝑛, 𝑝) contains a triangle with
high probability? In this case E𝑋 → ∞, but this is not enough to conclude that


P(𝑋 ≥ 1) = 1 − 𝑜(1), since we have not ruled out the possibility that 𝑋 is almost always zero yet extremely large with some tiny probability.
An important technique in probabilistic combinatorics is to show that some random
variable is concentrated around its mean. This would then imply that outliers are
unlikely.
We will see many methods in this course for proving concentration of random variables. In this chapter, we begin with the simplest method. It is usually the easiest to execute, and it requires few hypotheses. The downside is that it only produces relatively weak (though still useful enough) concentration bounds.
Second moment method: show that a random variable is concentrated near its mean by bounding its variance.

Definition 4.1.3 (Variance)


The variance of a random variable 𝑋 is

Var[𝑋] := E[(𝑋 − E𝑋)²] = E[𝑋²] − E[𝑋]².

The covariance of two random variables 𝑋 and 𝑌 (jointly distributed) is

Cov[𝑋, 𝑌 ] := E[(𝑋 − E𝑋)(𝑌 − E𝑌 )] = E[𝑋𝑌 ] − E[𝑋]E[𝑌 ].

(Exercise: check the second equality in the definitions of variance and covariance
above).

Remark 4.1.4 (Notation convention). It is standard to use the Greek letter 𝜇 for the mean, and 𝜎² for the variance. Here 𝜎 ≥ 0 is the standard deviation.

The following basic result provides a concentration bound based on the variance.

Theorem 4.1.5 (Chebyshev’s inequality)


Let 𝑋 be a random variable with mean 𝜇 and variance 𝜎². For any 𝜆 > 0,

P(|𝑋 − 𝜇| ≥ 𝜆𝜎) ≤ 𝜆⁻².

Proof. By Markov’s inequality,

LHS = P(|𝑋 − 𝜇|² ≥ 𝜆²𝜎²) ≤ E[(𝑋 − 𝜇)²]/(𝜆²𝜎²) = 1/𝜆². □
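As a concrete check, one can compare Chebyshev’s bound against the exact tail of a binomial distribution (a sketch; the parameters 𝑛 = 100, 𝑝 = 0.3, 𝜆 = 2 are arbitrary). Chebyshev guarantees P(|𝑋 − 𝜇| ≥ 2𝜎) ≤ 1/4, while the true binomial tail is much smaller:

```python
from math import comb, sqrt

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.3
mu, sigma = n * p, sqrt(n * p * (1 - p))
lam = 2.0

# Exact two-sided tail P(|X - mu| >= lam * sigma).
tail = sum(binom_pmf(n, p, k) for k in range(n + 1)
           if abs(k - mu) >= lam * sigma)

# Chebyshev promises only tail <= 1/lam^2 = 0.25; the exact tail is far smaller.
assert tail <= lam ** -2
```

This illustrates the remark that follows: Chebyshev’s quadratic decay is a genuine but weak bound, and distributions with independence typically have much lighter tails.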

Remark 4.1.6. Concentration bounds that show small probability of deviating from
the mean are called tail bounds (more precisely: upper tail for P(𝑋 ≥ 𝜇 + 𝑎) and lower tail
for P(𝑋 ≤ 𝜇 − 𝑎)). Chebyshev’s inequality gives tail bounds that decay quadratically. Later on, we will see tools that give much better decay (usually exponential) under additional assumptions on the random variable (e.g., independence).

We are often interested in upper bounding the probability of non-existence, i.e., P(𝑋 =
0). Chebyshev’s inequality yields the following bound.

Corollary 4.1.7 (Chebyshev bound on the probability of non-existence)


For any random variable 𝑋,

P(𝑋 = 0) ≤ Var 𝑋 / (E𝑋)².

Proof. By Chebyshev’s inequality, writing 𝜇 = E𝑋,

P(𝑋 = 0) ≤ P(|𝑋 − 𝜇| ≥ |𝜇|) ≤ Var 𝑋 / 𝜇². □

Corollary 4.1.8
If E𝑋 > 0 and Var 𝑋 = 𝑜((E𝑋)²), then 𝑋 > 0 and 𝑋 ∼ E𝑋 with probability 1 − 𝑜(1).

Remark 4.1.9 (Asymptotic statements). The above statement really refers not to a single random variable, but to a sequence of random variables 𝑋𝑛 . It says that if E𝑋𝑛 > 0 and Var 𝑋𝑛 = 𝑜((E𝑋𝑛 )²), then P(𝑋𝑛 > 0) → 1 as 𝑛 → ∞, and for every fixed 𝛿 > 0, P(|𝑋𝑛 − E𝑋𝑛 | > 𝛿E𝑋𝑛 ) → 0 as 𝑛 → ∞.

In many situations, it is not too hard to compute the second moment. We have Var[𝑋] = Cov[𝑋, 𝑋]. Also, covariance is bilinear, i.e., for random variables 𝑋1 , . . . and 𝑌1 , . . . (no assumptions needed on their independence, etc.) and constants 𝑎1 , . . . and 𝑏1 , . . . , one has

Cov[∑𝑖 𝑎𝑖 𝑋𝑖 , ∑𝑗 𝑏𝑗 𝑌𝑗 ] = ∑𝑖,𝑗 𝑎𝑖 𝑏𝑗 Cov[𝑋𝑖 , 𝑌𝑗 ].

We are often dealing with 𝑋 being the cardinality of some random set. We can usually write this as a sum of indicator functions, such as 𝑋 = 𝑋1 + · · · + 𝑋𝑛 , so that

Var 𝑋 = Cov[𝑋, 𝑋] = ∑𝑖,𝑗 Cov[𝑋𝑖 , 𝑋𝑗 ] = ∑𝑖 Var 𝑋𝑖 + 2 ∑𝑖<𝑗 Cov[𝑋𝑖 , 𝑋𝑗 ].


We have Cov[𝑋, 𝑌 ] = 0 if 𝑋 and 𝑌 are independent. Thus in the sum we only need to
consider dependent pairs (𝑖, 𝑗).

Example 4.1.10 (Sum of independent Bernoullis). Suppose 𝑋 = 𝑋1 + · · · + 𝑋𝑛 with each 𝑋𝑖 an independent Bernoulli random variable with P(𝑋𝑖 = 1) = 𝑝 and P(𝑋𝑖 = 0) = 1 − 𝑝. Then 𝜇 = 𝑛𝑝 and 𝜎² = 𝑛𝑝(1 − 𝑝) (note that Var[𝑋𝑖 ] = 𝑝 − 𝑝² and Cov[𝑋𝑖 , 𝑋𝑗 ] = 0 if 𝑖 ≠ 𝑗). If 𝑛𝑝 → ∞, then 𝜎 = 𝑜(𝜇), and thus 𝑋 = 𝜇 + 𝑜(𝜇) whp.
Note that the above computation remains identical even if we only knew that the 𝑋𝑖 ’s are pairwise uncorrelated (much weaker than assuming full independence).
Here the “tail probability” (the bound hidden in “whp”) decays polynomially in the deviation. Later on, we will derive much sharper (exponential) rates of decay using more powerful tools such as the Chernoff bound when the random variables are independent.
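To see 𝜎 = 𝑜(𝜇) numerically, here is a quick simulation sketch (parameter choices are illustrative, not from the notes): the empirical relative spread of a Binomial(𝑛, 1/2) shrinks like 1/√(𝑛𝑝) as 𝑛 grows.

```python
import random

def rel_spread(n, p, trials=500, seed=1):
    """Empirical (stddev / mean) for a sum of n independent Bernoulli(p)."""
    rng = random.Random(seed)
    xs = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]
    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / trials
    return var ** 0.5 / mean

# Theory: sigma/mu = sqrt((1 - p) / (n p)), i.e. 0.1 at n = 100 and 0.02 at n = 2500.
small_n = rel_spread(100, 0.5)
large_n = rel_spread(2500, 0.5)
```

`small_n` should come out near 0.1 and `large_n` near 0.02, so the distribution visibly tightens around its mean as 𝑛𝑝 grows.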

Let us now return to the problem of determining when 𝐺 (𝑛, 𝑝) contains a triangle
whp.

Theorem 4.1.11
If 𝑛𝑝 → ∞, then 𝐺 (𝑛, 𝑝) contains a triangle with probability 1 − 𝑜(1).

Proof. Label the vertices by [𝑛]. Let 𝑋𝑖𝑗 be the indicator random variable of the edge 𝑖𝑗, so that 𝑋𝑖𝑗 = 1 if the edge is present in the random graph, and 𝑋𝑖𝑗 = 0 otherwise. Let us write

𝑋𝑖𝑗𝑘 := 𝑋𝑖𝑗 𝑋𝑖𝑘 𝑋𝑗𝑘 .

Then the number of triangles in 𝐺 (𝑛, 𝑝) is given by

𝑋 = ∑𝑖<𝑗<𝑘 𝑋𝑖𝑗 𝑋𝑖𝑘 𝑋𝑗𝑘 .

Now we compute Var 𝑋. Note that the summands of 𝑋 are not all independent.
If 𝑇1 and 𝑇2 are each 3-vertex subsets, then

Cov[𝑋𝑇1 , 𝑋𝑇2 ] = E[𝑋𝑇1 𝑋𝑇2 ] − E[𝑋𝑇1 ]E[𝑋𝑇2 ] = 𝑝^{𝑒(𝑇1∪𝑇2)} − 𝑝^{𝑒(𝑇1)+𝑒(𝑇2)} ,

which equals 0 if |𝑇1 ∩ 𝑇2 | ≤ 1, equals 𝑝⁵ − 𝑝⁶ if |𝑇1 ∩ 𝑇2 | = 2, and equals 𝑝³ − 𝑝⁶ if 𝑇1 = 𝑇2 . (Here 𝑒(𝑇1 ∪ 𝑇2 ) denotes the number of edges in the union of the two triangles.)


The number of pairs (𝑇1 , 𝑇2 ) of triangles sharing exactly one edge is 𝑂 (𝑛⁴). Thus

Var 𝑋 = ∑𝑇1,𝑇2 Cov[𝑋𝑇1 , 𝑋𝑇2 ] = 𝑂 (𝑛³)(𝑝³ − 𝑝⁶) + 𝑂 (𝑛⁴)(𝑝⁵ − 𝑝⁶) ≲ 𝑛³𝑝³ + 𝑛⁴𝑝⁵ = 𝑜(𝑛⁶𝑝⁶) as 𝑛𝑝 → ∞.

Thus Var 𝑋 = 𝑜((E𝑋)²), and hence 𝑋 > 0 whp by Corollary 4.1.8. □

Here is what we have learned so far: for 𝑝 = 𝑝𝑛 and as 𝑛 → ∞,

P(𝐺 (𝑛, 𝑝) contains a triangle) → 0 if 𝑛𝑝 → 0, and → 1 if 𝑛𝑝 → ∞.

We say that 1/𝑛 is a threshold for containing a triangle, in the sense that if 𝑝 grows asymptotically faster than this threshold, i.e., 𝑝 ≫ 1/𝑛, then the event occurs with probability 1 − 𝑜(1), while if 𝑝 ≪ 1/𝑛, then the event occurs with probability 𝑜(1). Note that the definition of a threshold ignores leading constant factors (so it is also correct to say that 2/𝑛 is a threshold for containing a triangle). Determining the thresholds of various properties in random graphs (as well as in other random settings) is a central topic in probabilistic combinatorics. We will discuss thresholds in more depth later in this chapter.
What else might you want to know about the probability that 𝐺 (𝑛, 𝑝) contains a
triangle?

Remark 4.1.12 (Poisson limit). What if 𝑛𝑝 → 𝑐 for some constant 𝑐 > 0? It turns out that in this case the number of triangles of 𝐺 (𝑛, 𝑝) converges to a Poisson distribution with constant mean. You will show this in the homework. It will be done via the method of moments: if 𝑍 is some random variable with sufficiently nice properties (known as being “determined by moments,” which holds for many common distributions such as the Poisson distribution and the normal distribution), and 𝑋𝑛 is a sequence of random variables such that E[𝑋𝑛ᵏ] → E[𝑍ᵏ] for all nonnegative integers 𝑘, then 𝑋𝑛 converges in distribution to 𝑍.

Remark 4.1.13 (Asymptotic normality). Suppose 𝑛𝑝 → ∞. From the above proof, we also deduce that 𝑋 ∼ E𝑋, i.e., the number of triangles is concentrated around its mean. In fact, much more is known. It turns out that the number 𝑋 of triangles in 𝐺 (𝑛, 𝑝) is asymptotically normal, meaning that it satisfies a central limit theorem: (𝑋 − E𝑋)/√(Var 𝑋) converges in distribution to the standard normal 𝑁 (0, 1). This was shown by Ruciński (1988) via the method of moments, by computing the limit of the 𝑘-th moment of (𝑋 − E𝑋)/√(Var 𝑋) and showing that it agrees with the 𝑘-th moment of the standard normal.


In the homework, you will prove the asymptotic normality of 𝑋 using the more recently developed method of projections. The idea is to show that 𝑋 is close to another random variable that is already known to be asymptotically normal, by checking that their difference has negligible variance. For triangle counts, when 𝑝 ≫ 𝑛^{−1/2}, we can compare the number of triangles to the number of edges after a normalization. The method can be further modified for greater generality. See §6.4 of the book Random Graphs by Janson, Łuczak, and Ruciński (2000).

Remark 4.1.14 (Better tail bounds). Later on, we will use more powerful tools (including martingale methods with the Azuma–Hoeffding inequality, as well as Janson’s inequalities) to prove better tail bounds on triangle (and other subgraph) counts.

4.2 Thresholds for fixed subgraphs


In the last section, we determined the threshold for 𝐺 (𝑛, 𝑝) to contain a triangle. What
about other subgraphs instead of a triangle? In this section, we give a complete answer
to this question for any fixed subgraph.

Question 4.2.1
What is the threshold for containing a fixed 𝐻 as a subgraph?

In other words, we wish to find some sequence 𝑞 𝑛 so that:


• (0-statement) if 𝑝 𝑛 /𝑞 𝑛 → 0 (i.e., 𝑝 𝑛 ≪ 𝑞 𝑛 ), then 𝐺 (𝑛, 𝑝 𝑛 ) contains 𝐻 with
probability 𝑜(1);
• (1-statement) if 𝑝 𝑛 /𝑞 𝑛 → ∞ (i.e., 𝑝 𝑛 ≫ 𝑞 𝑛 ), then 𝐺 (𝑛, 𝑝 𝑛 ) contains 𝐻 with
probability 1 − 𝑜(1).
(It is not a priori clear why such a threshold exists in the first place. In fact, thresholds always exist for monotone properties, as we will see in the next section.)
Building on our calculations for triangles from the previous section, let us consider a more general setup for estimating the variance, so that our calculations can be more organized.

Setup 4.2.2 (for variance bound with few dependencies)


Suppose 𝑋 = 𝑋1 + · · · + 𝑋𝑚 where 𝑋𝑖 is the indicator random variable for event 𝐴𝑖 . Write 𝑖 ∼ 𝑗 if 𝑖 ≠ 𝑗 and the pair of events (𝐴𝑖 , 𝐴𝑗 ) is not independent. Define

𝚫∗ := max𝑖 ∑𝑗:𝑗∼𝑖 P(𝐴𝑗 | 𝐴𝑖 ).


Remark 4.2.3. (a) For many applications with an underlying symmetry between the events, the sum in the definition of Δ∗ does not actually depend on 𝑖.
(b) In the definition of the dependency graph (𝑖 ∼ 𝑗) above, we are only considering pairwise dependence. Later on, when we study the Lovász Local Lemma, we will need a stronger notion of a dependency graph.
(c) This method is appropriate for a collection of events with few dependencies. It is not appropriate for settings with many weak dependencies (e.g., Section 4.5 on the Hardy–Ramanujan theorem on the number of distinct prime divisors).

We have the bound

Cov[𝑋𝑖 , 𝑋𝑗 ] = E[𝑋𝑖 𝑋𝑗 ] − E[𝑋𝑖 ]E[𝑋𝑗 ] ≤ E[𝑋𝑖 𝑋𝑗 ] = P(𝐴𝑖 𝐴𝑗 ) = P(𝐴𝑖 )P(𝐴𝑗 | 𝐴𝑖 ).

(Here 𝐴𝑖 𝐴𝑗 is shorthand for 𝐴𝑖 ∧ 𝐴𝑗 , meaning that both events occur.) Also,

Cov[𝑋𝑖 , 𝑋𝑗 ] = 0 if 𝑖 ≠ 𝑗 and 𝑖 ≁ 𝑗 .

Thus

Var 𝑋 = ∑𝑖,𝑗 Cov[𝑋𝑖 , 𝑋𝑗 ] ≤ ∑𝑖 P(𝐴𝑖 ) + ∑𝑖 P(𝐴𝑖 ) ∑𝑗:𝑗∼𝑖 P(𝐴𝑗 | 𝐴𝑖 ) ≤ E𝑋 + (E𝑋)Δ∗ .

Recall from Corollary 4.1.8 that E𝑋 > 0 and Var 𝑋 = 𝑜((E𝑋)²) imply 𝑋 > 0 and 𝑋 ∼ E𝑋 whp. So we have the following.

Lemma 4.2.4
In the above setup, if E𝑋 → ∞ and Δ∗ = 𝑜(E𝑋), then 𝑋 > 0 and 𝑋 ∼ E𝑋 whp.
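As a sanity check on this setup, the following sketch computes Δ∗ by brute force for the triangle events of the previous section, where 𝑖 ∼ 𝑗 corresponds to two triples sharing an edge: a shared edge leaves two edges of the second triangle to expose, so each dependent pair contributes 𝑝², and every triple has 3(𝑛 − 3) edge-sharing neighbors. (The closed form 3(𝑛 − 3)𝑝² is my own check, not stated in the notes.)

```python
from itertools import combinations

def delta_star_triangles(n, p):
    """Brute-force Delta* = max_i sum_{j ~ i} P(A_j | A_i) for triangle events:
    T' ~ T exactly when the two vertex triples share two vertices (one edge)."""
    triples = list(combinations(range(n), 3))
    best = 0.0
    for T in triples:
        total = 0.0
        for T2 in triples:
            if T2 != T and len(set(T) & set(T2)) == 2:
                total += p ** 2   # two edges of T' remain to be exposed
        best = max(best, total)
    return best

n, p = 12, 0.3
# Matches the closed form: each triple has 3(n-3) edge-sharing neighbors.
assert abs(delta_star_triangles(n, p) - 3 * (n - 3) * p ** 2) < 1e-9
```

Since 3(𝑛 − 3)𝑝² ≍ 𝑛𝑝² = 𝑜(𝑛³𝑝³) ≍ 𝑜(E𝑋) whenever 𝑛𝑝 → ∞, this recovers the triangle theorem via Lemma 4.2.4.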

Let us now determine the threshold for containing 𝐾4 .

Theorem 4.2.5
The threshold for containing 𝐾4 is 𝑛^{−2/3}.

Proof. Let 𝑋 denote the number of copies of 𝐾4 in 𝐺 (𝑛, 𝑝). Then

E𝑋 = (𝑛 choose 4) 𝑝⁶ ≍ 𝑛⁴𝑝⁶ .

If 𝑝 ≪ 𝑛^{−2/3}, then E𝑋 = 𝑜(1), and thus 𝑋 = 0 whp by Markov’s inequality.


Now suppose 𝑝 ≫ 𝑛^{−2/3}, so E𝑋 → ∞. For each 4-vertex subset 𝑆, let 𝐴𝑆 be the event that 𝑆 is a clique in 𝐺 (𝑛, 𝑝).
For each fixed 𝑆, one has 𝐴𝑆 ∼ 𝐴𝑆′ if and only if |𝑆 ∩ 𝑆′| ≥ 2.
• The number of 𝑆′ that share exactly 2 vertices with 𝑆 is 6 (𝑛−4 choose 2) = 𝑂 (𝑛²), and for each such 𝑆′ one has P(𝐴𝑆′ | 𝐴𝑆 ) = 𝑝⁵ (as there are 5 additional edges not in the 𝑆-clique that need to appear to form the 𝑆′-clique).
• The number of 𝑆′ that share exactly 3 vertices with 𝑆 is 4(𝑛 − 4) = 𝑂 (𝑛), and for each such 𝑆′ one has P(𝐴𝑆′ | 𝐴𝑆 ) = 𝑝³ .
Summing over all such 𝑆′, we find

Δ∗ = ∑𝑆′:|𝑆′∩𝑆|∈{2,3} P(𝐴𝑆′ | 𝐴𝑆 ) ≲ 𝑛²𝑝⁵ + 𝑛𝑝³ ≪ 𝑛⁴𝑝⁶ ≍ E𝑋.

Thus 𝑋 > 0 whp by Lemma 4.2.4. □

For both 𝐾3 and 𝐾4 , we saw that for any choice of 𝑝 = 𝑝𝑛 with E𝑋 → ∞, one has 𝑋 > 0 whp. Is this generally true?

Example 4.2.6 (First moment is not enough). Let 𝐻 be the graph obtained from 𝐾4 by adding a pendant edge (so 𝐻 has 5 vertices and 7 edges). We have E𝑋𝐻 ≍ 𝑛⁵𝑝⁷ .
If E𝑋𝐻 = 𝑜(1), then 𝑋𝐻 = 0 whp. But what if E𝑋𝐻 → ∞, i.e., 𝑝 ≫ 𝑛^{−5/7}?
We know that if 𝑛^{−5/7} ≪ 𝑝 ≪ 𝑛^{−2/3}, then 𝑋𝐾4 = 0 whp, so 𝑋𝐻 = 0 whp since 𝐾4 ⊆ 𝐻. On the other hand, if 𝑝 ≫ 𝑛^{−2/3}, then whp we can find a 𝐾4 and pick an arbitrary edge to extend it to a copy of 𝐻 (we’ll prove this).

Thus the threshold for this 𝐻 is actually 𝑛^{−2/3}, and not 𝑛^{−5/7} as one might have naively predicted from the first moment alone.
Why didn’t E𝑋𝐻 → ∞ give 𝑋𝐻 > 0 whp in our proof strategy? In the calculation of Δ∗ , one of the terms is ≍ 𝑛𝑝 (from two copies of 𝐻 overlapping in a 𝐾4 ), and 𝑛𝑝 ≫ 𝑛⁵𝑝⁷ ≍ E𝑋𝐻 if 𝑝 ≪ 𝑛^{−2/3}.

The above example shows that the threshold is not always determined by the first moment. For the property of containing 𝐻, the example suggests that we should look at the “densest” subgraph of 𝐻 rather than 𝐻 itself.


Definition 4.2.7
Define the edge-vertex ratio of a graph 𝐻 by

𝝆(𝑯) := 𝑒𝐻 /𝑣𝐻 .

(This is the same as half the average degree.)
Define the maximum edge-vertex ratio over all subgraphs of 𝐻:

𝒎(𝑯) := max𝐻′⊆𝐻 𝜌(𝐻′).

Example 4.2.8. Let 𝐻 again be 𝐾4 with a pendant edge. We have 𝜌(𝐻) = 7/5, whereas 𝜌(𝐾4 ) = 3/2 > 7/5. It is not hard to check that 𝑚(𝐻) = 𝜌(𝐾4 ) = 3/2, as 𝐾4 is the subgraph of 𝐻 with the maximum edge-vertex ratio.
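Since 𝐻 has only 5 vertices, 𝑚(𝐻) can be verified by brute force over vertex subsets (deleting edges only lowers the ratio, so it suffices to scan induced subgraphs). A sketch, where the edge list encodes 𝐾4 plus a pendant edge (nothing here is from the notes):

```python
from itertools import combinations
from fractions import Fraction

def m_of(edges):
    """Brute-force m(H): max of e(H')/v(H') over all subgraphs H'.
    Scanning induced subgraphs suffices, since removing edges only lowers the ratio."""
    verts = sorted({v for e in edges for v in e})
    best = Fraction(0)
    for k in range(1, len(verts) + 1):
        for S in combinations(verts, k):
            induced = [e for e in edges if set(e) <= set(S)]
            best = max(best, Fraction(len(induced), k))
    return best

# K4 on vertices 0..3 plus the pendant edge {3, 4}: 5 vertices, 7 edges.
H = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
assert m_of(H) == Fraction(3, 2)   # the maximum is achieved by the K4, not all of H
```

Exhaustive search like this is exponential in 𝑣𝐻 and only suitable for tiny graphs; the remark below notes a polynomial-time alternative.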

Remark 4.2.9 (Algorithm). Goldberg (1984) found a polynomial-time algorithm for computing 𝑚(𝐻) via network flow algorithms.

The next theorem completely determines the threshold for containing a fixed graph 𝐻. Basically, it is determined by the expected number of copies of 𝐻′, where 𝐻′ is the “densest” subgraph of 𝐻 (i.e., the one with the maximum edge-vertex ratio).

Theorem 4.2.10 (Threshold for containing a fixed graph: Bollobás 1981)

Fix a graph 𝐻. Then 𝑝 = 𝑛^{−1/𝑚(𝐻)} is a threshold for containing 𝐻 as a subgraph.

Proof. Let 𝐻′ be a subgraph of 𝐻 achieving the maximum edge-vertex ratio, i.e., 𝜌(𝐻′) = 𝑚(𝐻). Let 𝑋𝐻 denote the number of copies of 𝐻 in 𝐺 (𝑛, 𝑝).
If 𝑝 ≪ 𝑛^{−1/𝑚(𝐻)}, then E𝑋𝐻′ ≍ 𝑛^{𝑣𝐻′} 𝑝^{𝑒𝐻′} = 𝑜(1), so 𝑋𝐻′ = 0 whp, hence 𝑋𝐻 = 0 whp.
Now suppose 𝑝 ≫ 𝑛^{−1/𝑚(𝐻)}. Let us count labeled copies of the subgraph 𝐻 in 𝐺 (𝑛, 𝑝). Let 𝐽 be a labeled copy of 𝐻 in 𝐾𝑛 , and let 𝐴𝐽 denote the event that 𝐽 appears in 𝐺 (𝑛, 𝑝). We have, for fixed 𝐽,

Δ∗ = ∑𝐽′∼𝐽 P(𝐴𝐽′ | 𝐴𝐽 ) = ∑𝐽′∼𝐽 𝑝^{|𝐸(𝐽′)∖𝐸(𝐽)|} .

For any 𝐽′ ∼ 𝐽, we have

𝑛^{|𝑉(𝐽′)∖𝑉(𝐽)|} 𝑝^{|𝐸(𝐽′)∖𝐸(𝐽)|} ≪ 𝑛^{|𝑉(𝐽)|} 𝑝^{|𝐸(𝐽)|}


since

𝑝 ≫ 𝑛^{−1/𝑚(𝐻)} ≥ 𝑛^{−1/𝜌(𝐽∩𝐽′)} = 𝑛^{−|𝑉(𝐽)∩𝑉(𝐽′)|/|𝐸(𝐽)∩𝐸(𝐽′)|} .

It then follows, after considering all the possible ways that 𝐽′ can overlap with 𝐽, that Δ∗ ≪ 𝑛^{|𝑉(𝐽)|} 𝑝^{|𝐸(𝐽)|} ≍ E𝑋𝐻 . So Lemma 4.2.4 yields the result. □

Remark 4.2.11. The proof also gives that if 𝑝 ≫ 𝑛^{−1/𝑚(𝐻)}, then the number 𝑋𝐻 of copies of 𝐻 is concentrated near its mean, i.e., with probability 1 − 𝑜(1),

𝑋𝐻 ∼ E𝑋𝐻 = (𝑛 choose 𝑣𝐻 ) · (𝑣𝐻 !/aut(𝐻)) · 𝑝^{𝑒𝐻} ∼ 𝑛^{𝑣𝐻} 𝑝^{𝑒𝐻} /aut(𝐻).

4.3 Thresholds
Previously, we computed the threshold for containing a fixed 𝐻 as a subgraph. In
this section, we take a detour from the discussion of the second moment method and
discuss thresholds in more detail.
We begin by discussing the concept more abstractly by first defining the threshold of
any monotone property on subsets. Then we show that thresholds always exist.
Thresholds form a central topic in probabilistic combinatorics. For any given property,
it is natural to ask the following questions:
1. Where is the threshold?
2. Is the transition sharp? (More precisely, what is the width of the transition window?)
We understand thresholds well for many basic graph properties, but for many others, determining the threshold can be a difficult problem. Also, one might think that one must first understand the location of the threshold before determining the nature of the phase transition, but surprisingly this is not always the case. There are powerful results that can sometimes show a sharp threshold without identifying the location of the threshold.

Here is some general setup, before specializing to graphs.


Let Ω be some finite set (ground set). Let Ω 𝑝 be a random subset of Ω where each
element is included with probability 𝑝 independently.
An increasing property, also called a monotone property, on subsets of Ω is a binary property such that if 𝐴 ⊆ Ω satisfies the property, then any superset of 𝐴 automatically satisfies it.
A property is trivial if every subset of Ω satisfies the property, or if no subset of Ω satisfies it. From now on, we only consider non-trivial monotone properties.


A graph property is a property that only depends on the isomorphism class of a graph.
Whether the random graph 𝐺 (𝑛, 𝑝) satisfies a given property can be cast in our setup by viewing 𝐺 (𝑛, 𝑝) as Ω𝑝 with Ω = ([𝑛] choose 2).

Here are some examples of increasing properties for graphs on a given vertex set:
• Contains some given subgraph
• Is connected
• Has a perfect matching
• Is Hamiltonian
• Is non-3-colorable
A family F ⊆ P (Ω) of subsets of Ω is called an up-set if whenever 𝐴 ∈ F and 𝐴 ⊆ 𝐵, then 𝐵 ∈ F . Satisfying an increasing property is the same as being an element of an up-set. We will use these two terms interchangeably.

Definition 4.3.1 (Threshold)

Let Ω = Ω(𝑛) be a finite set and F = F(𝑛) a monotone property of subsets of Ω. We say that 𝑞𝑛 is a threshold for F if

P(Ω𝑝𝑛 ∈ F ) → 0 if 𝑝𝑛 /𝑞𝑛 → 0, and P(Ω𝑝𝑛 ∈ F ) → 1 if 𝑝𝑛 /𝑞𝑛 → ∞.

Remark 4.3.2. The above definition is only for increasing properties. We can similarly define the threshold for decreasing properties by an obvious modification. An example of a non-monotone property is containing some 𝐻 as an induced subgraph. Some (but not all) non-monotone properties also have thresholds, though we will not discuss this here.

Remark 4.3.3. From the definition, we see that if 𝑟 𝑛 and 𝑟 𝑛′ are both thresholds of
the same property, then they must be within a constant factor of each other (exercise:
check this). Thus it makes sense to say “the threshold” rather than “a threshold.”

Existence of threshold

Question 4.3.4 (Existence of threshold)


Does every non-trivial monotone property have a threshold?


How could a monotone property fail to have a threshold? Perhaps one could have P(Ω1/𝑛 ∈ F ) and P(Ω(log 𝑛)/𝑛 ∈ F ) both in [1/10, 9/10] for all sufficiently large 𝑛? Before answering this question, let us consider an even more elementary claim.

Theorem 4.3.5 (Monotonicity of satisfying probability)


Let Ω be a finite set and F a non-trivial monotone property of subsets of Ω. Then
𝑝 ↦→ P(Ω 𝑝 ∈ F ) is a strictly increasing function of 𝑝 ∈ [0, 1].

Let us give two related proofs of this basic fact. Both are quite instructive. Both are
based on coupling of random processes.

Proof 1. Let 0 ≤ 𝑝 < 𝑞 ≤ 1. Consider the following process to generate two random
subsets of Ω. For each 𝑥, generate uniform 𝑡𝑥 ∈ [0, 1] independently at random. Let

𝐴 = {𝑥 ∈ Ω : 𝑡𝑥 ≤ 𝑝} and 𝐵 = {𝑥 ∈ Ω : 𝑡𝑥 ≤ 𝑞} .

Then 𝐴 has the same distribution as Ω 𝑝 and 𝐵 has the same distribution as Ω𝑞 .
Furthermore, since 𝑝 < 𝑞, we always have 𝐴 ⊆ 𝐵. Since F is monotone, 𝐴 ∈ F
implies 𝐵 ∈ F . Thus

P(Ω 𝑝 ∈ F ) = P( 𝐴 ∈ F ) ≤ P(𝐵 ∈ F ) = P(Ω𝑞 ∈ F ).

To see that the inequality is strict, we simply have to observe that with positive probability, one has 𝐴 ∉ F and 𝐵 ∈ F (e.g., if all 𝑡𝑥 ∈ (𝑝, 𝑞], then 𝐴 = ∅ and 𝐵 = Ω). □
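The coupling in Proof 1 is easy to simulate. This sketch (with illustrative parameters) draws the shared uniforms 𝑡𝑥 once per round and confirms that every sampled pair satisfies 𝐴 ⊆ 𝐵:

```python
import random

def coupled_pair(ground, p, q, rng):
    """One draw of the coupling: t_x ~ Uniform[0,1] shared by both sets;
    A = {x : t_x <= p} and B = {x : t_x <= q}."""
    ts = {x: rng.random() for x in ground}
    A = {x for x in ground if ts[x] <= p}
    B = {x for x in ground if ts[x] <= q}
    return A, B

rng = random.Random(0)
ground = range(50)
for _ in range(1000):
    A, B = coupled_pair(ground, 0.3, 0.7, rng)
    assert A <= B   # with p < q, the coupling forces A to be a subset of B
```

Marginally, 𝐴 is distributed as Ω𝑝 and 𝐵 as Ω𝑞 ; the point of the coupling is that the containment holds sample by sample, not just in distribution.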

In the second proof, the idea is to reveal a random subset of Ω in independent random
stages.

Proof 2. (By two-round exposure) Let 0 ≤ 𝑝 < 𝑞 ≤ 1. Note that 𝐵 = Ω𝑞 has the same
distribution as the union of two independent 𝐴 = Ω 𝑝 and 𝐴′ = Ω 𝑝 ′ , where 𝑝′ is chosen
to satisfy 1 − 𝑞 = (1 − 𝑝) (1 − 𝑝′) (check that the probability that each element occurs
is the same in the two processes). Thus

P( 𝐴 ∈ F ) ≤ P( 𝐴 ∪ 𝐴′ ∈ F ) = P(𝐵 ∈ F ).

As before, to see that the inequality is strict, one observes that with positive probability, one has 𝐴 ∉ F and 𝐴 ∪ 𝐴′ ∈ F . □
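The element-wise identity behind the two-round exposure can be checked directly: if 1 − 𝑞 = (1 − 𝑝)(1 − 𝑝′), then an element lands in 𝐴 ∪ 𝐴′ with probability exactly 𝑞. A numerical sketch with arbitrary 𝑝 and 𝑞:

```python
p, q = 0.3, 0.5

# Solve 1 - q = (1 - p)(1 - p') for the second-round probability p'.
p_prime = 1 - (1 - q) / (1 - p)

# P(x in A union A') = 1 - P(x in neither round) = 1 - (1 - p)(1 - p').
prob_in_union = 1 - (1 - p) * (1 - p_prime)
assert abs(prob_in_union - q) < 1e-12
assert 0 <= p_prime <= 1   # a valid probability whenever p <= q < 1
```

Since elements are included independently, matching this single-element probability is all that is needed for 𝐴 ∪ 𝐴′ to have the same distribution as Ω𝑞 .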

The above technique (generalized from two round exposure to multiple round ex-
posures) gives a nice proof of the following theorem (originally proved using the
Kruskal–Katona theorem).1
1 (Thresholds for random subspaces of F𝑞ⁿ) The Kruskal–Katona proof of the Bollobás–Thomason theorem is still relevant. For example, there is an interesting analogue of this concept for properties of subspaces of F𝑞ⁿ, i.e., random linear codes instead of random graphs. The analogue of the Bollobás–Thomason theorem was proved by Rossman (2020) via the Kruskal–Katona approach. The multiple round exposure proof does not seem to work in the random subspace setting, as one cannot write a subspace as a union of independent copies of smaller subspaces.
As an aside, I disagree with the use of the term “sharp threshold” in Rossman’s paper for describing all thresholds for subspaces: one really should be looking at the cardinality of the subspaces rather than their dimensions. In a related work, Guruswami, Mosheiff, Resch, Silas, and Wootters (2022) determine thresholds for random linear codes for properties that seem to be analogous to the property that a random graph contains a given fixed subgraph. Here again I disagree with them calling it a “sharp threshold”: it is much more like a coarse threshold once you parameterize by the cardinality of the subspace, which gives a much better analogy to the random graph setting.
Thresholds for random linear codes seem to be an interesting topic that has only recently been studied. I think there is more to be done here.

Theorem 4.3.6 (Existence of thresholds: Bollobás and Thomason 1987)

Every sequence of nontrivial monotone properties has a threshold.

The theorem follows from the next non-asymptotic claim.

Lemma 4.3.7 (Multiple round exposure)

Let Ω be a finite set and F some non-trivial monotone property. Let 𝑝 ∈ [0, 1] and let 𝑚 be a positive integer. Then

P(Ω𝑝 ∉ F ) ≤ P(Ω𝑝/𝑚 ∉ F )^𝑚 .

Proof. Consider 𝑚 independent copies of Ω𝑝/𝑚 , and let 𝑌 be their union. Since F is monotone increasing, if 𝑌 ∉ F , then none of the 𝑚 copies lies in F . Hence

P(𝑌 ∉ F ) ≤ P(Ω𝑝/𝑚 ∉ F )^𝑚 .

Note that 𝑌 has the same distribution as Ω𝑞 for some 𝑞 ≤ 𝑝. So P(Ω𝑝 ∉ F ) ≤ P(Ω𝑞 ∉ F ) = P(𝑌 ∉ F ) by Theorem 4.3.5. Combining the two inequalities gives the result. □

Proof of Theorem 4.3.6. Since 𝑝 ↦→ P(Ω𝑝 ∈ F ) is a continuous strictly increasing function going from 0 to 1 as 𝑝 goes from 0 to 1 (in fact it is a polynomial in 𝑝), there is some unique “critical probability” 𝑝𝑐 so that P(Ω𝑝𝑐 ∈ F ) = 1/2.
It remains to check that for every 𝜀 > 0, there is some 𝑚 = 𝑚(𝜀) (not depending on the property) so that

P(Ω𝑝𝑐/𝑚 ∉ F ) ≥ 1 − 𝜀 and P(Ω𝑚𝑝𝑐 ∉ F ) ≤ 𝜀.


(here we write Ω𝑡 = Ω if 𝑡 > 1.) Indeed, applying Lemma 4.3.7 with 𝑝 = 𝑝𝑐 , we have

P(Ω𝑝𝑐/𝑚 ∉ F ) ≥ P(Ω𝑝𝑐 ∉ F )^{1/𝑚} = 2^{−1/𝑚} ≥ 1 − 𝜀 if 𝑚 ≥ (log 2)/𝜀.

Applying Lemma 4.3.7 again with 𝑝 = 𝑚𝑝𝑐 , we have

P(Ω𝑚𝑝𝑐 ∉ F ) ≤ P(Ω𝑝𝑐 ∉ F )^𝑚 = 2^{−𝑚} ≤ 𝜀 if 𝑚 ≥ log₂(1/𝜀).

Thus 𝑝 𝑐 is a threshold for F . □
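Lemma 4.3.7 can be verified exactly for a tiny ground set by summing over all subsets. In this sketch the (arbitrary, hypothetical) monotone property is “|𝐴| ≥ 2” on a 6-element ground set; none of this is from the notes:

```python
def prob_not_in_F(p, ground_size, in_F):
    """Exact P(Omega_p not in F) by summing over all 2^ground_size subsets."""
    total = 0.0
    for mask in range(2 ** ground_size):
        A = {i for i in range(ground_size) if mask >> i & 1}
        if not in_F(A):
            total += p ** len(A) * (1 - p) ** (ground_size - len(A))
    return total

in_F = lambda A: len(A) >= 2        # a nontrivial monotone property
p, m = 0.6, 3
lhs = prob_not_in_F(p, 6, in_F)
rhs = prob_not_in_F(p / m, 6, in_F) ** m
assert lhs <= rhs   # Lemma 4.3.7: P(Omega_p not in F) <= P(Omega_{p/m} not in F)^m
```

Here `lhs` is just P(Bin(6, 0.6) ≤ 1), so the inequality can also be confirmed by hand; the brute-force sum is written so that any monotone `in_F` can be plugged in.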

Examples
We will primarily be studying monotone graph properties. In the previous notation, Ω = ([𝑛] choose 2), and we are only considering properties that depend on the isomorphism class of the graph.

Example 4.3.8 (Containing a triangle). We saw earlier in the chapter that the threshold for containing a triangle is 1/𝑛:

P(𝐺 (𝑛, 𝑝) contains a triangle) → 0 if 𝑛𝑝 → 0, → 1 − 𝑒^{−𝑐³/6} if 𝑛𝑝 → 𝑐 ∈ (0, ∞), and → 1 if 𝑛𝑝 → ∞.

In this case, the threshold is determined by the expected number of triangles Θ(𝑛³𝑝³), and whether this quantity goes to zero or infinity (in the latter case, we used a second moment argument to show that the number of triangles is positive with high probability).
What if 𝑝 = Θ(1/𝑛)? If 𝑛𝑝 → 𝑐 for some constant 𝑐 > 0, then (as you will show in the homework via the method of moments) the number of triangles is asymptotically Poisson distributed, and in particular the limit probability of containing a triangle increases from 0 to 1 as a continuous function of 𝑐 ∈ (0, ∞). So, in particular, as 𝑝 increases, it goes through a “window of transition” of width Θ(1/𝑛) in order for P(𝐺 (𝑛, 𝑝) contains a triangle) to increase from 0.01 to 0.99. The width of this window is on the same order as the threshold itself. In this case, we call it a coarse transition.

Example 4.3.9 (Containing a subgraph). Theorem 4.2.10 tells us that the threshold for containing a fixed subgraph 𝐻 is 𝑛^{−1/𝑚(𝐻)}. Here the threshold is not always determined by the expected number of copies of 𝐻. Instead, we need to look at the “densest” subgraph 𝐻′ ⊆ 𝐻, the one with the largest edge-vertex ratio (equivalently, the largest average degree). The threshold is determined by whether the expected number of copies of 𝐻′ goes to zero or infinity.
Similar to the triangle case, we have a coarse threshold.


The analysis can also be generalized to containing one of several fixed subgraphs
𝐻1 , . . . , 𝐻 𝑘 .

Remark 4.3.10 (Monotone graph properties are characterized by subgraph containment). Every monotone graph property can be characterized as containing some element of H, for some family H that may depend on the number of vertices 𝑛. For example, the property of connectivity corresponds to taking H to be the set of all spanning trees. More generally, one can take H to be the set of all minimal graphs satisfying the property. When the elements of H are unbounded in size, the problem of thresholds becomes quite interesting and sometimes difficult.

The original Erdős–Rényi (1959) paper on random graphs already studied several
thresholds, such as the next two examples.

Example 4.3.11 (No isolated vertices). With 𝑝 = (log 𝑛 + 𝑐𝑛 )/𝑛,

P(𝐺 (𝑛, 𝑝) has no isolated vertices) → 0 if 𝑐𝑛 → −∞, → 𝑒^{−𝑒^{−𝑐}} if 𝑐𝑛 → 𝑐, and → 1 if 𝑐𝑛 → ∞.

It is a good exercise (and included in the problem set) to check the above claims. The cases 𝑐𝑛 → −∞ and 𝑐𝑛 → ∞ can be shown using the second moment method. When 𝑐𝑛 → 𝑐, by comparing moments one can show that the number of isolated vertices is asymptotically Poisson.
In this example, the threshold is at (log 𝑛)/𝑛. As we see above, the transition window is Θ(1/𝑛), much narrower than the magnitude of the threshold. In particular, the event probability goes from 𝑜(1) to 1 − 𝑜(1) as 𝑝 increases from (1 − 𝛿)(log 𝑛)/𝑛 to (1 + 𝛿)(log 𝑛)/𝑛, for any fixed 𝛿 > 0. In this case, we say that the property has a sharp threshold at (log 𝑛)/𝑛 (here the leading constant factor is relevant, unlike in the earlier example of a coarse threshold).
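The Poisson prediction in the middle regime can be checked by simulation. A sketch (the choices 𝑛 = 200, 200 trials, and 𝑐 = 1 are arbitrary; at this 𝑛 the finite-size error and sampling noise are a few percent):

```python
import math
import random

def prob_no_isolated(n, p, trials=200, seed=2):
    """Monte Carlo estimate of P(G(n, p) has no isolated vertex)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        deg = [0] * n
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < p:
                    deg[i] += 1
                    deg[j] += 1
        hits += all(d > 0 for d in deg)
    return hits / trials

c = 1.0
n = 200
p = (math.log(n) + c) / n
estimate = prob_no_isolated(n, p)
predicted = math.exp(-math.exp(-c))   # the Poisson limit, about 0.692
```

At this modest 𝑛 the estimate should already land within a few percent of the limiting value, which is a useful sanity check before attempting the moment computation in the problem set.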

Example 4.3.12 (Connectivity). With 𝑝 = (log 𝑛 + 𝑐𝑛 )/𝑛,

P(𝐺 (𝑛, 𝑝) is connected) → 0 if 𝑐𝑛 → −∞, → 𝑒^{−𝑒^{−𝑐}} if 𝑐𝑛 → 𝑐, and → 1 if 𝑐𝑛 → ∞.

In fact, a much stronger statement is true, connecting the above two examples: consider a process where one adds random edges one at a time; then with probability 1 − 𝑜(1),


the graph becomes connected as soon as there are no more isolated vertices. Such a stronger characterization is called a hitting time result.
A similar statement is true if we replace “is connected” by “has a perfect matching” (assuming 𝑛 is even).

Example 4.3.13 (Perfect matching in a random hypergraph: Shamir’s problem).


Let 𝐺⁽³⁾(𝑛, 𝑝) be a random 3-uniform hypergraph on 𝑛 vertices, where each triple of vertices appears as an edge with probability 𝑝 independently. Assume that 𝑛 is divisible by 3. What is the threshold for the existence of a perfect matching (i.e., a set of 𝑛/3 edges covering all vertices)?
It is easy to check that the property of having no isolated vertices has a sharp threshold at 𝑝 = 2𝑛^{−2} log 𝑛. Is this also the threshold for having a perfect matching? For smaller 𝑝, one cannot have a perfect matching due to the existence of an isolated vertex. What about larger 𝑝? This turns out to be a difficult question, known as “Shamir’s problem.”
A difficult result of Johansson, Kahn, and Vu (2008) (this paper won a Fulkerson Prize) showed that there is some constant 𝐶 > 0 so that if 𝑝 ≥ 𝐶𝑛^{−2} log 𝑛, then 𝐺⁽³⁾(𝑛, 𝑝) contains a perfect matching with high probability. They also solved the problem in much greater generality, for 𝐻-factors in random 𝑘-uniform hypergraphs.
Recent exciting breakthroughs on the Kahn–Kalai conjecture (2007) by Frankston, Kahn, Narayanan, and Park (2021) and Park and Pham (2024) provide new and much shorter proofs of this threshold for Shamir’s problem.
Recently, Kahn (2022) proved a sharp threshold result, and in fact an even stronger hitting time version, of Shamir’s problem, showing that with high probability one has a perfect matching as soon as there are no isolated vertices.

Sharp transition
In some of the examples, the probability that 𝐺 (𝑛, 𝑝) satisfies the property changes quickly and dramatically as 𝑝 crosses the threshold (physical analogy: similar to how the structure of water changes dramatically as the temperature drops below freezing). For example, while 𝑝 = (log 𝑛)/𝑛 is a threshold for connectivity, we see that 𝐺 (𝑛, 0.99 log 𝑛/𝑛) is whp not connected and 𝐺 (𝑛, 1.01 log 𝑛/𝑛) is whp connected, unlike the situation for containing a triangle earlier. We call this the sharp threshold phenomenon.


Figure 4.1: Examples of coarse and sharp thresholds (left panel: a coarse threshold; right panel: a sharp threshold). The vertical axis is the probability that 𝐺 (𝑛, 𝑝) satisfies the property.

Definition 4.3.14 (Sharp thresholds)

We say that 𝑟𝑛 is a sharp threshold for some property F on subsets of Ω if, for every 𝛿 > 0,

P(Ω𝑝𝑛 ∈ F ) → 0 if 𝑝𝑛 /𝑟𝑛 ≤ 1 − 𝛿, and P(Ω𝑝𝑛 ∈ F ) → 1 if 𝑝𝑛 /𝑟𝑛 ≥ 1 + 𝛿.

On the other hand, if there are some fixed 𝜀 > 0 and 0 < 𝑐 < 𝐶 so that P(Ω𝑝𝑛 ∈ F ) ∈ [𝜀, 1 − 𝜀] whenever 𝑐 ≤ 𝑝𝑛 /𝑟𝑛 ≤ 𝐶, then we say that 𝑟𝑛 is a coarse threshold.

As in Figure 4.1, the sharpness/coarseness of a threshold is about how quickly P(Ω 𝑝 ∈ F )
transitions from 𝜀 to 1 − 𝜀 as 𝑝 increases. How wide is the transition window for 𝑝? By
the Bollobás–Thomason theorem (Theorem 4.3.6) on the existence of thresholds, this
transition window always has width 𝑂 (𝑟 𝑛 ). If the transition window has width Θ(𝑟 𝑛 )
for some 𝜀 > 0, then we have a coarse threshold. On the other hand, if the transition
window has width 𝑜(𝑟 𝑛 ) for every 𝜀 > 0, then we have a sharp threshold.
From earlier examples, we saw coarse thresholds for the “local” property of containing
some given subgraph, as well as sharp thresholds for “global” properties such as
connectivity. It turns out that this is a general phenomenon.
Friedgut’s sharp threshold theorem (1999), a deep and important result, completely
characterizes when a threshold is coarse versus sharp. We will not state Friedgut’s
theorem precisely here since it is rather technical (and actually not always easy to
apply). Let us just give a flavor. Roughly speaking, the theorem says that:
All monotone graph properties with a coarse threshold may be approxi-
mated by a local property.
In other words, informally, if a monotone graph property P has a coarse threshold,
then there is a finite list of graphs 𝐺 1 , . . . , 𝐺 𝑚 such that P is “close to” the property of
containing one of 𝐺 1 , . . . , 𝐺 𝑚 as a subgraph.

53
MIT OCW: Probabilistic Methods in Combinatorics — Yufei Zhao

4 Second Moment

We need “close to” since the property could be “contains a triangle and has at least
log 𝑛 edges”, which is not exactly local but it is basically the same as “contains a
triangle.”
There is some subtlety here since we can allow very different properties depending on
the value of 𝑛. E.g., P could be the set of all 𝑛-vertex graphs that contain a 𝐾3 if 𝑛 is
odd and a 𝐾4 if 𝑛 is even. Friedgut’s theorem tells us that if there is a threshold, then
there is a partition N = N1 ∪ · · · ∪ N 𝑘 such that on each N𝑖 , P is approximately of the
form described in the previous paragraph.
In the last section, we derived that the property of containing some fixed 𝐻 has
threshold 𝑛−1/𝑚(𝐻) for some rational number 𝑚(𝐻). It follows as a corollary of
Friedgut’s theorem that every coarse threshold must have this form.

Corollary 4.3.15 (of Friedgut’s sharp threshold theorem)


Suppose 𝑟 (𝑛) is a coarse threshold of some graph property. Then there is a partition
of N = N1 ∪ · · · ∪ N 𝑘 and rationals 𝛼1 , . . . , 𝛼 𝑘 > 0 such that 𝑟 (𝑛) ≍ 𝑛−𝛼 𝑗 for every
𝑛 ∈ N𝑗.

In particular, if (log 𝑛)/𝑛 is a threshold of some monotone graph property (e.g., this is
the case for connectivity), then we automatically know that it must be a sharp threshold,
even without knowing anything else about the property. Likewise if the threshold has
the form 𝑛−𝛼 for some irrational 𝛼.
The exact statement of Friedgut’s theorem is more cumbersome. We refer those who
are interested to Friedgut’s original 1999 paper and his later survey for details and
applications. This topic is connected more generally to an area known as the analysis
of boolean functions.
Also, it is known that the transition window of every monotone graph property is
(log 𝑛) −2+𝑜(1) (Friedgut–Kalai (1996), Bourgain–Kalai (1997)).
Curiously, tools such as Friedgut’s theorem sometimes allow us to prove the existence
of a sharp threshold without being able to identify its exact location. For example, it is
an important open problem to understand where exactly the transition occurs for a
random graph to be 𝑘-colorable.

Conjecture 4.3.16 ( 𝑘 -colorability threshold)
For every 𝑘 ≥ 3 there is some real constant 𝑑 𝑘 > 0 such that for any constant 𝑑 > 0,

    P(𝐺 (𝑛, 𝑑/𝑛) is 𝑘-colorable) → { 1 if 𝑑 < 𝑑 𝑘 ,
                                     0 if 𝑑 > 𝑑 𝑘 .

We do know that there exists a sharp threshold for 𝑘-colorability.
Theorem 4.3.17 (Achlioptas and Friedgut 2000)
For every 𝑘 ≥ 3, there exists a function 𝑑 𝑘 (𝑛) such that for every 𝜀 > 0, and sequence
𝑑 (𝑛) > 0,

    P(𝐺 (𝑛, 𝑑 (𝑛)/𝑛) is 𝑘-colorable) → { 1 if 𝑑 (𝑛) < 𝑑 𝑘 (𝑛) − 𝜀,
                                         0 if 𝑑 (𝑛) > 𝑑 𝑘 (𝑛) + 𝜀.

On the other hand, it is not known whether lim𝑛→∞ 𝑑 𝑘 (𝑛) exists, which would imply
Conjecture 4.3.16. Further bounds on 𝑑 𝑘 (𝑛) are known, e.g. the landmark paper of
Achlioptas and Naor (2006) showing that for each fixed 𝑑 > 0, whp 𝜒(𝐺 (𝑛, 𝑑/𝑛)) ∈
{𝑘 𝑑 , 𝑘 𝑑 + 1} where 𝑘 𝑑 = min{𝑘 ∈ N : 2𝑘 log 𝑘 > 𝑑}. Also see the later work of
Coja-Oghlan and Vilenchik (2013).

4.4 Clique number of a random graph


The clique number 𝝎(𝑮) of a graph is the maximum number of vertices in a clique
of 𝐺.

Question 4.4.1
What is the clique number of 𝐺 (𝑛, 1/2)?

Let 𝑋 be the number of 𝑘-cliques of 𝐺 (𝑛, 1/2). Define

    𝑓 (𝑛, 𝑘) := E𝑋 = (𝑛 choose 𝑘) 2^{−(𝑘 choose 2)} .

Let us first do a rough estimate to see what is the critical 𝑘 to get 𝑓 (𝑛, 𝑘) large or small.
Recall that (𝑛/(𝑒𝑘))^𝑘 ≤ (𝑛 choose 𝑘) ≤ (𝑒𝑛/𝑘)^𝑘 . We have

    log2 𝑓 (𝑛, 𝑘) = 𝑘 (log2 𝑛 − log2 𝑘 − 𝑘/2 + 𝑂 (1)) .

And so the transition point is at some 𝑘 ∼ 2 log2 𝑛 in the sense that if 𝑘 ≥ (2+ 𝛿) log2 𝑛,
then 𝑓 (𝑛, 𝑘) → 0 while if 𝑘 ≤ (2 − 𝛿) log2 𝑛, then 𝑓 (𝑛, 𝑘) → ∞.
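As a sanity check, one can locate the sign change of log2 𝑓 (𝑛, 𝑘) numerically (a quick sketch, not from the notes; 𝑛 = 1000 is an arbitrary choice, and at such small 𝑛 the crossing sits noticeably below 2 log2 𝑛 ≈ 19.9 because of the lower-order terms):

```python
import math

def log2_f(n, k):
    """log2 of f(n,k) = (n choose k) * 2^{-(k choose 2)}, the expected
    number of k-cliques in G(n, 1/2)."""
    return math.log2(math.comb(n, k)) - k * (k - 1) / 2

n = 1000
# last k (in a generous range) with f(n, k) > 1, i.e. log2 f(n, k) > 0
k_cross = max(k for k in range(2, 40) if log2_f(n, k) > 0)
print(k_cross, 2 * math.log2(n))
```

For 𝑛 = 1000 the expectation 𝑓 (𝑛, 𝑘) drops below 1 between 𝑘 = 15 and 𝑘 = 16.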
The next result gives us a lower bound on the typical clique number.

Theorem 4.4.2 (Second moment bound for clique number)


Let 𝑘 = 𝑘 (𝑛) be some sequence of positive integers.
(a) If 𝑓 (𝑛, 𝑘) → 0, then 𝜔(𝐺 (𝑛, 1/2)) < 𝑘 whp.
(b) If 𝑓 (𝑛, 𝑘) → ∞, then 𝜔(𝐺 (𝑛, 1/2)) ≥ 𝑘 whp.

Proof sketch. The first claim follows from Markov’s inequality as P(𝑋 ≥ 1) ≤ E𝑋.

For the second claim, we bound the variance using Setup 4.2.2. For each 𝑘-element
subset 𝑆 of vertices, let 𝐴𝑆 be the event that 𝑆 is a clique. Let 𝑋𝑆 be the indicator
random variable for 𝐴𝑆 . Recall
    Δ∗ := max_𝑖 Σ_{ 𝑗 : 𝑗∼𝑖} P(𝐴 𝑗 | 𝐴𝑖 ).

For a fixed 𝑘-set 𝑆, consider all 𝑘-sets 𝑇 with 2 ≤ |𝑆 ∩ 𝑇 | ≤ 𝑘 − 1:

    Δ∗ = Σ_{𝑇 : 2 ≤ |𝑆∩𝑇 | ≤ 𝑘−1} P(𝐴𝑇 | 𝐴𝑆 )
       = Σ_{𝑖=2}^{𝑘−1} (𝑘 choose 𝑖) (𝑛−𝑘 choose 𝑘−𝑖) 2^{(𝑖 choose 2) − (𝑘 choose 2)}
       ≪ E𝑋 = (𝑛 choose 𝑘) 2^{−(𝑘 choose 2)}

(calculation for the middle step omitted).

It then follows from Lemma 4.2.4 that 𝑋 > 0 (i.e., 𝜔(𝐺) ≥ 𝑘) whp. □

We can deduce a two-point concentration result for the clique number of 𝐺 (𝑛, 1/2). This
result is due to Bollobás–Erdős 1976 and Matula 1976.

Theorem 4.4.3 (Two-point concentration for clique number)


There exists a 𝑘 = 𝑘 (𝑛) ∼ 2 log2 𝑛 such that 𝜔(𝐺 (𝑛, 1/2)) ∈ {𝑘, 𝑘 + 1} whp.

Proof. For 𝑘 ∼ 2 log2 𝑛,

    𝑓 (𝑛, 𝑘 + 1)/ 𝑓 (𝑛, 𝑘) = ((𝑛 − 𝑘)/(𝑘 + 1)) 2^{−𝑘} = 𝑛^{−1+𝑜(1)} .

Let 𝑘 0 = 𝑘 0 (𝑛) ∼ 2 log2 𝑛 be the value with

    𝑓 (𝑛, 𝑘 0 ) ≥ 𝑛^{−1/2} > 𝑓 (𝑛, 𝑘 0 + 1).

Then 𝑓 (𝑛, 𝑘 0 − 1) → ∞ and 𝑓 (𝑛, 𝑘 0 + 1) = 𝑜(1). By Theorem 4.4.2, the clique number
of 𝐺 (𝑛, 1/2) is whp in {𝑘 0 − 1, 𝑘 0 }. □

Remark 4.4.4. By a more careful analysis, one can show that outside a very sparse
subset of integers, one has 𝑓 (𝑛, 𝑘 0 ) → ∞, in which case one has one-point concentra-
tion 𝜔(𝐺 (𝑛, 1/2)) = 𝑘 0 whp.
By taking the complement of the graph, we also get a two-point concentration result
about the independence number of 𝐺 (𝑛, 1/2). Bohman and Hofstad (2024) extended
the two-point concentration result for the independence number of 𝐺 (𝑛, 𝑝) to all
𝑝 ≥ 𝑛−2/3+𝜀 .

Remark 4.4.5 (Chromatic number). Since the chromatic number satisfies 𝜒(𝐺) ≥
𝑛/𝛼(𝐺), we have

    𝜒(𝐺 (𝑛, 1/2)) ≥ (1 + 𝑜(1)) 𝑛/(2 log2 𝑛) whp.

In Theorem 8.3.2, using more advanced methods, we will prove 𝜒(𝐺 (𝑛, 1/2)) ∼
𝑛/(2 log2 𝑛) whp, a theorem due to Bollobás (1987).
In Section 9.3, using martingale concentration, we will show that 𝜒(𝐺 (𝑛, 𝑝)) is tightly
concentrated around its mean without a priori needing to know where the mean is
located.

4.5 Hardy–Ramanujan theorem on the number of prime divisors

Let 𝝂(𝒏) denote the number of distinct primes dividing 𝑛 (not counting multiplicities).
The next theorem says that “almost all” 𝑛 have (1 + 𝑜(1)) log log 𝑛 prime factors.

Theorem 4.5.1 (Hardy and Ramanujan 1917)
For every 𝜀 > 0, there exists 𝐶 such that for all sufficiently large 𝑛, all but an 𝜀-fraction
of 𝑥 ∈ [𝑛] satisfy

    |𝜈(𝑥) − log log 𝑛| ≤ 𝐶 √(log log 𝑛).

The original proof of Hardy and Ramanujan was quite involved. Here we show a
“probabilistic” proof due to Turán (1934), which played a key role in the development
of probabilistic methods in number theory.

Proof. Choose 𝑥 ∈ [𝑛] uniformly at random. For each prime 𝑝, let

    𝑋 𝑝 = { 1 if 𝑝 | 𝑥,
            0 otherwise.

Set 𝑀 = 𝑛^{1/10} , and (with the sum taken over primes 𝑝)

    𝑋 = Σ_{𝑝≤𝑀} 𝑋 𝑝 .

We have
𝜈(𝑥) − 10 ≤ 𝑋 (𝑥) ≤ 𝜈(𝑥)
since 𝑥 cannot have more than 10 prime factors > 𝑛1/10 . So it suffices to analyze 𝑋.
Since exactly ⌊𝑛/𝑝⌋ positive integers ≤ 𝑛 are divisible by 𝑝, we have

 
⌊𝑛/𝑝⌋ 1 1
E𝑋 𝑝 = = +𝑂 .
𝑛 𝑝 𝑛
We quote Mertens’ theorem from analytic number theory:

    Σ_{𝑝≤𝑛} 1/𝑝 = log log 𝑛 + 𝑂 (1).

(A more precise result says that the 𝑂 (1) error term converges to the Meissel–Mertens
constant.) So

    E𝑋 = Σ_{𝑝≤𝑀} (1/𝑝 + 𝑂 (1/𝑛)) = log log 𝑛 + 𝑂 (1).
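Mertens' estimate is easy to check numerically (a small sketch, not from the notes; 𝑁 = 10^5 and the tolerance are arbitrary, and 0.2615 is an approximation of the Meissel–Mertens constant):

```python
import math

N = 10**5
# sieve of Eratosthenes up to N
is_prime = [True] * (N + 1)
is_prime[0] = is_prime[1] = False
for i in range(2, int(N**0.5) + 1):
    if is_prime[i]:
        for j in range(i * i, N + 1, i):
            is_prime[j] = False

prime_recip_sum = sum(1 / p for p in range(2, N + 1) if is_prime[p])
# Mertens: sum over primes p <= N of 1/p = log log N + M + o(1),
# where M = 0.2615... is the Meissel–Mertens constant
print(prime_recip_sum - math.log(math.log(N)))
```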

Next we compute the variance. The intuition is that divisibility by distinct primes
should behave somewhat independently. Indeed, if 𝑝𝑞 divides 𝑛, then 𝑋 𝑝 and 𝑋𝑞 are
independent (e.g., by the Chinese Remainder Theorem). If 𝑝𝑞 does not divide 𝑛, but
𝑛 is large enough, then there is some small covariance contribution. (In contrast to the
earlier variance calculations in random graphs, here we have many weak dependencies.)
If 𝑝 ≠ 𝑞, then 𝑋 𝑝 𝑋𝑞 = 1 if and only if 𝑝𝑞|𝑥. Thus

    Cov[𝑋 𝑝 , 𝑋𝑞 ] = E[𝑋 𝑝 𝑋𝑞 ] − E[𝑋 𝑝 ]E[𝑋𝑞 ]
                  = ⌊𝑛/𝑝𝑞⌋/𝑛 − (⌊𝑛/𝑝⌋/𝑛)(⌊𝑛/𝑞⌋/𝑛)
                  = (1/𝑝𝑞 + 𝑂 (1/𝑛)) − (1/𝑝 + 𝑂 (1/𝑛))(1/𝑞 + 𝑂 (1/𝑛))
                  = 𝑂 (1/𝑛).

Thus

    Σ_{𝑝≠𝑞} Cov[𝑋 𝑝 , 𝑋𝑞 ] ≲ 𝑀²/𝑛 ≲ 𝑛^{−4/5} .

Also, Var 𝑋 𝑝 = E[𝑋 𝑝 ] − (E𝑋 𝑝 )² = (1/𝑝)(1 − 1/𝑝) + 𝑂 (1/𝑛). Combining, we have

    Var 𝑋 = Σ_{𝑝≤𝑀} Var 𝑋 𝑝 + Σ_{𝑝≠𝑞} Cov[𝑋 𝑝 , 𝑋𝑞 ]
          = Σ_{𝑝≤𝑀} 1/𝑝 + 𝑂 (1) = log log 𝑛 + 𝑂 (1) ∼ E𝑋.

Thus by Chebyshev’s inequality, for every constant 𝜆 > 0,

    P(|𝑋 − log log 𝑛| ≥ 𝜆 √(log log 𝑛)) ≤ Var 𝑋/(𝜆² log log 𝑛) = 1/𝜆² + 𝑜(1).

Finally, recall that |𝑋 − 𝜈| ≤ 10. So the same asymptotic bound holds with 𝑋 replaced
by 𝜈. □
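The conclusion can be checked empirically with a standard factor-count sieve (a sketch, not from the notes; 𝑁 = 10^5 and the width-2 window are arbitrary choices):

```python
import math

N = 10**5
# omega[m] counts the distinct prime divisors of m: every prime p
# (detected by omega[p] still being 0 when p is reached) increments
# omega[m] for each of its multiples m
omega = [0] * (N + 1)
for p in range(2, N + 1):
    if omega[p] == 0:  # no smaller prime divides p, so p is prime
        for m in range(p, N + 1, p):
            omega[m] += 1

loglog = math.log(math.log(N))
mean = sum(omega[1:]) / N
# fraction of x <= N with |omega(x) - log log N| <= 2*sqrt(log log N)
window = 2 * math.sqrt(loglog)
frac = sum(1 for x in range(1, N + 1) if abs(omega[x] - loglog) <= window) / N
print(mean, frac)
```

Already at 𝑁 = 10^5, almost all integers fall within two standard deviations of log log 𝑁.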
The main idea from the above proof is that the number of prime divisors 𝑋 = Σ_𝑝 𝑋 𝑝
(from the previous proof) behaves like a sum of independent random variables.
A sum of independent random variables often satisfies a central limit theorem (i.e.,
asymptotic normality, convergence to Gaussian), assuming some mild regularity hy-
potheses. In particular, we have the following corollary of the Lindeberg–Feller
central limit theorem (see Durrett, Theorem 3.4.10):

Theorem 4.5.2 (Central limit theorem for sums of independent Bernoullis)
If 𝑋𝑛 is a sum of independent Bernoulli random variables, and Var 𝑋𝑛 → ∞ as 𝑛 → ∞,
then (𝑋𝑛 − E𝑋𝑛 )/√(Var 𝑋𝑛 ) converges in distribution to the standard normal.

(Note that the divergent variance hypothesis is necessary and sufficient.)
In the setting of prime divisibility, we do not have genuine independence. Nevertheless,
it is natural to expect that 𝜈(𝑥) still satisfies a central limit theorem. This is indeed the
case, and can be proved by comparing moments against a genuine sum of independent
Bernoulli random variables.

Theorem 4.5.3 (Asymptotic normality: Erdős and Kac 1940)
With 𝑥 ∈ [𝑛] uniformly chosen at random, 𝜈(𝑥) is asymptotically normal, i.e., for
every 𝜆 ∈ R,

    lim_{𝑛→∞} P_{𝑥∈[𝑛]} ( (𝜈(𝑥) − log log 𝑛)/√(log log 𝑛) ≥ 𝜆 ) = (1/√(2𝜋)) ∫_𝜆^∞ 𝑒^{−𝑡²/2} 𝑑𝑡.

The original proof of Erdős and Kac verifies the above intuition using some more
involved results in analytic number theory. Simpler proofs have been subsequently
given, and we outline one such proof below, which is based on computing the moments

of the distribution. The idea of computing moments for this problem was first used by
Delange (1953), who was apparently not aware of the Erdős–Kac paper. Also see a
more modern account by Granville and Soundararajan (2007).
The following tool from probability theory allows us to verify asymptotic normality
from convergence of moments.

Theorem 4.5.4 (Method of moments)
Let 𝑋𝑛 be a sequence of real valued random variables such that for every positive integer
𝑘, lim𝑛→∞ E[𝑋𝑛^𝑘 ] equals the 𝑘-th moment of the standard normal distribution. Then
𝑋𝑛 converges in distribution to the standard normal, i.e., lim𝑛→∞ P(𝑋𝑛 ≤ 𝑎) = P(𝑍 ≤
𝑎) for every 𝑎 ∈ R, where 𝑍 is a standard normal.

Remark 4.5.5. The same conclusion holds for any probability distribution that is
“determined by its moments,” i.e., there are no other distributions sharing the same
moments. Many common distributions that arise in practice, e.g., the Poisson distri-
bution, satisfy this property. There are various sufficient conditions for guaranteeing
this moments property, e.g., Carleman’s condition tells us that any probability distri-
bution whose moments do not increase too quickly is determined by its moments. (See
Durrett §3.3.5).

Proof of the Erdős–Kac Theorem 4.5.3. We compare the higher moments of a truncated
version 𝑋 of 𝜈(𝑥) with those of an idealized 𝑌 that treats the prime divisor indicators
as truly independent random variables.
Set 𝑀 = 𝑛^{1/𝑠(𝑛)} where 𝑠(𝑛) → ∞ sufficiently slowly, and let 𝑋 = Σ_{𝑝≤𝑀} 𝑋 𝑝 as
earlier. Then 𝜈(𝑥) − 𝑠(𝑛) ≤ 𝑋 (𝑥) ≤ 𝜈(𝑥).
We construct a “model random variable” mimicking 𝑋. Let 𝑌 = Σ_{𝑝≤𝑀} 𝑌𝑝 , where
𝑌𝑝 ∼ Bernoulli(1/𝑝) independently for all primes 𝑝 ≤ 𝑀. We can compute:

    𝜇 := E𝑌 ∼ E𝑋 ∼ log log 𝑛

and

    𝜎² := Var 𝑌 ∼ Var 𝑋 ∼ log log 𝑛.

Let 𝑋̃ = (𝑋 − 𝜇)/𝜎 and 𝑌̃ = (𝑌 − 𝜇)/𝜎.
A consequence of the Lindeberg–Feller central limit theorem is that a sum of inde-
pendent Bernoulli random variables with divergent variance satisfies the central limit
theorem. So 𝑌̃ → 𝑁 (0, 1) in distribution. In particular, E[𝑌̃^𝑘 ] ∼ E[𝑍^𝑘 ] (asymptotics
as 𝑛 → ∞) where 𝑍 is a standard normal.
Let us compare 𝑋̃ and 𝑌̃. It suffices to show that for every fixed 𝑘, E[𝑋̃^𝑘 ] ∼ E[𝑌̃^𝑘 ].

For every set of distinct primes 𝑝 1 , . . . , 𝑝𝑟 ≤ 𝑀,

    E[𝑋 𝑝1 · · · 𝑋 𝑝𝑟 − 𝑌𝑝1 · · · 𝑌𝑝𝑟 ] = (1/𝑛)⌊𝑛/(𝑝 1 · · · 𝑝𝑟 )⌋ − 1/(𝑝 1 · · · 𝑝𝑟 ) = 𝑂 (1/𝑛).

Comparing the expansions of 𝑋̃^𝑘 and 𝑌̃^𝑘 in terms of the 𝑋 𝑝 ’s and 𝑌𝑝 ’s (𝑛^{𝑜(1)} terms),
we get

    E[𝑋̃^𝑘 − 𝑌̃^𝑘 ] = 𝑛^{−1+𝑜(1)} = 𝑜(1).

It follows that 𝑋̃ is asymptotically normal. □

4.6 Distinct sums


What is the largest subset of [𝑛] all of whose subsets have distinct sums? Equivalently:

Question 4.6.1
For each 𝑘, what is the smallest 𝑛 so that there exists 𝑆 ⊆ [𝑛] with |𝑆| = 𝑘 and all 2 𝑘
subset sums of 𝑆 are distinct?

E.g., 𝑆 = {1, 2, 2² , . . . , 2^{𝑘−1} } (the greedy choice).


We begin with an easy pigeonhole argument. Since all 2 𝑘 sums are distinct and are at
most 𝑘𝑛, we have 2 𝑘 ≤ 𝑘𝑛. Thus 𝑛 ≥ 2 𝑘 /𝑘.
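Distinct subset sums are easy to test by brute force for small sets (a quick sketch, not from the notes; the second set is a 5-element subset of [13] with distinct subset sums, so it does better than the greedy choice {1, 2, 4, 8, 16} ⊆ [16]):

```python
def all_subset_sums(S):
    """Return the list of all 2^|S| subset sums of S (with multiplicity)."""
    sums = [0]
    for x in S:
        sums += [s + x for s in sums]
    return sums

def has_distinct_subset_sums(S):
    sums = all_subset_sums(S)
    return len(set(sums)) == len(sums)

powers = [1, 2, 4, 8, 16]    # the greedy choice, k = 5, max element 2^{k-1} = 16
small = [6, 9, 11, 12, 13]   # a k = 5 set inside [13], beating the greedy choice
print(has_distinct_subset_sums(powers),
      has_distinct_subset_sums(small),
      has_distinct_subset_sums([1, 2, 3]))  # 1 + 2 = 3, so not distinct
```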
Erdős offered $300 for a proof or disproof of the following. It remains open.

Conjecture 4.6.2 (Erdős)
𝑛 ≳ 2^𝑘

Let us use the second moment to give a modest improvement on the earlier pigeonhole
argument. The main idea here is that, by the second moment method, most of the subset
sums lie within an 𝑂 (𝜎)-interval, so that we can improve on the pigeonhole estimate
by ignoring the outlier subset sums.

Theorem 4.6.3
If there is a 𝑘-element subset of [𝑛] with distinct subset sums, then 𝑛 ≳ 2^𝑘 /√𝑘.

Proof. Let 𝑆 = {𝑥 1 , . . . , 𝑥 𝑘 } be a 𝑘-element subset of [𝑛] with distinct subset sums.
Set

    𝑋 = 𝜀1 𝑥1 + · · · + 𝜀 𝑘 𝑥 𝑘

where 𝜀𝑖 ∈ {0, 1} are chosen uniformly at random independently. We have

    𝜇 := E𝑋 = (𝑥 1 + · · · + 𝑥 𝑘 )/2

and

    𝜎² := Var 𝑋 = (𝑥 1² + · · · + 𝑥 𝑘²)/4 ≤ 𝑛²𝑘/4.

By Chebyshev’s inequality,

    P(|𝑋 − 𝜇| ≥ 2𝜎) ≤ 1/4,

and thus

    P(|𝑋 − 𝜇| < 𝑛√𝑘) ≥ P(|𝑋 − 𝜇| < 2𝜎) ≥ 3/4.
Since 𝑋 takes distinct values for every (𝜀 1 , . . . , 𝜀 𝑘 ) ∈ {0, 1}^𝑘 , we have P(𝑋 = 𝑥) ≤ 2^{−𝑘}
for all 𝑥. Since there are ≤ 2𝑛√𝑘 integers in the interval (𝜇 − 𝑛√𝑘, 𝜇 + 𝑛√𝑘), we have

    P(|𝑋 − 𝜇| < 𝑛√𝑘) ≤ 2𝑛√𝑘 · 2^{−𝑘} .

Putting the upper and lower bounds on P(|𝑋 − 𝜇| < 𝑛√𝑘) together, we get

    2𝑛√𝑘 · 2^{−𝑘} ≥ 3/4.

So 𝑛 ≳ 2^𝑘 /√𝑘. □

Dubroff, Fox, and Xu (2021) gave another short proof of this result by applying Harper’s
vertex-isoperimetric inequality on the cube (this is an example of “concentration of
measure”, which we will explore more later in this course).
Consider the graph representing the 𝑛-dimensional boolean cube, with vertex set {0, 1}^𝑛
and an edge between every pair of 𝑛-tuples that differ in exactly one coordinate. Given
𝐴 ⊆ {0, 1}^𝑛 , write 𝜕𝐴 for the set of all vertices outside 𝐴 that are adjacent to some
vertex of 𝐴.

Theorem 4.6.4 (Vertex-isoperimetric inequality on the hypercube: Harper 1966)
Every 𝐴 ⊆ {0, 1}^𝑘 with | 𝐴| = 2^{𝑘−1} has

    |𝜕𝐴| ≥ (𝑘 choose ⌊𝑘/2⌋).
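For odd 𝑘 this bound is attained by a Hamming ball of radius (𝑘 − 1)/2, which one can verify by brute force (a small sketch with 𝑘 = 7, not from the notes):

```python
from itertools import product
from math import comb

k = 7
cube = list(product([0, 1], repeat=k))
# A = Hamming ball of radius (k-1)/2 = 3 around the all-zero vector;
# for odd k it contains exactly half of the cube
A = {v for v in cube if sum(v) <= (k - 1) // 2}

def neighbors(v):
    # flip each coordinate in turn
    return [v[:i] + (1 - v[i],) + v[i + 1:] for i in range(len(v))]

# vertex boundary: vertices outside A adjacent to some vertex of A
boundary = {v for v in cube if v not in A and any(u in A for u in neighbors(v))}
print(len(A), len(boundary), comb(k, k // 2))
```

Here the boundary consists exactly of the weight-4 vectors, of which there are (7 choose 4) = (7 choose 3) = 35.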

Remark 4.6.5. A stronger form of Harper’s theorem gives the precise value of

    min_{𝐴⊆{0,1}^𝑛 : | 𝐴|=𝑚} |𝜕𝐴|

for every (𝑛, 𝑚). Basically, the minimum is achieved when 𝐴 is a Hamming ball, or, if
𝑚 is not exactly the size of some Hamming ball, then 𝐴 consists of the lexicographically
first 𝑚 elements of {0, 1}^𝑛 .

Theorem 4.6.6 (Dubroff–Fox–Xu 2021)
If there is a 𝑘-element subset of [𝑛] with distinct subset sums, then

    𝑛 ≥ (𝑘 choose ⌊𝑘/2⌋) = (√(2/𝜋) + 𝑜(1)) 2^𝑘 /√𝑘 .

Remark 4.6.7. The above bound has the currently best known leading constant factor,
matching an earlier result by Aliev (2009).

Proof. Let 𝑆 = {𝑥 1 , . . . , 𝑥 𝑘 } be a subset of [𝑛] with distinct subset sums. Let

    𝐴 = { (𝜀1 , . . . , 𝜀 𝑘 ) ∈ {0, 1}^𝑘 : 𝜀1 𝑥1 + · · · + 𝜀 𝑘 𝑥 𝑘 < (𝑥 1 + · · · + 𝑥 𝑘 )/2 }.

Note that due to the distinct sum hypothesis, one can never have 𝜀1 𝑥 1 + · · · + 𝜀 𝑘 𝑥 𝑘 =
(𝑥 1 + · · · + 𝑥 𝑘 )/2. It thus follows by symmetry that | 𝐴| = 2^{𝑘−1} .
Note that every element of 𝜕𝐴 corresponds to some sum of the form 𝑧 + 𝑥𝑖 > (𝑥 1 +
· · · + 𝑥 𝑘 )/2 with 𝑧 < (𝑥 1 + · · · + 𝑥 𝑘 )/2, and thus 𝑧 + 𝑥𝑖 lies in the open interval

    ( (𝑥 1 + · · · + 𝑥 𝑘 )/2 , (𝑥 1 + · · · + 𝑥 𝑘 )/2 + max 𝑆 ).

Since all subset sums are distinct, we must have 𝑛 ≥ |𝜕𝐴| ≥ (𝑘 choose ⌊𝑘/2⌋) by
Harper’s theorem (Theorem 4.6.4). □

4.7 Weierstrass approximation theorem


We finish off the chapter with an application to analysis.
The Weierstrass approximation theorem says that every continuous real function on an
interval can be uniformly approximated by a polynomial.

Theorem 4.7.1 (Weierstrass approximation theorem 1885)


Let 𝑓 : [0, 1] → R be a continuous function. Let 𝜀 > 0. Then there is a polynomial
𝑝(𝑥) such that | 𝑝(𝑥) − 𝑓 (𝑥)| ≤ 𝜀 for all 𝑥 ∈ [0, 1].

Proof. (Bernstein 1912) The idea is to approximate 𝑓 by a sum of polynomials that
look like “bumps”:

    𝑃𝑛 (𝑥) = Σ_{𝑖=0}^{𝑛} 𝐸𝑖 (𝑥) 𝑓 (𝑖/𝑛)

where

    𝐸𝑖 (𝑥) = P(Binomial(𝑛, 𝑥) = 𝑖) = (𝑛 choose 𝑖) 𝑥^𝑖 (1 − 𝑥)^{𝑛−𝑖}   for 0 ≤ 𝑖 ≤ 𝑛

is a polynomial that peaks at 𝑥 = 𝑖/𝑛 and decays as 𝑥 moves away from 𝑖/𝑛.
For each 𝑥 ∈ [0, 1], the binomial distribution Binomial(𝑛, 𝑥) has mean 𝑛𝑥 and variance
𝑛𝑥(1 − 𝑥) ≤ 𝑛. By Chebyshev’s inequality,

    Σ_{𝑖 : |𝑖−𝑛𝑥 |>𝑛^{2/3}} 𝐸𝑖 (𝑥) = P(|Binomial(𝑛, 𝑥) − 𝑛𝑥| > 𝑛^{2/3} ) ≤ 𝑛^{−1/3} .

(In the next chapter, we will see a much better tail bound.)
Since [0, 1] is compact, 𝑓 is uniformly continuous and bounded. By multiplying 𝑓 by
a constant, we may assume that | 𝑓 (𝑥)| ≤ 1 for all 𝑥 ∈ [0, 1]. Also, there exists 𝛿 > 0
such that | 𝑓 (𝑥) − 𝑓 (𝑦)| ≤ 𝜀/2 for all 𝑥, 𝑦 ∈ [0, 1] with |𝑥 − 𝑦| ≤ 𝛿.
Take 𝑛 > max{64𝜀^{−3} , 𝛿^{−3} }. Then for every 𝑥 ∈ [0, 1] (note that Σ_{𝑗=0}^{𝑛} 𝐸 𝑗 (𝑥) = 1),

    |𝑃𝑛 (𝑥) − 𝑓 (𝑥)| ≤ Σ_{𝑖=0}^{𝑛} 𝐸𝑖 (𝑥) | 𝑓 (𝑖/𝑛) − 𝑓 (𝑥)|
                    ≤ Σ_{𝑖 : |𝑖/𝑛−𝑥 | ≤ 𝑛^{−1/3} < 𝛿} 𝐸𝑖 (𝑥) | 𝑓 (𝑖/𝑛) − 𝑓 (𝑥)| + Σ_{𝑖 : |𝑖−𝑛𝑥 | > 𝑛^{2/3}} 2𝐸𝑖 (𝑥)
                    ≤ 𝜀/2 + 2𝑛^{−1/3} ≤ 𝜀. □

Exercises
1. Let 𝑋 be a nonnegative real-valued random variable. Suppose P(𝑋 = 0) < 1.
   Prove that

       P(𝑋 = 0) ≤ Var 𝑋 / E[𝑋²].

2. Let 𝑋 be a random variable with mean 𝜇 and variance 𝜎². Prove that for all
   𝜆 > 0,

       P(𝑋 ≥ 𝜇 + 𝜆) ≤ 𝜎²/(𝜎² + 𝜆²).

3. Threshold for 𝑘-APs. Let [𝑛] 𝑝 denote the random subset of {1, . . . , 𝑛} where
every element is included with probability 𝑝 independently. For each fixed
integer 𝑘 ≥ 3, determine the threshold for [𝑛] 𝑝 to contain a 𝑘-term arithmetic
progression.
4. What is the threshold function for 𝐺 (𝑛, 𝑝) to contain a cycle?
5. Show that, for each fixed positive integer 𝑘, there is a sequence 𝑝 𝑛 such that

P(𝐺 (𝑛, 𝑝 𝑛 ) has a connected component with exactly 𝑘 vertices) → 1 as 𝑛 → ∞.

Hint: Make the random graph contain some specific subgraph but not some others.

6. Poisson limit. Let 𝑋 be the number of triangles in 𝐺 (𝑛, 𝑐/𝑛) for some fixed
   𝑐 > 0.
   a) For every nonnegative integer 𝑘, determine the limit of E (𝑋 choose 𝑘) as 𝑛 → ∞.
   b) Let 𝑌 ∼ Binomial(𝑛, 𝜆/𝑛) for some fixed 𝜆 > 0. For every nonnegative
      integer 𝑘, determine the limit of E (𝑌 choose 𝑘) as 𝑛 → ∞, and show that it agrees
      with the limit in (a) for some 𝜆 = 𝜆(𝑐).
      We know that 𝑌 converges to the Poisson distribution with mean 𝜆. Also,
      the Poisson distribution is determined by its moments.
c) Compute, for fixed every nonnegative integer 𝑡, the limit of P(𝑋 = 𝑡) as
𝑛 → ∞.
(In particular, this gives the limit probability that 𝐺 (𝑛, 𝑐/𝑛) contains a
triangle, i.e., lim𝑛→∞ P(𝑋 > 0). This limit increases from 0 to 1 continu-
ously when 𝑐 ranges from 0 to +∞, thereby showing that the property of
containing a triangle has a coarse threshold.)
7. Central limit theorem for triangle counts. Find a real (non-random) sequence 𝑎 𝑛
so that, letting 𝑋 be the number of triangles and 𝑌 be the number of edges in the
random graph 𝐺 (𝑛, 1/2), one has

    Var(𝑋 − 𝑎 𝑛𝑌 ) = 𝑜(Var 𝑋).

Deduce that 𝑋 is asymptotically normal, that is, (𝑋 − E𝑋)/√(Var 𝑋) converges to
the normal distribution.
(You can solve for the minimizing 𝑎 𝑛 directly similar to ordinary least squares linear regression,

or first write the edge indicator variables as 𝑋𝑖 𝑗 = (1 + 𝑌𝑖 𝑗 )/2 and then expand. The latter
approach likely yields a cleaner computation.)

8. Isolated vertices. Let 𝑝 𝑛 = (log 𝑛 + 𝑐 𝑛 )/𝑛.
   a) Show that, as 𝑛 → ∞,

          P(𝐺 (𝑛, 𝑝 𝑛 ) has no isolated vertices) → { 0 if 𝑐 𝑛 → −∞,
                                                     1 if 𝑐 𝑛 → ∞.

   b) Suppose 𝑐 𝑛 → 𝑐 ∈ R. Compute, with proof, the limit of the LHS above as
      𝑛 → ∞, by following the approach in exercise 6.
9. ★ Is the threshold for the bipartiteness of a random graph coarse or sharp?
(You are not allowed to use any theorems that we did not prove in class/notes.)

10. Triangle packing. Prove that, with probability approaching 1 as 𝑛 → ∞,


𝐺 (𝑛, 𝑛 −1/2 ) has at least 𝑐𝑛3/2 edge-disjoint triangles, where 𝑐 > 0 is some
constant.
Hint: rephrase as finding a large independent set

11. Nearly perfect triangle factor. Prove that, with probability approaching 1 as
    𝑛 → ∞,
    a) 𝐺 (𝑛, 𝑛^{−2/3} ) has at least 𝑛/100 vertex-disjoint triangles.
    b) Simple nibble. 𝐺 (𝑛, 𝐶𝑛^{−2/3} ) has at least 0.33𝑛 vertex-disjoint triangles,
       for some constant 𝐶.
    Hint: view a random graph as the union of several independent random graphs and iterate (a)

12. Permuted correlation. Recall that the correlation of two non-constant random
    variables 𝑋 and 𝑌 is defined to be corr(𝑋, 𝑌 ) := Cov[𝑋, 𝑌 ]/√((Var 𝑋)(Var 𝑌 )).
    Let 𝑓 , 𝑔 : [𝑛] → R be two non-constant functions. Prove that there exist
    permutations 𝜋 and 𝜏 of [𝑛] such that, with 𝑍 being a uniform random element
    of [𝑛],

        corr( 𝑓 (𝜋(𝑍)), 𝑔(𝑍)) − corr( 𝑓 (𝜏(𝑍)), 𝑔(𝑍)) ≥ 2/√(𝑛 − 1).

    Furthermore, show that equality can be achieved for even 𝑛.
    Hint: Compute the variance of the correlation for a random permutation.


13. Let 𝑣 1 = (𝑥 1 , 𝑦 1 ), . . . , 𝑣 𝑛 = (𝑥 𝑛 , 𝑦 𝑛 ) ∈ Z² with |𝑥𝑖 | , |𝑦𝑖 | ≤ 2^{𝑛/2} /(100√𝑛) for all
    𝑖 ∈ [𝑛]. Show that there are two disjoint sets 𝐼, 𝐽 ⊆ [𝑛], not both empty, such
    that Σ_{𝑖∈𝐼} 𝑣 𝑖 = Σ_{ 𝑗 ∈𝐽} 𝑣 𝑗 .

14. ★ Prove that there is an absolute constant 𝐶 > 0 so that the following holds. For
    every prime 𝑝 and every 𝐴 ⊆ Z/𝑝Z with | 𝐴| = 𝑘, there exists an integer 𝑥 so
    that {𝑥𝑎 : 𝑎 ∈ 𝐴} intersects every interval of length at least 𝐶𝑝/√𝑘 in Z/𝑝Z.
15. ★ Prove that there is a constant 𝑐 > 0 so that every hyperplane containing the
origin in R𝑛 intersects at least 𝑐-fraction of the 2𝑛 closed unit balls centered at
{−1, 1}𝑛 .

MIT OpenCourseWare
https://ocw.mit.edu

18.226 Probabilistic Methods in Combinatorics


Fall 2022

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.