The Degree Sequence of A Scale-Free Random Graph Process: B Ela Bollob As, Oliver Riordan, Joel Spencer, G Abor Tusn Ady
The Degree Sequence of A Scale-Free Random Graph Process: B Ela Bollob As, Oliver Riordan, Joel Spencer, G Abor Tusn Ady
ABSTRACT: Recently, Barabási and Albert [2] suggested modeling complex real-world net-
works such as the worldwide web as follows: consider a random graph process in which
vertices are added to the graph one at a time and joined to a fixed number of earlier ver-
tices, selected with probabilities proportional to their degrees. In [2] and, with Jeong, in [3],
Barabási and Albert suggested that after many steps the proportion Pd of vertices with de-
gree d should obey a power law Pd α d −γ . They obtained γ = 29 ± 01 by experiment and
gave a simple heuristic argument suggesting that γ = 3. Here we obtain Pd asymptotically
for all d ≤ n1/15 , where n is the number of vertices, proving as a consequence that γ = 3.
© 2001 John Wiley & Sons, Inc.   Random Struct. Alg., 18, 279–290, 2001
1. INTRODUCTION
Recently there has been considerable interest in using random graphs to model
complex real-world networks to gain an insight into their properties. One of the
                                                                                            279
280                                                                      BOLLOBÁS ET AL.
most basic properties of a graph or network is its degree sequence. For the stan-
dard random graph model n m of all graphs with m edges on a fixed set of n
vertices, introduced by Erdős and Rényi in [8] and studied in detail in [9], there is
a “characteristic” degree 2m/n: the vertex degrees have approximately a Poisson
or normal distribution with mean 2m/n. The same applies to the closely related
model n p introduced by Gilbert [10], where vertices are joined independently
with probability p. In contrast, Barabási and Albert [2], as well as several other
groups (see [4, 14] and the references therein), noticed that in many real-world ex-
amples the degree sequence has a “scale-free” power law distribution: the fraction
Pd of vertices with degree d is proportional over a large range to d −γ , where γ
is a constant independent of the size of the network. To explain this phenomenon,
Barabási and Albert [2] suggested the following random graph process as a
model.
   This process is intended as a highly simplified model of the growth of the world-
wide web, for example, the vertices representing sites or web pages, and the edges
links from sites to earlier sites. The preferential attachment assumption is based on
the idea that a new site is more likely to link to existing sites which are “popular” at
the time the site is added. For m = 1 this process is very similar to the nonuniform
random recursive tree process considered in [15, 17, 18]. An alternative model,
replacing the preferential attachment assumption by a notion of “link copying” is
given in [12, 14]. We shall discuss these models briefly in the final section.
   In [2, 3] it is stated that computer experiments for the process above suggest that
Pd α d −γ with γ = 29 ± 01. In [3], the following heuristic argument is given to
suggest that γ = 3: consider the degree di of the ith new vertex vi at time t, i.e.,
when there are t + m0 vertices and mt edges. When a new vertex is added, the
probability that it is joined to vi is mdi over the sum of the degrees, i.e., over 2mt.
This suggests the “mean-field theory”
                                      d di  d
                                           = i
                                       dt   2t
With the initial condition that di = m when t = i this gives di = mt/i1/2 , which
yields γ = 3.
   Here we show how one can calculate the exact distribution of di at time t and
obtain an asymptotic formula for Pd, d ≤ t 1/15 , which gives γ = 3 as a simple
consequence. The first step is to give an exact definition of a random graph process
that fits the rather vague description given above.
DEGREE SEQUENCE OF A RANDOM GRAPH                                                              281
2. THE MODEL
The description of the random graph process quoted above is rather imprecise.
First, as the degrees are initially zero, it is not clear how the process is supposed to
get started. More seriously, the expected number of edges linking a new vertex v to
earlier vertices is i ki  = 1, rather than m. Also, when choosing in one go a set
S of m earlier vertices as the neighbors of v, the distribution of S is not specified
by giving the marginal probability that each vertex lies in S. For a trivial example,
suppose that m = 2 and that the first four vertices form a four-cycle. Then for any
0 ≤ α ≤ 1/4 we could join the fifth vertex to each adjacent pair with probability
α and to each nonadjacent pair with probability 1/2 − 2α. This suggests that for
m > 1 we should choose the neighbors of v one at a time. Once doing so, it is very
natural to allow some of these neighbors to be the same, creating multiple edges in
the graph. Here we shall consider the precise model introduced in [6], which turns
out to be particularly pleasant to work with. This model fits the description above
except that it allows multiple edges and also loops—in terms of the interpretation
there is no reason to exclude these. Once the process gets started there will in any
case not be many loops or multiple edges, so they should have little effect overall.
The following definition is essentially as given in [6]; we write dG v for the total
(in plus out) degree of the vertex v in the graph G.
   We start with the case m = 1. Consider a fixed sequence of vertices v1  v2     
We shall inductively define a random graph process Gt1 t≥0 so that Gt1 is a directed
graph on 	vi 
 1 ≤ i ≤ t, as follows. Start with G01 the “graph” with no vertices, or
with G11 the graph with one vertex and one loop. Given Gt−1                  t
                                                                 1 , form G1 by adding
the vertex vt together with a single edge directed from vt to vi , where i is chosen
randomly with
                                  
                                      dGt−1
                                        1 vs 
                                                /2t − 1     1≤s ≤t−1
                    i = s =                                                                  (1)
                                      1/2t − 1              s = t
In other words, we send an edge e from vt to a random vertex vi , where the prob-
ability that a vertex is chosen as vi is proportional to its (total) degree at the time,
counting e as already contributing one to the degree of vt . For m > 1 we add m
edges from vt one at a time, counting the previous edges as well as the “outward
half” of the edge being added as already contributing to the degrees. Equivalently,
we define the process Gtm t≥0 by running the process Gt1  on a sequence v1  v2    ;
the graph Gtm is formed from Gmt                                                         
                                          1 by identifying the vertices v1  v2      vm to form
                                      
v1 , identifying vm+1  vm+2     v2m to form v2 , and so on.
                           n
   We shall write m            for the probability space of directed graphs on n vertices
v1  v2      vn  where a random Gnm ∈ m     n
                                                      has the distribution derived from the
process above. As Gm is defined in terms of Gmn
                             n
                                                           1 , for most of the time we shall
consider the case m = 1. As noted in [6], there is an alternative description of the
distribution of Gn1 in terms of pairings.
   An n-pairing  is a partition of the set 	1 2     2n into pairs, so there are
2n − 1!! = 2n!/n!2n  n-pairings. Thinking of the elements 1 2     2n of the
ground set as points on the x axis, and the pairs as chords joining them, we shall
speak of the left and right endpoint of each pair.
282                                                                      BOLLOBÁS ET AL.
In [2] it was suggested that the fraction of vertices of Gnm having degree d should fall
off as d −3 as d → ∞. We shall prove the following precise version of this statement,
writing #nm d for the number of vertices of Gnm with indegree equal to d, i.e., with
(total) degree m + d.
Theorem 1. Let m ≥ 1 be fixed, and let Gnm n≥0 be the random graph process defined
in Section 2. Let
                                          2mm + 1
                      αm d =                                 
                                d + md + m + 1d + m + 2
                                           #nm d
                         1 − αm d ≤            ≤ 1 + αm d
                                              n
for every d in the range 0 ≤ d ≤ n1/15 .
  In turns out that we only need to calculate the expectation of #nm d; the con-
centration result is then given by applying the following standard inequality due to
Azuma [1] and Hoeffding [11] (see also [5]).
   The strategy of the proof is as follows. First, as mentioned earlier, the results for
general m will follow from those for m = 1. We shall use the pairing model to find
explicitly the distribution of Dk , the sum of the first k degrees, in this case, and
also the distribution of the next degree, dGn1 vk+1 , given Dk . One could combine
these formulae to give a rather unilluminating expression for the distribution of
dGn1 vk+1 ; instead we show that Dk is concentrated about a certain value and hence
find approximately the probability that dGn1 vk+1  = d. Summing over k gives us the
expectation of #n1 d, and concentration follows from Lemma 2.
   Before turning to the distributions of the (total) degrees for m = 1, we note that
their expectations are easy to calculate exactly:
                                                            1
                                  ƐdGt1 vt  = 1 +           
                                                         2t − 1
Also, for s < t,
                                                                 dGt−1 vs 
                     Ɛ dGt1 vs   dGt−1 v s    = dG t−1 v  +
                                                              s
                                                                     1
                                                                               
                                      1                 1
                                                                    2t − 1
which implies that
                                               2t
                              Ɛ dGt1 vs  =          ƐdGt−1 vs 
                                               2t − 1     1
Thus, for 1 ≤ s ≤ n,
                    n
                                2i     4n−s+1 n!2 2s − 2! 
        Ɛ dGn1 vs  =               =                     = n/s 1 + O1/s 
                        i=s
                              2i − 1     2n!s − 1!2
partial pairings obtained in this way may arise as . Similarly, for  there are
                                      
                           2n − 2k − s 2n − 2k − 2s!
                                 s       2n−k−s n − k − s!
possibilities. Any possible  may be combined with any possible  to form  by
pairing off the unpaired elements of  with those of  in any of s! ways. Multiplying
together and dividing by the total number 2n!/2n n! of n-pairings we see that for
1 ≤ k ≤ n and 0 ≤ s ≤ n − k,
   We now turn to the probability that dk+1 = d + 1, i.e., that the indegree of vk+1
is d, given Dk . Suppose that 1 ≤ k ≤ n − 1 and 0 ≤ s ≤ n − k, and consider a left
partial pairing  as above. We have already seen that each such  has
                                       
                            2n − 2k − s
                       s!                 2n − 2k − 2s − 1!!
                                 s
DEGREE SEQUENCE OF A RANDOM GRAPH                                                  285
   It is easy to see that (4) also applies when k = s = 0, when we obtain d1 =
d + 1 = d2d nd /2nd+1 . For k ≥ 1 we can of course combine (2) and (4) to give
a rather unilluminating expression for dk+1 = d + 1. Instead, we shall use (3)
and (4) to estimate the expectation of #n1 d, the number of vertices of Gn1 with
indegree d. Above and in what follows the functions implied by o or ∼ notation
are to be interpreted as depending on n only, not on d or k. Also, the constant
implied by O notation is absolute.
   Let M = n4/5 / log n, let k = kn be any function satisfying M ≤ k ≤ n − M,
                                                         1/15
and
 √ let d =   dn be any function satisfying 0 ≤ d ≤ n . For any D with D −
2 kn ≤ 4 n log n we can use (4) to write dk+1 = d + 1  Dk = D as
                                                     √        
             √                                                        d
                                         d n + k − 2 kn + O n log n
           2 kn − 2k + O n log n2              √                    
                                            2n − 2 kn + O n log nd+1
                                                                 √       √    √
                                                                                  2
Using  the bounds on d and k we find that the ratio of n + k
                                                           √− 2 kn=  n − k
to
 d n log n tends√ to infinity as n → ∞, as does 2n − 2 kn/d n log n. Also,
   n log n = o2 kn − 2k, so the probability above is equal to
                         √        √     √   
d
                        2 kn − 2k 2 n − k2     √   √ d
             1 + o1       √         √       ∼ κ 1− κ 
                        2n − 2 kn 2n − kn
In particular, although it is not relevant for the proof, we note that for almost every
vertex the most likely indegree is zero.
  Keeping n and d fixed and varying k in the range M ≤ k ≤ n − M, as the
estimate above is uniform in k we find that the expected number of vertices vk+1 ,
M ≤ k ≤ n − M, with degree equal to d + 1 can be written as
                               n−M
                                                     
                      o1 +         1 + o1 k/n1 − k/nd 
                               k=M
286                                                                                               BOLLOBÁS ET AL.
                    √     √
  Writing f =        κ1 − κd , we have
                                       1 df   κ−1   d κ−1/2
                                            =     −            
                                       f dκ    2    2 1 − κ1/2
                                                        4n                      4n
 Ɛ#n1 d = OM + 1 + o1                                      ∼                       
                                               d + 1d + 2d + 3   d + 1d + 2d + 3
                                                                         d
                                                            
       dK+j+1 = d + 1  d1  d2      dK+j ∼ K + j/N 1 − K + j/N
                                                                 √     √
                                                             ∼    κ1 − κd 
                                                                               m √
                                                                                          √
         dk+1 = d + m = on−1  + 1 + o1                                         κ1 − κaj
                                                                    a1 +···+am =d j=1
                                                                               
                                                                       d + m − 1 m/2    √
                                   = on−1  + 1 + o1                         κ 1 − κd 
                                                                         m−1
Proceeding as before we can express the expectation of the number #nm d of ver-
tices of Gnm with indegree d in terms of
              1               √                    1                              m + 1!d!
                   κm/2 1 −       κd dκ = 2            1 − um+1 ud du = 2                   
            0                                     0                                d + m + 2!
DEGREE SEQUENCE OF A RANDOM GRAPH                                                     287
                                            2mm + 1n
                    Ɛ#nm d ∼                                                     (6)
                                   d + md + m + 1d + m + 2
Proof of Theorem 1. We return to considering the graph Gnm as one graph from the
process Gtm t≥0 . Fix m ≥ 1, n ≥ 1 and 0 ≤ d ≤ n1/15 , and consider the martingale
Xt = Ɛ#nm d  Gtm  for 0 ≤ t ≤ n. We have Xn = #nm d, while X0 = Ɛ#nm d.
We claim that the differences Xt+1 − Xt  are bounded by two. To see this note that
whether at stage t we join vt to vi or vj does not affect the degrees at later times of
vertices vk , k ∈
                / 	i j. More precisely, the joint distribution of all other degrees is
the same in either case. Since we are just counting vertices with a particular degree,
no matter how much the degrees of vi and vj are changed in Gnm , this changes
#nm d by at most two.
   An alternative way of seeing this is to say that at stage t we add a half edge
h2t−1 directed from vt/m paired with a half edge h2t directed to some other vertex,
and to consider h2t not as attached to a random vertex, but rather as associated
with equal probability to any of h1      h2t−1 . In the final graph a half edge h2t is
attached to vt/m , while a half edge h2t−1 is attached to the vertex the half edge it
is associated to is attached to. If we change the choice made at stage t, the effect
on the final graph is to move the half edge h2t−1 and all later half edges associated
directly or indirectly to h2t−1 together. This operation only affects two degrees.
   Applying Lemma 2, the Azuma–Hoeffding inequality, we find that for each d
with 0 ≤ d ≤ n1/15 we have
               	
                    
               
              
#nm d − Ɛ#nm d
 ≥ n log n ≤ e− log n/8 = on−1/15 
                                                          2mm+1n
Noting from (6) that in this range Ɛ#nm d ∼                           and that this is
                                                   d+md+m+1d+m+2
much larger than n log n, the result follows.
   It is natural to ask how far Theorem 1 can be extended to degrees d > n1/15 .
The bound d ≤ n1/15 was chosen to make the proof as simple as possible, and can
certainly be weakened considerably, by choosing a suitable cutoff and considering
“early” and “late” vertices separately. For large d, Eq. (6) suggests that the expected
                                                                                  2
number of vertices with degree at least d should √ be roughly mm + 1n/d and
hence that the maximum degree should be ) n. It turns out that this is indeed
the case, as could be proved using, for example, the analysis of the pairing model
given in [6].
4. UNIFORM ATTACHMENT
lines of Theorem 1. As the argument is similar to but much simpler than that given
above, we only give a rough outline.
   Consider a random process in which vertices are added one at a time, starting
from any given finite graph G. Suppose that when the vertex vi is added, it is joined
to m earlier vertices, in such a way that the expected number of edges from vi to
vk is the same for all k < i. (It does not matter whether we allow multiple edges
or not.) Then
           the expected indegree of vk when n vertices have been added is
exactly m ni=k+1 1i . For κ = k/n bounded away from 0 and 1, it is easy to see the
degree of vk has asymptotically a Poisson distribution with mean λ ∼ −m logκ.
Thus, arguing as before, for any fixed d the proportion of vertices with indegree d
is asymptotically
                  1   λd −λ     md  1                       md
                          e dκ =        − log κd κm dκ =            
                0      d!        d! 0                      m + 1d+1
5. CONCLUDING REMARKS
where ρ = t0 /t. At time 1, writing κ for t0 and d for i − m as before, this reduces
to the expression
                                       
                              d + m − 1 m/2        √
                                          κ 1 − κd
                                m−1
seen in Section 3. For constant d, integrating over κ as before suggests the bounds
on #nm d given in Theorem 1.
  Recall that when defining the process Gt1  above, it was convenient to start from
the “graph” with no vertices, or from the graph with one vertex and no loops. As far
as our results are concerned, however, it makes no difference where we start. Given
any finite graph G one can define a similar process to Gt1 , or Gtm , starting from
G. Now the joint distribution of the degrees of vs+1      vt in Gtm is independent
of Gsm . If G has s edges then, speaking loosely, for m = 1 we can just “pretend”
DEGREE SEQUENCE OF A RANDOM GRAPH                                                        289
that G has s vertices as well. As far as all new vertices are concerned, the process
starting from G is then indistinguishable from the process Gt1 t>s , so asymptotic
results such as Theorem 1 are unaffected by the starting graph G.
   We finish by comparing the model considered here with one much older random
process and two new ones. Graphs in which each vertex (apart from the first few)
is joined to a fixed number m of randomly chosen earlier vertices are known in the
literature as random recursive dags, or random recursive trees if m = 1 (see, e.g., [7]).
For m > 1 only uniform random recursive dags have been studied significantly. For
m = 1, however, nonuniform random recursive trees with attachment probabilities
proportional to the degrees (also known as random plane-oriented recursive trees)
have been studied; see [7, 16, 18], for example. These objects are very close to
the random graph Gn1 considered here. The only differences are that loops are not
allowed and that the root vertex is sometimes treated in a slightly different way.
The expected number of vertices of degree d = dn in these objects was found
to within an additive constant by Szymański [18]; a concentration result for d fixed
was given by Lu and Feng [15]. For a survey of results on random recursive trees
see [17].
   Finally, a rather different model for the worldwide web graph was introduced
in [12]. Again, vertices are born one at a time, but instead of preferential attach-
ment, each new vertex picks an old vertex to copy from and copies a randomly
selected part of its neighborhood, as well as choosing (uniformly) new neighbors of
its own. In [13, 14] it is shown that such models also give power-law degree distri-
butions, as well as explaining the high number of dense bipartite subgraphs found
in the web graph.
REFERENCES
 [1] K. Azuma, Weighted sums of certain dependent variables, Tôhoku Math J 3 (1967),
     357–367.
 [2] A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286
     (1999), 509–512.
 [3] A.-L. Barabási, R. Albert, and H. Jeong, Mean-field theory for scale-free random net-
     works, Physica A 272 (1999), 173–187.
 [4] A.-L. Barabási, R. Albert, and H. Jeong, Scale-free characteristics of random networks:
     the topology of the world-wide web, Physica A 281 (2000), 69–77.
 [5] B. Bollobás, Martingales, isoperimetric inequalities and random graphs, in Combina-
     torics (Eger, 1987), Colloq. Math. Soc. János Bolyai, Vol. 52, North-Holland, Amster-
     dam, 1988, 113–139.
 [6] B. Bollobás and O.M. Riordan, The diameter of a scale-free random graph, submitted
     for publication.
 [7] L. Devroye and J. Lu, The strong convergence of maximal degrees in uniform random
     recursive trees and dags, Random Structures Algorithms 7 (1995), 1–14.
 [8] P. Erdős and A. Rényi, On random graphs. I, Publ Math Debrecen 6 (1959), 290–297.
 [9] P. Erdős and A. Rényi, On the evolution of random graphs, Magyar Tud Akad Mat
     Kutató Int Kőzl 5 (1960), 17–61.
[10] E.N. Gilbert, Random graphs, Ann Math Statist 30 (1959), 1141–1144.
290                                                                        BOLLOBÁS ET AL.
[11] W. Hoeffding, Probability inequalities for sums of bounded random variables, J Amer
     Statist Assoc 58 (1963), 13–30.
[12] J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, The web as a
     graph: measurements, models, and methods, COCOON 1999.
[13] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, Extracting large scale knowl-
     edge bases from the web, VLDB 1999.
[14] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal,
     Stochastic models for the web graph, FOCS 2000.
[15] J. Lu and Q. Feng, Strong consistency of the number of vertices of given degrees in
     nonuniform random recursive trees, Yokohama Math J 45 (1998), 61–69.
[16] H.M. Mahmoud, R.T. Smythe, and J. Szymański, On the structure of random plane-
     oriented recursive trees and their branches, Random Structures Algorithms 4 (1993),
     151–176.
[17] H.M. Mahmoud and R.T. Smythe, A survey of recursive trees, Theory Probability Math
     Statist 51 (1995), 1–27.
[18] J. Szymański, On a nonuniform random recursive tree, Annals Discrete Math 33 (1987),
     297–306.