Newman PARETO ZIPF
Newman PARETO ZIPF
Contemporary Physics
Publication details, including instructions for authors and subscription information:
http://www.informaworld.com/smpp/title~content=t713394025
Power laws, Pareto distributions and Zipf's law
MEJ Newman a
a
Department of Physics and Center for the Study of Complex Systems, University of
Michigan, Ann Arbor, USA
M.E.J. NEWMAN*
Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor,
MI 48109, USA
When the probability of measuring a particular value of some quantity varies inversely as
a power of that value, the quantity is said to follow a power law, also known variously as
Zipf’s law or the Pareto distribution. Power laws appear widely in physics, biology, earth
and planetary sciences, economics and finance, computer science, demography and the
social sciences. For instance, the distributions of the sizes of cities, earthquakes, forest
fires, solar flares, moon craters and people’s personal fortunes all appear to follow power
laws. The origin of power-law behaviour has been a topic of debate in the scientific
community for more than a century. Here we review some of the empirical evidence for
the existence of power-law forms and the theories proposed to explain them.
Contemporary Physics
ISSN 0010-7514 print/ISSN 1366-5812 online ª 2005 Taylor & Francis Group Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/00107510500052444
324 M.E.J. Newman
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
Figure 1. Left: histogram of heights in centimetres of American males. Data from the National Health Examination Survey,
1959 – 1962 (US Department of Health and Human Services). Right: histogram of speeds in miles per hour of cars on UK
motorways. Data from Transport Statistics 2003 (UK Department for Transport).
Figure 2. Left: histogram of the populations of all US cities with population of 10000 or more. Right: another histogram of
the same data, but plotted on logarithmic scales. The approximate straight-line form of the histogram in the right panel
implies that the distribution follows a power law. Data from the 2000 US Census.
population much higher than the typical value, producing when plotted in this fashion, follows quite closely a straight
the long tail to the right of the histogram. This right-skewed line. This observation seems first to have been made by
form is qualitatively quite different from the histograms of Auerbach [1], although it is often attributed to Zipf [2].
people’s heights, but is not itself very surprising. Given that What does it mean? Let p(x) dx be the fraction of cities with
we know there is a large dynamic range from the smallest to population between x and x + dx. If the histogram is a
the largest city sizes, we can immediately deduce that there straight line on log – log scales, then ln p(x) = – a ln x + c,
can only be a small number of very large cities. After all, in where a and c are constants. (The minus sign is optional,
a country such as America with a total population of 300 but convenient since the slope of the line in figure 2 is
million people, you could at most have about 40 cities the clearly negative.) Taking the exponential of both sides, this
size of New York. And the 2700 cities in the histogram of is equivalent to
figure 2 cannot have a mean population of more than
pðxÞ ¼ Cxa ; ð1Þ
3 6 108/2700 = 110 000.
What is surprising on the other hand, is the right panel of
figure 2, which shows the histogram of city sizes again, but with C = exp(c).
this time replotted with logarithmic horizontal and vertical Distributions of the form (1) are said to follow a power
axes. Now a remarkable pattern emerges: the histogram, law. The constant a is called the exponent of the power law.
Power laws, Pareto distributions and Zipfs law 325
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
(The constant C is mostly uninteresting; once a is fixed, it is (a) of the figure shows a normal histogram of the numbers,
determined by the requirement that the distribution p(x) produced by binning them into bins of equal size 0.1. That
sum to 1; see section 3.1.) is, the first bin goes from 1 to 1.1, the second from 1.1 to
Power-law distributions occur in an extraordinarily 1.2, and so forth. On the linear scales used this produces a
diverse range of phenomena. In addition to city popula- nice smooth curve.
tions, the sizes of earthquakes [3], moon craters [4], solar To reveal the power-law form of the distribution it is
flares [5], computer files [6] and wars [7], the frequency of better, as we have seen, to plot the histogram on
use of words in any human language [2, 8], the frequency of logarithmic scales, and when we do this for the current
occurrence of personal names in most cultures [9], the data we see the characteristic straight-line form of the
numbers of papers scientists write [10], the number of power-law distribution, figure 3 (b). However, the plot is
citations received by papers [11], the number of hits on web in some respects not a very good one. In particular the
pages [12], the sales of books, music recordings and almost right-hand end of the distribution is noisy because of
every other branded commodity [13, 14], the numbers of sampling errors. The power-law distribution dwindles in
species in biological taxa [15], people’s annual incomes [16] this region, meaning that each bin only has a few samples
and a host of other variables all follow power-law in it, if any. So the fractional fluctuations in the bin
distributions*. counts are large and this appears as a noisy curve on the
Power-law distributions are the subject of this article. In plot. One way to deal with this would be simply to throw
the following sections, I discuss ways of detecting power- out the data in the tail of the curve. But there is often
law behaviour, give empirical evidence for power laws in a useful information in those data and furthermore, as we
variety of systems and describe some of the known will see in section 2.1, many distributions follow a power
mechanisms by which power-law behaviour can arise. law only in the tail, so we are in danger of throwing out
Readers interested in pursuing the subject further may the baby with the bathwater.
also wish to consult the recent reviews by Sornette [18] and An alternative solution is to vary the width of the bins
Mitzenmacher [19], as well as the bibliography compiled by in the histogram. If we are going to do this, we must also
Li{. normalize the sample counts by the width of the bins they
fall in. That is, the number of samples in a bin of width
Dx should be divided by Dx to get a count per unit interval
2. Measuring power laws
of x. Then the normalized sample count becomes
Identifying power-law behaviour in either natural or man- independent of bin width on average and we are free to
made systems can be tricky. The standard strategy makes vary the bin widths as we like. The most common choice
use of a result we have already seen: a histogram of a is to create bins such that each is a fixed multiple wider
quantity with a power-law distribution appears as a straight than the one before it. This is known as logarithmic
line when plotted on logarithmic scales. Just making a binning. For the present example, for instance, we might
simple histogram, however, and plotting it on log scales to choose a multiplier of 2 and create bins that span the
see if it looks straight is, in most cases, a poor way to intervals 1 to 1.1, 1.1 to 1.3, 1.3 to 1.7 and so forth (i.e.
proceed. the sizes of the bins are 0.1, 0.2, 0.4 and so forth). This
Consider figure 3. This example shows a fake data set: I means the bins in the tail of the distribution get more
have generated a million random real numbers drawn from samples than they would if bin sizes were fixed, and this
a power-law probability distribution p(x) = Cx – a with reduces the statistical errors in the tail. It also has the nice
exponent a = – 2.5, just for illustrative purposes{. Panel side-effect that the bins appear to be of constant width
when we plot the histogram on log scales.
*
Power laws also occur in many situations other than the statistical
I used logarithmic binning in the construction of figure 2
distributions of quantities. For instance, Newton’s famous 1/r2 law for (b), which is why the points representing the individual bins
gravity has a power-law form with exponent a = 2. While such laws are appear equally spaced. In figure 3 (c) I have done the same
certainly interesting in their own way, they are not the topic of this paper. for our computer-generated power-law data. As we can see,
Thus, for instance, there has in recent years been some discussion of the
the straight-line power-law form of the histogram is now
‘allometric’ scaling laws seen in the physiognomy and physiology of
biological organisms [17], but since these are not statistical distributions
much clearer and can be seen to extend for at least a decade
they will not be discussed here. further than was apparent in figure 3 (b).
{
http://linkage.rockefeller.edu/wli/zipf/. Even with logarithmic binning there is still some noise in
{
This can be done using the so-called transformation method. If we can the tail, although it is sharply decreased. Suppose the
generate a random real number r uniformly distributed in the range
bottom of the lowest bin is at xmin and the ratio of the
0 4 r 5 1, then x = xmin(1 – r)71/a71 is a random power-law-distributed
real number in the range xmin 4 x 5 ? with exponent a. Note that there
widths of successive bins is a. Then the kth bin extends
has to be a lower limit xmin on the range; the power-law distribution from xk – 1 = xminak – 1 to xk = xminak and the expected
diverges as x?0—see section 2.1. number of samples falling in this interval is
326 M.E.J. Newman
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
Figure 3. (a) Histogram of the set of 1 million random numbers described in the text, which have a power-law distribution
with exponent a = 2.5. (b) The same histogram on logarithmic scales. Notice how noisy the results get in the tail towards the
right-hand side of the panel. This happens because the number of samples in the bins becomes small and statistical
fluctuations are therefore large as a fraction of sample number. (c) A histogram constructed using ‘logarithmic binning’. (d) A
cumulative histogram or rank/frequency plot of the same data. The cumulative distribution also follows a power law, but
with an exponent of a – 1 = 1.5.
any information that was contained in the individual values Applying equation (5) to our present data gives an
of the samples within that range. Cumulative distributions estimate of a = 2.500 + 0.002 for the exponent, which
do not throw away any information; it is all there in the agrees well with the known value of 2.5.
plot.
Figure 3 (d) shows our computer-generated power-law
2.1 Examples of power laws
data as a cumulative distribution, and indeed we again see
the tell-tale straight-line form of the power law, but with a In figure 4 we show cumulative distributions of twelve
shallower slope than before. Cumulative distributions like different quantities measured in physical, biological,
this are sometimes also called rank/frequency plots for technological and social systems of various kinds. All have
reasons explained in Appendix A. Cumulative distributions been proposed to follow power laws over some part of their
with a power-law form are sometimes said to follow Zipf’s range. The ubiquity of power-law behaviour in the natural
law or a Pareto distribution, after two early researchers who world has led many scientists to wonder whether there is a
championed their study. Since power-law cumulative single, simple, underlying mechanism linking all these
distributions imply a power-law form for p(x), ‘Zipf’s different systems together. Several candidates for such
law’ and ‘Pareto distribution’ are effectively synonymous mechanisms have been proposed, going by names like ‘self-
with ‘power-law distribution’. (Zipf’s law and the Pareto organized criticality’ and ‘highly optimized tolerance’.
distribution differ from one another in the way the However, the conventional wisdom is that there are
cumulative distribution is plotted—Zipf made his plots actually many different mechanisms for producing power
with x on the horizontal axis and P(x) on the vertical one; laws and that different ones are applicable to different
Pareto did it the other way around. This causes much cases. We discuss these points further in section 4.
confusion in the literature, but the data depicted in the The distributions shown in figure 4 are as follows.
plots are of course identical*.)
We know the value of the exponent a for our artificial (a) Word frequency: Estoup [8] observed that the
data set since it was generated deliberately to have a frequency with which words are used appears to
particular value, but in practical situations we would often follow a power law, and this observation was
like to estimate a from observed data. One way to do this famously examined in depth and confirmed by Zipf
would be to fit the slope of the line in plots like figures 3 (b), [2]. Panel (a) of figure 4 shows the cumulative
(c) or (d), and this is the most commonly used method. distribution of the number of times that words occur
Unfortunately, it is known to introduce systematic biases in a typical piece of English text, in this case the text
into the value of the exponent [20], so it should not be relied of the novel Moby Dick by Herman Melville{. Similar
upon. For example, a least-squares fit of a straight line to distributions are seen for words in other languages.
figure 3 (b) gives a = 2.26 + 0.02, which is clearly (b) Citations of scientific papers: As first observed by
incompatible with the known value of a = 2.5 from which Price [11], the numbers of citations received by
the data were generated. scientific papers appear to have a power-law distribu-
An alternative, simple and reliable method for extracting tion. The data in panel (b) are taken from the Science
the exponent is to employ the formula Citation Index, as collated by Redner [23], and are for
" #1 papers published in 1981. The plot shows the
X
n
xi cumulative distribution of the number of citations
a¼1þn ln : ð5Þ
xmin received by a paper between publication and June
i¼1
1997.
(c) Web hits: The cumulative distribution of the number
Here the quantities xi, i = 1. . .n are the measured values of of ‘hits’ received by web sites (i.e. servers, not pages)
x and xmin is again the minimum value of x. (As discussed during a single day from a subset of the users of the
in the following section, in practical situations xmin usually AOL Internet service. The site with the most hits, by
corresponds not to the smallest value of x measured but to a long way, was yahoo.com. After Adamic and
the smallest for which the power-law behaviour holds.) The Huberman [12].
derivation of this formula is given in Appendix B. An error (d) Copies of books sold: The cumulative distribution of
estimate for a can be derived by a standard bootstrap or the total number of copies sold in America of the 633
jackknife resampling method [21]; for large data sets of the bestselling books that sold 2 million or more copies
type discussed in this paper, a bootstrap is normally the
more computationally economical of the two. {
The most common words in this case are, in order, ‘the’, ‘of’, ‘and’, ‘a’ and
‘to’, and the same is true for most written English texts. Interestingly,
*
See http://www.hpl.hp.com/research/idl/papers/ranking/ for a useful however, it is not true for spoken English. The most common words in
discussion of these and related points. spoken English are, in order, ‘I’, ‘and’, ‘the’, ‘to’ and ‘that’ [22].
328 M.E.J. Newman
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
Figure 4. Cumulative distributions or ‘rank/frequency plots’ of twelve quantities reputed to follow power laws. The
distributions were computed as described in Appendix A. Data in the shaded regions were excluded from the calculations of
the exponents in table 1. Source references for the data are given in the text. (a) Numbers of occurrences of words in the novel
Moby Dick by Hermann Melville. (b) Numbers of citations to scientific papers published in 1981, from time of publication
until June 1997. (c) Numbers of hits on web sites by 60000 users of the America Online Internet service for the day of 1
December 1997. (d) Numbers of copies of bestselling books sold in the US between 1895 and 1965. (e) Number of calls
received by AT&T telephone customers in the US for a single day. (f) Magnitude of earthquakes in California between
January 1910 and May 1992. Magnitude is proportional to the logarithm of the maximum amplitude of the earthquake, and
hence the distribution obeys a power law even though the horizontal axis is linear. (g) Diameter of craters on the moon.
Vertical axis is measured per square kilometre. (h) Peak gamma-ray intensity of solar flares in counts per second, measured
from Earth orbit between February 1980 and November 1989. (i) Intensity of wars from 1816 to 1980, measured as battle
deaths per 10000 of the population of the participating countries. (j) Aggregate net worth in dollars of the richest individuals
in the US in October 2003. (k) Frequency of occurrence of family names in the US in the year 1990. (l) Populations of US
cities in the year 2000.
Power laws, Pareto distributions and Zipfs law 329
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
between 1895 and 1965. The data were compiled CsI scintillation detector to measure gamma-rays
painstakingly over a period of several decades by from solar flares and the horizontal axis in the figure
Alice Hackett, an editor at Publisher’s Weekly [24]. is calibrated in terms of scintillation counts per
The best selling book during the period covered was second from this detector. The data are from the
Benjamin Spock’s The Common Sense Book of Baby NASA Goddard Space Flight Center, umbra.nas-
and Child Care. (The Bible, which certainly sold more com.nasa.gov/smm/hxrbs.html. See also Lu and
copies, is not really a single book, but exists in many Hamilton [5].
different translations, versions and publications, and (i) Intensity of wars: The cumulative distribution of the
was excluded by Hackett from her statistics.) intensity of 119 wars from 1816 to 1980. Intensity is
Substantially better data on book sales than Hack- defined by taking the number of battle deaths among
ett’s are now available from operations such as all participant countries in a war, dividing by the total
Nielsen BookScan, but unfortunately at a price this combined populations of the countries and multi-
author cannot afford. I should be very interested to plying by 10000. For instance, the intensities of the
see a plot of sales figures from such a modern source. First and Second World Wars were 141.5 and 106.3
(e) Telephone calls: The cumulative distribution of the battle deaths per 10000 respectively. The worst war of
number of calls received on a single day by 51 million the period covered was the small but horrifically
users of AT&T long distance telephone service in the destructive Paraguay-Bolivia war of 1932 – 1935 with
United States. After Aiello et al. [25]. The largest an intensity of 382.4. The data are from Small and
number of calls received by a customer on that day Singer [28]. See also Roberts and Turcotte [7].
was 375746, or about 260 calls a minute (obviously to (j) Wealth of richest Americans: The cumulative dis-
a telephone number that has many people manning tribution of the total wealth of the richest people in
the phones). Similar distributions are seen for the the United States. Wealth is defined as aggregate net
number of calls placed by users and also for the worth, i.e. total value in dollars at current market
numbers of e-mail messages that people send and prices of all an individual’s holdings, minus their
receive [26, 27]. debts. For instance, when the data were compiled in
(f) Magnitude of earthquakes: The cumulative distribu- 2003, America’s richest person, William H. Gates III,
tion of the Richter magnitude of earthquakes had an aggregate net worth of $46 billion, much of it
occurring in California between January 1910 and in the form of stocks of the company he founded,
May 1992, as recorded in the Berkeley Earthquake Microsoft Corporation. Note that net worth does not
Catalog. The Richter magnitude is defined as the actually correspond to the amount of money in-
logarithm, base 10, of the maximum amplitude of dividuals could spend if they wanted to: if Bill Gates
motion detected in the earthquake, and hence the were to sell all his Microsoft stock, for instance, or
horizontal scale in the plot, which is drawn as linear, otherwise divest himself of any significant portion of
is in effect a logarithmic scale of amplitude. The it, it would certainly depress the stock price. The data
power-law relationship in the earthquake distribution are from Forbes magazine, 6 October 2003.
is thus a relationship between amplitude and fre- (k) Frequencies of family names: Cumulative distribution
quency of occurrence. The data are from the National of the frequency of occurrence in the US of the 89000
Geophysical Data Center, www.ngdc.noaa.gov. most common family names, as recorded by the US
(g) Diameter of moon craters: The cumulative distribu- Census Bureau in 1990. Similar distributions are
tion of the diameter of moon craters. Rather than observed for names in some other cultures as well (for
measuring the (integer) number of craters of a given example in Japan [29]) but not in all cases. Korean
size on the whole surface of the moon, the vertical family names for instance appear to have an
axis is normalized to measure the number of craters exponential distribution [30].
per square kilometre, which is why the axis goes (l) Populations of cities: Cumulative distribution of the
below 1, unlike the rest of the plots, since it is entirely size of the human populations of US cities as
possible for there to be less than one crater of a given recorded by the US Census Bureau in 2000.
size per square kilometre. After Neukum and Ivanov
[4]. Few real-world distributions follow a power law over their
(h) Intensity of solar flares: The cumulative distribution entire range, and in particular not for smaller values of the
of the peak gamma-ray intensity of solar flares. The variable being measured. As pointed out in the previous
observations were made between 1980 and 1989 by section, for any positive value of the exponent a the function
the instrument known as the Hard X-Ray Burst p(x) = Cx – a diverges as x?0. In reality therefore, the
Spectrometer aboard the Solar Maximum Mission distribution must deviate from the power-law form below
satellite launched in 1980. The spectrometer used a some minimum value xmin. In our computer-generated
330 M.E.J. Newman
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
example of the last section we simply cut off the distribution Table 1. Parameters for the distributions shown in figure 4.
altogether below xmin so that p(x) = 0 in this region, but The labels on the left refer to the panels in the figure. Exponent
most real-world examples are not that abrupt. Figure 4 values were calculated using the maximum likelihood method
of equation (5) and Appendix B, except for the moon craters
shows distributions with a variety of behaviours for small (g), for which only cumulative data were available. For this
values of the variable measured; the straight-line power-law case the exponent quoted is from a simple least-squares fit and
form asserts itself only for the higher values. Thus one often should be treated with caution. Numbers in parentheses give
hears it said that the distribution of such-and-such a the standard error on the trailing figures.
quantity ‘has a power-law tail’.
Quantity Minimum Exponent
Extracting a value for the exponent a from distributions
like these can be a little tricky, since it requires us to make xmin a
a judgement, sometimes imprecise, about the value xmin (a) frequency of use of words 1 2.20(1)
(b) number of citations to papers 100 3.04(2)
above which the distribution follows the power law. Once (c) number of hits on web sites 1 2.40(1)
this judgement is made, however, a can be calculated (d) copies of books sold in the US 2000000 3.51(16)
simply from equation (5)*. (Care must be taken to use the (e) telephone calls received 10 2.22(1)
correct value of n in the formula; n is the number of (f) magnitude of earthquakes 3.8 3.04(4)
samples that actually go into the calculation, excluding (g) diameter of moon craters 0.01 3.14(5)
(h) intensity of solar flares 200 1.83(2)
those with values below xmin, not the overall total number (i) intensity of wars 3 1.80(9)
of samples.) (j) net worth of Americans $600m 2.09(4)
Table 1 lists the estimated exponents for each of the (k) frequency of family names 10000 1.94(1)
distributions of figure 4, along with standard errors (l) population of US cities 40000 2.30(5)
calculated by bootstrapping 100 times, and also the values
of xmin used in the calculations. Note that the quoted
errors correspond only to the statistical sampling error in
the estimation of a; I have included no estimate of any
2.2 Distributions that do not follow a power law
errors introduced by the fact that a single power-law
function may not be a good model for the data in some Power-law distributions are, as we have seen, impressively
cases or for variation of the estimates with the value ubiquitous, but they are not the only form of broad
chosen for xmin. distribution. Lest I give the impression that everything
In the author’s opinion, the identification of some of the interesting follows a power law—an opinion that has been
distributions in figure 4 as following power laws should be espoused elsewhere—let me emphasize that there are quite
considered unconfirmed. While the power law seems to be a number of quantities with highly right-skewed distribu-
an excellent model for most of the data sets depicted, a tions that nonetheless do not follow power laws. A few of
tenable case could be made that the distributions of web them, shown in figure 5, are the following.
hits and family names might have two different power-law
regimes with slightly different exponents. And the data for (a) The lengths of relationships between couples, which
the numbers of copies of books sold cover rather a small although they span more than four orders of
range—little more than one decade horizontally{. None- magnitude appear to be exponentially distributed.
theless, one can, without stretching the interpretation of the (b) The abundance of North American bird species,
data unreasonably, claim that power-law distributions have which spans over five orders of magnitude but is
been observed in language, demography, commerce, probably distributed according to a log-normal. A
information and computer sciences, geology, physics and log-normally distributed quantity is one whose
astronomy, and this on its own is an extraordinary logarithm is normally distributed; see section 4.7
statement. and [34] for further discussions.
(c) The number of entries in people’s email address
books, which spans about three orders of magnitude
but seems to follow a stretched exponential. A
*
Sometimes the tail is also cut off because there is, for one reason or
stretched exponential is a curve of the form exp( –
another, a limit on the largest value that may occur. An example is the axb) for some constants a, b.
finite-size effects found in critical phenomena—see section 4.5. In this case, (d) The distribution of the sizes of forest fires, which
equation (5) must be modified [20]. spans six orders of magnitude and could follow a
{
Significantly more tenuous claims to power-law behaviour for other
power law but with an exponential cut-off.
quantities have appeared elsewhere in the literature, for instance in the
discussion of the distribution of the sizes of electrical blackouts [31,32].
These however I consider insufficiently substantiated for inclusion in the This being an article about power laws, I will not discuss
present work. further the possible explanations for these distributions, but
Power laws, Pareto distributions and Zipfs law 331
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
Figure 5. Cumulative distributions of some quantities whose values span several orders of magnitude but that nonetheless do
not obey power laws. (a) The length in days of the most recent sexual relationship of 1013 men and women interviewed in the
study of Foxman et al. (unpublished). (b) The number of sightings of 591 species of birds in the North American Breeding
Bird Survey 2003. (c) The number of addresses in the e-mail address books of 16881 users of a large university computer
system [33]. (d) The size in acres of all wildfires occurring on US federal lands between 1986 and 1996 (National Fire
Occurrence Database, USDA Forest Service and Department of the Interior). Note that the horizontal axes in frames (a) and
(c) are linear but in (b) and (d) they are logarithmic.
2
a1 x a C aþ3 1
pðxÞ ¼ : ð9Þ x ¼ x : ð12Þ
xmin xmin 3a xmin
Some distributions follow a power law for part of their This diverges if a 4 3. Thus power-law distributions in this
range but are cut off at high values of x. That is, above range, which includes almost all of those in table 1, have no
some value they deviate from the power law and fall off finite mean square in the limit of a large data set, and thus
quickly towards zero. If this happens, then the distribution also no finite variance or standard deviation. We discuss
may be normalizable no matter what the value of the the meaning of this statement further below. If a 4 3, then
exponent a. Even so, exponents less than unity are rarely, if the second moment is finite and well defined, taking the
ever, seen. value
2 a 1 2
x ¼ x : ð13Þ
a 3 min
3.2 Moments
The mean value of x in our power law is given by These results can easily be extended to show that in
Z 1 Z 1 general all moments hxmi exist for m 5 a 7 1 and all higher
hxi ¼ xpðxÞdx ¼ C xaþ1 dx moments diverge. The ones that do exist are given by
xmin xmin
ð10Þ
C aþ2 1 a1
¼ x : hxm i ¼ xm : ð14Þ
2a xmin a 1 m min
a1
hxi ¼ xmin : ð11Þ Now we can calculate the mean value hxmaxi of the
a2 largest sample thus:
Z 1 Z 1
We can also calculate higher moments of the distribution hxmax i ¼ xpðxÞdx ¼ n xpðxÞ½1 PðxÞn1 dx:
p(x). For instance, the second moment, the mean square, is xmin xmin
given by ð17Þ
Power laws, Pareto distributions and Zipfs law 333
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
Using equations (9) and (15), this is As xmax becomes large this expression is dominated by the
upper limit, and using the result, equation (22), for xmax, we
hxmax i ¼ nða 1Þ get
Z 1 " #n1
x aþ1 x aþ1 2
1 dx x nð3aÞ=ða1Þ : ð24Þ
xmin xmin xmin
Z 1 ð18Þ
yn1 So, for instance, if a ¼ 52, then the mean-square sample
¼ nxmin 1=ða1Þ
dy
0 ð1 yÞ value, and hence also the sample variance, goes as n1/3 as
¼ nxmin Bðn; ða 2Þ=ða 1ÞÞ; the size of the data set gets larger.
or
The beta-function has the interesting property that for
large values of either of its arguments it itself follows a x1=2 ¼ 21=ða1Þ xmin : ð26Þ
power law{. For instance, for large a and fixed b, B(a,
b)*a – b. In most cases of interest, the number n of samples So, for example, if we are considering the distribution of
from our power-law distribution will be large (meaning wealth, there will be some well-defined median wealth that
much greater than 1), so divides the richer half of the population from the poorer.
But we can also ask how much of the wealth itself lies in
Bðn; ða 2Þ=ða 1ÞÞ nða2Þ=ða1Þ ð21Þ those two halves. Obviously more than half of the total
amount of money belongs to the richer half of the
and population. The fraction of the money in the richer half
is given by
hxmax i n1=ða1Þ : ð22Þ R1
x xpðxÞdx x1=2 aþ2
R 11=2 ¼ ¼ 2ða2Þ=ða1Þ ; ð27Þ
Thus hxmaxi always increases as n becomes larger so long as xmin xpðxÞdx
xmin
a 4 1.
This allows us to complete the calculation of the provided a 4 2 so that the integrals converge. Thus, for
moments in section 3.2. Consider for instance the second instance, if a = 2.1 for the wealth distribution, as indicated
moment, which is often of interest in power laws. For the in table 1, then a fraction 2 – 0.091^94% of the wealth is in
crucial case 2 5 a 4 3, which covers most of the power-law the hands of the richer 50% of the population, making the
distributions observed in real life, we saw in equation (12) distribution quite top-heavy.
that the second moment of the distribution diverges as the More generally, the fraction of the population whose
size of the data set becomes infinite. But in reality all data personal wealth exceeds x is given by the quantity P(x),
sets are finite and so have a finite maximum sample xmax. equation (15), and the fraction of the total wealth in the
This means that (12) becomes hands of those people is
2 R1 0 0
C aþ3 xmax 0
x aþ2
x ¼ : ð23Þ x x pðx Þdx
3a
x xmin WðxÞ ¼ R 1 0 0 0
¼ ; ð28Þ
xmin x pðx Þdx
xmin
*
Also called the Eulerian integral of the first kind.
assuming again that a 4 2. Eliminating x/xmin between (15)
{
This can be demonstrated by approximating the G-functions of equation and (28), we find that the fraction W of the wealth in the
(19) using Sterling’s formula. hands of the richest P of the population is
334 M.E.J. Newman
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
pðbÞpðxÞ
pðbxÞ ¼ : ð31Þ
pð1Þ
Figure 6. The fraction W of the total wealth in a country
held by the fraction P of the richest people, if wealth is Since this equation is supposed to be true for any b, we can
distributed following a power law with exponent a. If differentiate both sides with respect to b to get
a = 2.1, for instance, as it appears to in the United States
p0 ðbÞpðxÞ
(table 1), then the richest 20% of the population hold about xp0 ðbxÞ ¼ ; ð32Þ
86% of the wealth (dashed lines). pð1Þ
Power laws, Pareto distributions and Zipfs law 335
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
where p’ indicates the derivative of p with respect to its for some constant exponent a. Clearly this distribution
argument. Now we set b = 1 and get cannot hold all the way down to k = 0, since it diverges
there, but it could in theory hold down to k = 1. If we
dp p0 ð1Þ
x ¼ pðxÞ: ð33Þ discard any data for k = 0, the constant C would then be
dx pð1Þ given by the normalization condition
X
1 X
1
This is a simple first-order differential equation which has 1¼ pk ¼ C ka ¼ CzðaÞ; ð37Þ
the solution k¼1 k¼1
pð1Þ
ln pðxÞ ¼ ln x þ constant: ð34Þ where z(a) is the Riemann z-function. Rearranging,
p0 ð1Þ C=1/z(a) and
ka
pk ¼ : ð38Þ
Setting x = 1 we find that the constant is simply ln p(1), and zðaÞ
then taking exponentials of both sides
If, as is usually the case, the power-law behaviour is seen
pðxÞ ¼ pð1Þxa ; ð35Þ
only in the tail of the distribution, for values k 5 kmin, then
the equivalent expression is
where a = – p(1)/p’(1). Thus, as advertised, the power-law
ka
distribution is the only function satisfying the scale-free pk ¼ ; ð39Þ
criterion (30). zða; kmin Þ
This fact is more than just a curiosity. As we will see
P
in section 4.5, there are some systems that become scale- where zða;kmin Þ ¼ 1 k¼kmin k
a
is the generalized or incom-
free for certain special values of their governing para- plete z-function.
meters. The point defined by such a special value is Most of the results of the previous sections can be
called a ‘continuous phase transition’ and the argument generalized to the case of discrete variables, although the
given above implies that at such a point the observable mathematics is usually harder and often involves special
quantities in the system should adopt a power-law functions in place of the more tractable integrals of the
distribution. This indeed is seen experimentally and the continuous case.
distributions so generated provided the original motiva- It has occasionally been proposed that equation (36) is
tion for the study of power laws in physics (although not the best generalization of the power law to the discrete
most experimentally observed power laws are probably case. An alternative and in many cases more convenient
not the result of phase transitions—a variety of other form is
mechanisms produce power-law behaviour as well, as we
ðkÞðaÞ
will shortly see). pk ¼ C ¼ CBðk; aÞ; ð40Þ
ðk þ aÞ
and hence C = a – 1 and on a typewriter*, pressing the space bar with probability qs
per stroke and each letter with equal probability ql per
pk ¼ ða 1ÞBðk; aÞ: ð42Þ
stroke. If there are m letters in the alphabet then ql = (1 –
qs)/m. (In this simplest version of the argument we also type
The first and second moments (i.e. the mean and mean no punctuation, digits or other non-letter symbols.) Then
square of the distribution) are the frequency x with which a particular word with y letters
(followed by a space) occurs is
a 1 2 ða 1Þ2
hki ¼ ; k ¼ ; ð43Þ 1 qs y
a2 ða 2Þða 3Þ x¼ qs exp ðbyÞ; ð47Þ
m
and there are similarly simple expressions corresponding to
many of our earlier results for the continuous case. where b = ln (1 – qs) – ln m. The number (or fraction) of
distinct possible words with length between y and y + dy
goes up exponentially as p(y)*my = exp (ay) with a = ln
4. Mechanisms for generating power-law distributions
m. Thus, following our argument above, the distribution of
In this section we look at possible candidate mechanisms by frequencies of words has the form p(x)*x – a with
which power-law distributions might arise in natural and
a 2 ln m ln ð1 qs Þ
man-made systems. Some of the possibilities that have been a¼1 ¼ : ð48Þ
suggested are quite complex—notably the physics of critical b ln m ln ð1 qs Þ
phenomena and the tools of the renormalization group that
are used to analyse it. But let us start with some simple For the typical case where m is reasonably large and qs
algebraic methods of generating power-law functions and quite small this gives a^2 in approximate agreement with
progress to the more involved mechanisms later. table 1.
This is a reasonable theory as far as it goes, but real text
is not made up of random letters. Most combinations of
4.1 Combinations of exponentials
letters do not occur in natural languages; most are not even
A much more common distribution than the power law is pronounceable. We might imagine that some constant
the exponential, which arises in many circumstances, such fraction of possible letter sequences of a given length would
as survival times for decaying atomic nuclei or the correspond to real words and the argument above would
Boltzmann distribution of energies in statistical mechanics. then work just fine when applied to that fraction, but upon
Suppose some quantity y has an exponential distribution: reflection this suggestion is obviously bogus. It is clear for
instance that very long words simply do not exist in most
pðyÞ exp ðayÞ: ð44Þ languages, although there are exponentially many possible
combinations of letters available to make them up. This
The constant a might be either negative or positive. If it is observation is backed up by empirical data. In figure 7 (a)
positive then there must also be a cut-off on the we show a histogram of the lengths of words occurring in
distribution—a limit on the maximum value of y—so that the text of Moby Dick, and one would need a particularly
the distribution is normalizable. vivid imagination to convince oneself that this histogram
Now suppose that the real quantity we are interested in is follows anything like the exponential assumed by Miller’s
not y but some other quantity x, which is exponentially argument. (In fact, the curve appears roughly to follow a
related to y thus: log-normal [34].)
There may still be some merit in Miller’s argument
x exp ðbyÞ; ð45Þ
however. The problem may be that we are measuring word
‘length’ in the wrong units. Letters are not really the basic
with b another constant, also either positive or negative. units of language. Some basic units are letters, but some are
Then the probability distribution of x is groups of letters. The letters ‘th’ for example often occur
together in English and make a single sound, so perhaps
dy exp ðayÞ x1þa=b they should be considered to be a separate symbol in their
pðxÞ ¼ pðyÞ ¼ ; ð46Þ
dx b exp ðbyÞ b own right and contribute only one unit to the word length?
Following this idea to its logical conclusion we can
which is a power law with exponent a = 1 – a/b. imagine replacing each fundamental unit of the language—
A version of this mechanism was used by Miller [37] to
explain the power-law distribution of the frequencies of *
This argument is sometimes called the ‘monkeys with typewriters’
words as follows (see also [38]). Suppose we type randomly argument, the monkey being the traditional exemplar of a random typist.
Power laws, Pareto distributions and Zipfs law 337
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
whatever that is—by its own symbol and then measuring exponentially more distinct words in the language of high
lengths in terms of numbers of symbols. The pursuit of information content than of low. That this is the case is
ideas along these lines led Claude Shannon in the 1940s to experimentally verified by figure 7 (b), but the reason must
develop the field of information theory, which gives a be considered still a matter of debate. Some possibilities are
precise prescription for calculating the number of symbols discussed by, for instance, Mandelbrot [42] and more
necessary to transmit words or any other data [39, 40]. The recently by Mitzenmacher [19].
units of information are bits and the true ‘length’ of a word Another example of the ‘combination of exponentials’
can be considered to be the number of bits of information it mechanism has been discussed by Reed and Hughes [43].
carries. Shannon showed that if we regard words as the They consider a process in which a set of items, piles or
basic divisions of a message, the information y carried by groups each grows exponentially in time, having size
any particular word is x^(bt) with b 4 0. For instance, populations of organisms
reproducing freely without resource constraints grow
y ¼ k ln x; ð49Þ exponentially. Items also have some fixed probability of
dying per unit time (populations might have a stochasti-
where x is the frequency of the word as before and k is a cally constant probability of extinction), so that the times t
constant. (The reader interested in finding out more about at which they die are exponentially distributed p(t)^(at)
where this simple relation comes from is recommended to with a 5 0.
look at the excellent introduction to information theory by These functions again follow the form of equations (44)
Cover and Thomas [41].) and (45) and result in a power-law distribution of the sizes
But this has precisely the form that we want. Inverting it x of the items or groups at the time they die. Reed and
we have x = exp( – y/k) and if the probability distribution Hughes suggest that variations on this argument may
of the ‘lengths’ measured in terms of bits is also exponential explain the sizes of biological taxa, incomes and cities,
as in equation (44) we will get our power-law distribution. among other things.
Figure 7 (b) shows the latter distribution, and indeed it
follows a nice exponential—much better than figure 7 (a).
This is still not an entirely satisfactory explanation.
4.2 Inverses of quantities
Having made the shift from pure word length to informa-
tion content, our simple count of the number of words of Suppose some quantity y has a distribution p(y) that passes
length y—that it goes exponentially as my—is no longer through zero, so that y has both positive and negative
valid, and now we need some reason why there should be values. And suppose further that the quantity we are really
interested in is the reciprocal x = 1/y, which will have
distribution
dy pðyÞ
pðxÞ ¼ pðyÞ ¼ 2 : ð50Þ
dx x
So
*
Gambler’s ruin is so called because a gambler’s night of betting ends when 1
his or her supply of money hits zero (assuming the gambling establishment FðzÞ ¼ 1 : ð55Þ
declines to offer him or her a line of credit).
UðzÞ
Power laws, Pareto distributions and Zipfs law 339
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
Now consider the form of f2n for large n. Writing out the
2
n ¼ ð2nÞ!=ðn!Þ , we take logs thus:
binomial coefficient as 2n
*
Figure 10. J.C. Willis’s 1922 plot of the cumulative Yule’s analysis of the process was considerably more involved than the
one presented here, essentially because the theory of stochastic processes as
distribution of the number of species in genera of flowering we now know it did not yet exist in his time. The master equation method
plants [49,15]. (Reproduced with permission from Nature, we employ is a relatively modern innovation, introduced in this context by
vol. 109, pp. 177 – 179 http://www.nature.com/). Simon [35].
Power laws, Pareto distributions and Zipfs law 341
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
m 1
ðn þ 1Þpk;nþ1 ¼ npk;n þ ðk 1Þpk1;n kpk;n : ð65Þ a¼2þ : ð72Þ
mþ1 m
The only exception to this equation is for genera of size The mean number m + 1 of species per genus for the
1, which instead obey the equation example of flowering plants is about 3, making m^2 and
m a^2.5. The actual exponent for the distribution in figure 10
ðn þ 1Þp1;nþ1 ¼ np1;n þ 1 p1;n ; ð66Þ is a = 2.5 + 0.1, which is in excellent agreement with the
mþ1
theory.
Most likely this agreement is fortuitous, however. The
since by definition exactly one new such genus appears on Yule process is probably not a terribly realistic explanation
each time step. for the distribution of the sizes of genera, principally
Now we ask what form the distribution of the sizes of because it ignores the fact that species (and genera) become
genera takes in the limit of long times. To do this we extinct. However, it has been adapted and generalized by
allow n?? and assume that the distribution tends to others to explain power laws in many other systems, most
some fixed value pk = limn??pn,k independent of n. Then famously city sizes [35], paper citations [50, 51], and links to
equation (66) becomes p1 = 1 – mp1/(m + 1), which has the pages on the world wide web [52, 53]. The most general
solution form of the Yule process is as follows.
Suppose we have a system composed of a collection of
mþ1
p1 ¼ : ð67Þ objects, such as genera, cities, papers, web pages and so
2m þ 1 forth. New objects appear every once in a while as cities
grow up or people publish new papers. Each object also has
And equation (65) becomes some property k associated with it, such as number of
m species in a genus, people in a city or citations to a paper,
pk ¼ ½ðk 1Þpk1 kpk ; ð68Þ which is reputed to obey a power law, and it is this power
mþ1
law that we wish to explain. Newly appearing objects have
some initial value of k which we will denote k0. New genera
which can be rearranged to read initially have only a single species k0 = 1, but new towns or
cities might have quite a large initial population—a single
k1
pk ¼ pk1 ; ð69Þ person living in a house somewhere is unlikely to constitute
k þ 1 þ 1=m a town in their own right but k0 = 100 people might do so.
The value of k0 can also be zero in some cases: newly
and then iterated to get published papers usually have zero citations for instance.
In between the appearance of one object and the next, m
ðk 1Þðk 2Þ . . . 1
pk ¼ p1 new species/people/citations etc. are added to the entire
ðk þ 1 þ 1=mÞðk þ 1=mÞ . . . ð3 þ 1=mÞ system. That is some cities or papers will get new people or
ð70Þ
ðk 1Þ . . . 1 citations, but not necessarily all will. And in the simplest
¼ ð1 þ 1=mÞ ;
ðk þ 1 þ 1=mÞ . . . ð2 þ 1=mÞ case these are added to objects in proportion to the number
that the object already has. Thus the probability of a city
where I have made use of equation (67). This can be gaining a new member is proportional to the number
simplified further by making use of a handy property of the already there; the probability of a paper getting a new
G-function, equation (20), that G(a) = (a – 1)G(a – 1). Using citation is proportional to the number it already has. In
this, and noting that G(1) = 1, we get many cases this seems like a natural process. For example,
a paper that already has many citations is more likely to be
ðkÞð2 þ 1=mÞ
pk ¼ ð1 þ 1=mÞ discovered during a literature search and hence more likely
ðk þ 2 þ 1=mÞ ð71Þ to be cited again. Simon [35] dubbed this type of ‘rich-get-
¼ ð1 þ 1=mÞBðk; 2 þ 1=mÞ; richer’ process the Gibrat principle. Elsewhere it also goes
by the names of the Matthew effect [54], cumulative
where B(a, b) is again the beta-function, equation (19). advantage [50], or preferential attachment [52].
This, we note, is precisely the distribution defined in There is a problem however when k0 = 0. For example, if
equation (40), which Simon called the Yule distribution. new papers appear with no citations and garner citations in
Since the beta-function has a power-law tail B(a, b)*a – b, proportion to the number they currently have, which is
we can immediately see that pk also has a power-law tail zero, then no paper will ever get any citations! To overcome
with an exponent this problem one typically assigns new citations not in
342 M.E.J. Newman
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
proportion simply to k, but to k + c, where c is some citations Price [50] assumed that c = 1, so that paper
constant. Thus there are three parameters k0, c and m that citations have the same exponent a = 2 + 1/m as the
control the behaviour of the model. standard Yule process, although there does not seem to
By an argument exactly analogous to the one given be any very good reason for making this assumption. As we
above, one can then derive the master equation saw in table 1 (and as Price himself also reported), real
citations seem to have an exponent a^3, so we should
k1þc
ðn þ 1Þpk;nþ1 ¼ npk;n þ m pk1;n expect c^. For the data from the Science Citation Index
k0 þ c þ m
ð73Þ examined in section 2.1, the mean number m of citations
kþc
m pk;n ; for k4k0 ; per paper is 8.6. So we should put c^8.6 too if we want the
k0 þ c þ m Yule process to match the observed exponent.
The most widely studied model of links on the web, that
and of Barabási and Albert [52], assumes c = m so that a = 3,
but again there does not seem to be a good reason for this
k0 þ c
ðn þ 1Þpk0 ;nþ1 ¼ npk0 ;n þ 1 m pk ;n ; for k ¼ k0 : assumption. The measured exponent for numbers of links
k0 þ c þ m 0 to web sites is about a = 2.2, so if the Yule process is to
ð74Þ match the data in this case, we should put c^0.2m.
However, the important point is that the Yule process is
(Note that k is never less than k0, since each object appears a plausible and general mechanism that can explain a
with k = k0 initially.) number of the power-law distributions observed in nature
Looking for stationary solutions of these equations as and can produce a wide range of exponents to match the
before, we define pk = limn??pn,k and find that observations by suitable adjustments of the parameters.
For several of the distributions shown in figure 4, especially
k0 þ c þ m
pk0 ¼ ; ð75Þ citations, city populations and personal income, it is now
ðm þ 1Þðk0 þ cÞ þ m the most widely accepted theory.
and
critical phenomena, of which power-law distributions are And what happens in between these two extremes? As we
one example. increase p from small values, the value of hsi also increases.
To better understand the physics of critical phenomena, But at some point we reach the start of the regime in which
let us explore one simple but instructive example, that of hsi goes up with system size instead of staying constant. We
the ‘percolation transition’. Consider a square lattice like now know that this point is at p = 0.5927462. . ., which is
the one depicted in figure 11 in which some of the squares called the critical value of p and is denoted pc. If the size of
have been coloured in. Suppose we colour each square with the lattice is large, then hsi also becomes large at this point,
independent probability p, so that on average a fraction p and in the limit where the lattice size goes to infinity hsi
of them are coloured in. Now we look at the clusters of actually diverges. To illustrate this phenomenon, I show in
coloured squares that form, i.e. the contiguous regions of figure 13 a plot of hsi from simulations of the percolation
adjacent coloured squares. We can ask, for instance, what model and the divergence is clear.
the mean area hsi is of the cluster to which a randomly Now consider not just the mean cluster size but the entire
chosen square belongs. If that square is not coloured in distribution of cluster sizes. Let p(s) be the probability that
then the area is zero. If it is coloured in but none of the a randomly chosen square belongs to a cluster of area s. In
adjacent ones is coloured in then the area is one, and so general, what forms can p(s) take as a function of s? The
forth. important point to notice is that p(s), being a probability
When p is small, only a few squares are coloured in and distribution, is a dimensionless quantity—just a number—
most coloured squares will be alone on the lattice, or maybe but s is an area. We could measure s in terms of square
grouped in twos or threes. So hsi will be small. This metres, or whatever units the lattice is calibrated in. The
situation is depicted in figure 12 for p = 0.3. Conversely, if average hsi is also an area and then there is the area of a
p is large—almost 1, which is the largest value it can have— unit square itself, which we will denote a. Other than these
then most squares will be coloured in and they will almost three quantities, however, there are no other independent
all be connected together in one large cluster, the so-called parameters with dimensions in this problem. (There is the
spanning cluster. In this situation we say that the system area of the whole lattice, but we are considering the limit
percolates. Now the mean size of the cluster to which a where that becomes infinite, so it is out of the picture.)
vertex belongs is limited only by the size of the lattice itself If we want to make a dimensionless function p(s) out of
and as we let the lattice size become large hsi also becomes these three dimensionful parameters, there are three
large. So we have two distinctly different behaviours, one dimensionless ratios we can form: s/a, a/hsi and s/hsi (or
for small p in which hsi is small and does not depend on the their reciprocals, if we prefer). Only two of these are
size of the system, and one for large p in which hsi is much independent however, since the last is the product of the
larger and increases with the size of the system. other two. Thus in general we can write
s a
pðsÞ ¼ Cf ; ; ð79Þ
a hsi
Figure 12. Three examples of percolation systems on 100 6 100 square lattices with p = 0.3, p = pc = 0.5927. . . and p = 0.9.
The first and last are well below and above the critical point respectively, while the middle example is precisely at it.
still sums to unity, and that this change will depend on the
value we choose for the rescaling factor b.
But now we notice that there is one special point at which
this rescaling by definition does not result in a change in hsi
or a corresponding change in the site occupation prob-
ability, and that is the critical point. When we are precisely
at the point at which hsi??, then bhsi = hsi by definition.
Putting hsi?? in equations (79) and (80), we then get
p(s) = C’f(bs/a, 0) = (C’/C)p(bs). Or equivalently
Figure 14. A site percolation system is coarse-grained, so that the area of the fundamental square is (in this case) quadrupled.
The occupation of the squares in the coarse-grained lattice (right) is chosen to mirror as nearly as possible that of the squares
on the original lattice (left), so that the sizes and shapes of the large clusters remain roughly the same. The small clusters are
mostly lost in the coarse-graining, so that the arguments given in the text are valid only for the large-s tail of the cluster size
distribution.
from tree to adjacent tree until the whole cluster is burned, estimates give a value of a = 1.19 + 0.01 [59], meaning that
but the fire cannot cross the firebreak formed by an empty the distribution has an infinite mean in the limit of large
square. If there is no tree in the square struck by the system size. For all real systems however the mean is finite:
lightning, then nothing happens. After a fire, trees can grow the distribution is cut off in the large-size tail because fires
up again in the squares vacated by burnt trees, so the cannot have a size any greater than that of the lattice as a
process keeps going indefinitely. whole and this makes the mean well behaved. This cut-off is
If we start with an empty lattice, trees will start to appear clearly visible in figure 17 as the drop in the curve towards
but will initially be sparse and lightning strikes will either the right of the plot. What is more the distribution of the
hit empty squares or if they do chance upon a tree they will sizes of fires in real forests, figure 5 (d), shows a similar cut-
burn it and its cluster, but that cluster will be small and off and is in many ways qualitatively similar to the
localized because we are well below the percolation distribution predicted by the model. (Real forests are
threshold. Thus fires will have essentially no effect on the obviously vastly more complex than the forest fire model,
forest. As time goes by however, more and more trees will and no one is seriously suggesting that the model is an
grow up until at some point there are enough that we have accurate representation of the real world. Rather it is a
percolation. At that point, as we have seen, a spanning guide to the general type of processes that might be going
cluster forms whose size is limited only by the size of the on in forests.)
lattice, and when any tree in that cluster gets hit by the There has been much excitement about self-organized
lightning the entire cluster will burn away. This gets rid of criticality as a possible generic mechanism for explaining
the spanning cluster so that the system does not percolate where power-law distributions come from. Per Bak, one of
any more, but over time as more trees appear it will the originators of the idea, wrote an entire book about it
presumably reach percolation again, and so the scenario [60]. Self-organized critical models have been put forward
will play out repeatedly. The end result is that the system not only for forest fires, but for earthquakes [61, 62], solar
oscillates right around the critical point, first going just flares [5], biological evolution [63], avalanches [57] and
above the percolation threshold as trees appear and then many other phenomena. Although it is probably not the
being beaten back below it by fire. In the limit of large universal law that some have claimed it to be, it is certainly
system size these fluctuations become small compared to a powerful and intriguing concept that potentially has
the size of the system as a whole and to an excellent applications to a variety of natural and man-made systems.
approximation the system just sits at the threshold
indefinitely. Thus, if we wait long enough, we expect the
4.7 Other mechanisms for generating power laws
forest fire model to self-organize to a state in which it has a
power-law distribution of the sizes of clusters, or of the In the preceding sections I have described the best known
sizes of fires. and most widely applied mechanisms that generate power-
In figure 17 I show the cumulative distribution of the law distributions. However, there are a number of others
sizes of fires in the forest fire model and, as we can see, it that deserve a mention. One that has been receiving some
follows a power law closely. The exponent of the attention recently is the highly optimized tolerance
distribution is quite small in this case. The best current mechanism of Carlson and Doyle [64, 65]. The classic
Figure 16. Lightning strikes at random positions in the forest fire model, starting fires that wipe out the entire cluster to which
a struck tree belongs.
Power laws, Pareto distributions and Zipfs law 347
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
p(x) will look like a power-law distribution when we look at could produce a power-law distribution of the sizes of
a small portion on log scales. The effective exponent a of meteor craters similar to the one in figure 4 (g).
the distribution is in this case not fixed by the theory—it In fact, as discussed by a number of authors [67 – 69],
could be anything, depending on which part of the random multiplication processes can also generate perfect
quadratic our data fall on. power-law distributions with only a slight modification: if
On larger scales the distribution will have some down- there is a lower bound on the value that the product of a set
ward curvature, but so do many of the distributions of numbers is allowed to take (for example if there is a
claimed to follow power laws, so it is possible that these ‘reflecting boundary’ on the lower end of the range, or an
distributions are really log-normal. In fact, in many cases additive noise term as well as a multiplicative one) then the
we do not even have to restrict ourselves to a particularly behaviour of the process is modified to generate not a log-
small portion of the curve. If s is large then the quadratic normal, but a true power law.
term in equation (84) will vary slowly and the curvature of Finally, some processes show power-law distributions of
the line will be slight, so the distribution will appear to times between events. The distribution of times between
follow a power law over relatively large portions of its earthquakes and their aftershocks is one example. Such
range. This situation arises commonly when we are power-law distributions of times are observed in critical
considering products of random numbers. models and in the coherent noise mechanism mentioned
Suppose for example that we are multiplying together above, but another possible explanation for their occur-
100 numbers, each of which is drawn from some distribu- rence is a random extremal process or record dynamics. In
tion such that the standard deviation of the logs is around this mechanism we consider how often a randomly
1—i.e. the numbers themselves vary up or down by about a fluctuating quantity will break its own record for the
factor of e. Then, by the central limit theorem, the standard highest value recorded. For a quantity with, say, a
deviation for ln x will be s^10 and ln x will have to vary Gaussian distribution, it is always in theory possible for
by about + 10 for changes in (ln x)2/s2 to be apparent. But the record to be broken, no matter what its current value,
such a variation in the logarithm corresponds to a variation but the more often the record is broken the higher the
in x of more than four orders of magnitude. If our data record will get and the longer we will have to wait until it is
span a domain smaller than this, as many of the plots in broken again. As shown by Sibani and Littlewood [70], this
figure 4 do, then we will see a measured distribution that non-stationary process gives a distribution of waiting times
looks close to a power law. And the range will get quickly between the establishment of new records that follows a
larger as the number of numbers we are multiplying grows. power law with exponent a = – 1. Interestingly, this is
One example of a random multiplicative process might precisely the exponent observed for the distribution of
be wealth generation by investment. If a person invests waiting times for aftershocks of earthquakes. The record
money, for instance in the stock market, they will get a dynamics has also been proposed as a model for the
percentage return on their investment that varies over time. lifetimes of biological taxa [71].
In other words, in each period of time their investment is
multiplied by some factor which fluctuates from one period
5. Conclusions
to the next. If the fluctuations are random and uncorre-
lated, then after many such periods the value of the In this review I have discussed the power-law statistical
investment is the initial value multiplied by the product of a distributions seen in a wide variety of natural and man-
large number of random numbers, and therefore should be made phenomena, from earthquakes and solar flares to
distributed according to a log-normal. This could explain populations of cities and sales of books. We have seen
why the tail of the wealth distribution, figure 4 (j), appears many examples of power-law distributions in real data and
to follow a power law. seen how to analyse those data to understand the behaviour
Another example is fragmentation. Suppose we break a and parameters of the distributions. I have also described a
stick of unit length into two parts at a position which is a number of physical mechanisms that have been proposed to
random fraction z of the way along the stick’s length. Then explain the occurrence of power laws. Perhaps the two most
we break the resulting pieces at random again and so on. important of these are the following.
After many breaks, the length of one of the remaining
pieces will be Pizi, where zi is the position of the ith break. (a) The Yule process, a rich-get-richer mechanism in
This is a product of random numbers and thus the resulting which the most populous cities or best-selling books
distribution of lengths should follow a power law over a get more inhabitants or sales in proportion to the
portion of its range. A mechanism like this could, for number they already have. Yule and later Simon
instance, produce a power-law distribution of meteors or showed mathematically that this mechanism pro-
other interplanetary rock fragments, which tend to break duces what is now called the Yule distribution, which
up when they collide with one another, and this in turn follows a power law in its tail.
Power laws, Pareto distributions and Zipfs law 349
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
(b) Critical phenomena and the associated concept of [14] R. Kohli and R. Sah, Market shares: some power law results and
observations, Working Paper 04.01 (Harris School of Public Policy,
self-organized criticality, in which a scale-factor of a
University of Chicago, 2003).
system diverges, either because we have tuned the [15] J.C. Willis and G.U. Yule, Nature 109 177 (1922).
system to a special critical point in its parameter [16] V. Pareto, Cours d’Economie Politique (Droz, Geneva, 1896).
space or because the system automatically drives itself [17] G.B. West, J.H. Brown and B.J. Enquist, Science 276 122 (1997).
to that point by some dynamical process. The [18] D. Sornette, Critical Phenomena in Natural Sciences (Springer, Berlin,
2000), chapter 14.
divergence can leave the system with no appropriate
[19] M. Mitzenmacher, Internet Math. 1 226 (2004).
scale factor to set the size of some measured quantity [20] M.L. Goldstein, S.A. Morris and G.G. Yen, Eur. Phys. J. B 41 255
and as we have seen the quantity must then follow a (2004).
power law. [21] B. Efron, SIAM Rev. 21 460 (1979).
[22] H. Dahl, Word Frequencies of Spoken American English (Verbatim,
Essex, CT, 1979).
The study of power-law distributions is an area in which
[23] S. Redner, Eur. Phys. J. B 4 131 (1998).
there is considerable current research interest. While the [24] A.P. Hackett, 70 Years of Best Sellers, 1895 – 1965. (R.R. Bowker
mechanisms and explanations presented here certainly offer Company, New York, 1967).
some insight, there is much work to be done both [25] W. Aiello, F. Chung and L. Lu, in Proceedings of the 32nd Annual
experimentally and theoretically before we can say we ACM Symposium on Theory of Computing (Association of Comput-
ing Machinery, New York, 2000), pp. 171 – 180.
really understand the physical processes driving these
[26] H. Ebel, L.-I. Mielsch and S. Bornholdt, Phys. Rev. E 66 035103
systems. Without doubt there are many exciting discoveries (2002).
still waiting to be made. [27] B.A. Huberman and L.A. Adamic, in Complex Networks, No. 650,
Lecture Notes in Physics, edited by E. Ben-Naim, H. Frauenfelder and
Z. Toroczkai (Springer, Berlin, 2004), pp. 371 – 398.
[28] M. Small and J.D. Singer, Resort to Arms: International and Civil
Wars, 1816 – 1980 (Sage Publications, Beverley Hills, 1982).
Acknowledgments [29] S. Miyazima, Y. Lee, T. Nagamine, et al., Physica A 278 282 (2000).
[30] B.J. Kim and S.M. Park, preprint cond-mat/0407311 (2004).
The author would like to thank Petter Holme, Cris Moore [31] J. Chen, J.S. Thorp and M. Parashar, in 34th Hawaii International
and Erik van Nimwegen for useful conversations, and Lada Conference on System Sciences (IEEE Computer Society, New York,
2001).
Adamic for the Web site hit data. This work was funded in
[32] B.A. Carreras, D.E. Newman, I. Dobson, et al., in 34th Hawaii
part by the National Science Foundation under grant International Conference on System Sciences (IEEE Computer
number DMS – 0405348. Society, New York, 2001).
[33] M.E.J. Newman, S. Forrest and J. Balthrop, Phys. Rev. E 66 035101
(2002).
[34] E. Limpert, W.A. Stahel and M. Abbt, Bioscience 51 341 (2001).
[35] H.A. Simon, Biometrika 42 425 (1955).
References [36] G.U. Yule, Phil. Trans. R. Soc. (London) B 213 21 (1925).
[37] G.A. Miller, Am. J. Psychol. 70 311 (1957).
[1] F. Auerbach, Petermanns Geogr. Mitteilung. 59 74 (1913). [38] W. Li, IEEE Trans. Inf. Theory 38 1842 (1992).
[2] G.K. Zipf, Human Behaviour and the Principle of Least Effort [39] C.E. Shannon, Bell Syst. Techn. J. 27 379 (1948).
(Addison-Wesley, Reading, MA, 1949). [40] C.E. Shannon, Bell Syst. Techn. J. 27 623 (1948).
[3] B. Gutenberg and R. F. Richter, Bull. Seismol. Soc. Am. 34 185 [41] T.M. Cover and J.A. Thomas, Elements of Information Theory (John
(1944). Wiley, New York, 1991).
[4] G. Neukum and B.A. Ivanov, in Hazards Due to Comets and [42] B.B. Mandelbrot, in Symposium on Applied Communications
Asteroids, edited by T. Gehrels (University of Arizona Press, Tucson, Theory, edited by W. Jackson (Butterworth, Woburn, MA, 1953),
AZ, 1994), pp. 359 – 416. pp. 486 – 502.
[5] E.T. Lu and R.J. Hamilton, Astrophys. J. 380 89 (1991). [43] W.J. Reed and B.D. Hughes, Phys. Rev. E 66 067103 (2002).
[6] M.E. Crovella and A. Bestavros, in Proceedings of the 1996 ACM [44] N. Jan, L. Moseley, T. Ray, et al., Adv. Complex Syst. 2 137 (1999).
SIGMETRICS Conference on Measurement and Modeling of [45] D. Sornette, Int. J. Mod. Phys. C 13 133 (2001).
Computer Systems, edited by B.E. Gaither and D.A. Reed (Associa- [46] R.H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 58 86 (1987).
tion of Computing Machinery, New York, 1996), pp. 148 – 159. [47] K. Sneppen, P. Bak, H. Flyvbjerg, et al., Proc. Natl. Acad. Sci. (USA)
[7] D.C. Roberts and D.L. Turcotte, Fractals 6 351 (1998). 92 5209 (1995).
[8] J.B. Estoup, Gammes Stenographiques (Institut Stenographique de [48] M.E.J. Newman and R.G. Palmer, Modeling Extinction (Oxford
France, Paris, 1916). University Press, Oxford, 2003).
[9] D.H. Zanette and S.C. Manrubia, Physica A 295 1 (2001). [49] J.C. Willis, Age and Area (Cambridge University Press, Cambridge,
[10] A.J. Lotka, J. Wash. Acad. Sci. 16 317 (1926). 1922).
[11] D.J. de S. Price, Science 149 510 (1965). [50] D.J. de S. Price, J. Am. Soc. Inform. Sci. 27 292 (1976).
[12] L. A. Adamic and B. A. Huberman, Q. J. Electron. Commerce 1 512 [51] P.L. Krapivsky, S. Redner and F. Leyvraz, Phys. Rev. Lett. 85 4629
(2000). (2000).
[13] R.A.K. Cox, J.M. Felton and K.C. Chung, J. Cult. Econ. 19 333 [52] A.-L. Barabási and R. Albert, Science 286 509 (1999).
(1995).
350 M.E.J. Newman
Downloaded By: [Universidad Granada] At: 09:10 3 December 2007
[53] S.N. Dorogovtsev, J.F.F. Mendes and A.N. Samukhin, Phys. Rev. of their frequency. Such a plot of rank against frequency
Lett. 85 4633 (2000).
was called by Zipf [2] a rank/frequency plot, and this
[54] R.K. Merton, Science 159 56 (1968).
[55] P.J. Reynolds, W. Klein and H.E. Stanley, J. Phys. C 10 L167 (1977).
name is still sometimes used to refer to plots of the
[56] K.G. Wilson and J. Kogut, Phys. Rep. 12 75 (1974). cumulative distribution of a quantity. Of course, many
[57] P. Bak, C. Tang and K. Wiesenfeld, Phys. Rev. Lett. 59 381 (1987). quantities we are interested in are not frequencies—they
[58] B. Drossel and F. Schwabl, Phys. Rev. Lett. 69 1629 (1992). are the sizes of earthquakes or people’s personal wealth
[59] P. Grassberger, New J. Phys. 4 17 (2002).
or whatever—but nonetheless people still talk about
[60] P. Bak, How Nature Works: The Science of Self-Organized Criticality
(Copernicus, New York, 1996).
‘rank/frequency’ plots although the name is not techni-
[61] P. Bak and C. Tang, J. Geophys. Res. 94 15635 (1989). cally accurate.
[62] Z. Olami, H.J.S. Feder and K. Christensen, Phys. Rev. Lett. 68 1244 In practice, sorting and ranking measurements and then
(1992). plotting rank against those measurements is usually the
[63] P. Bak and K. Sneppen, Phys. Rev. Lett. 74 4083 (1993).
quickest way to construct a plot of the cumulative
[64] J.M. Carlson and J. Doyle, Phys. Rev. E 60 1412 (1999).
[65] J.M. Carlson and J. Doyle, Phys. Rev. Lett. 84 2529 (2000).
distribution of a quantity. All the cumulative plots in this
[66] K. Sneppen and M.E.J. Newman, Physica D 110 209 (1997). paper were made in this way, except for the plot of the sizes
[67] D. Sornette and R. Cont, J. Phys. I 7 431 (1997). of moon craters in figure 4 (g), for which the data came
[68] D. Sornette, Phys. Rev. E 57 4811 (1998). already in cumulative form.
[69] X. Gabaix, Q. J. Econ. 114 739 (1999).
[70] P. Sibani and P.B. Littlewood, Phys. Rev. Lett. 71 1482 (1993).
[71] P. Sibani, M.R. Schmidt and P. Alstrøm, Phys. Rev. Lett. 75 2055
(1995).
n
X X
n
xi n xi
L ¼ ln PðxjaÞ ¼ ln ða1Þ ln xmin a ln ln ¼ 0; ðB5Þ
i¼1
xmin a 1 i¼1 xmin
X
n
xi
¼ n ln ða 1Þ n ln xmin a ln : ðB4Þ or
i¼1
xmin
" #1
X xi
a¼1þn ln : ðB6Þ
Now we calculate the most likely value of a by maximizing i
xmin
the likelihood with respect to a, which is the same as
maximizing the log likelihood, since the logarithm is a
monotonic increasing function. Setting dL/da = 0, we find