Random Numbers with Applications: An Overview
An-Der Lin
Department of Medical Information Management
Kaohsiung Medical University, Kaohsiung, Taiwan 807
November 22, 2008
Abstract
Randomness plays a central role in statistical sampling, probability, gambling, cryptography, and anything involving unpredictability. Different randomization methods suit different requirements. Major applications of randomness are reviewed, and a brief introduction to random number generation is given.
1 Introduction
Randomness is an absence of order, purpose, cause, or predictability. A random process is
defined as a reiterating process whose outcomes have no describable deterministic pattern,
but follow a probability distribution. Randomness connects with many important scientific
fields like chaos theory, cryptography, game theory, information theory, pattern recognition,
probability theory, quantum mechanics, statistics, and statistical mechanics. Gambling is one of the oldest human behaviors, and it inspired the study and development of probability theory. In finance, asset prices in an organized market evolve at random and are often assumed to satisfy the random walk hypothesis. In statistics and in practice, sampling applies to industrial quality control, political opinion polls, engineering, pure and social science, and many other areas, in which the use of random numbers is especially important. Certain types of computational simulations
and solutions use random numbers widely, such as in the Monte Carlo method and in
genetic algorithms. In medical applications, random allocation of a clinical intervention
is used to reduce bias in controlled trials. Computers have been applied extensively to academic research since the mid-1970s, and the development of the World Wide Web brought a boom in Internet communications. Since then, cryptography has become a widely used tool in communications, computer networks, and computer security. Many modern cryptographic techniques are based on the difficulty of certain computational problems, such as the integer factorization or discrete logarithm problems, and they rely extensively on random numbers.
A random number generator (RNG) is a computational or physical device designed
to generate a sequence of numbers or symbols that lack any pattern. Computer-based
systems for random number generation are widely used, but often fall short of this goal,
though they may meet some statistical tests for randomness intended to ensure that they
do not have any easily discernible patterns. Methods for generating random results have
existed since ancient times, including dice, coin flipping, the shuffling of playing cards,
and many other techniques. Many applications of randomness have led to many different
methods for generating random data. These methods may vary as to how unpredictable or
statistically random they are, and how quickly they can generate random numbers. Some
physical phenomena, such as thermal noise in Zener diodes, appear to be truly random and can be used as the basis for hardware random number generators. Pseudo-random number generators (PRNGs) are algorithms that can automatically create long runs of numbers (millions of values) with good random properties, but eventually the sequence repeats exactly (or the memory usage grows without bound).
Three major mechanisms induce random behavior in systems. The first comes from the natural environment. A familiar example is Brownian motion, a common phenomenon in biology, botany, aerodynamics, and meteorology. The second mechanism comes from chaos theory: in certain dynamical systems the behavior is extremely sensitive to the initial conditions, so any small perturbation of the initial state makes the long-term behavior effectively unpredictable.
The third mechanism is so-called pseudo-randomness, which is obtained from algorithms known as pseudo-random number generators. These methods are quicker than collecting "true" randomness from the environment. Nevertheless, a "random number generator" based solely on deterministic computation cannot be regarded as a "true" random number generator, since its output is inherently predictable. Distinguishing a "true" random number from the output of a pseudo-random number generator is a very difficult problem. However, carefully chosen pseudo-random number generators can be used instead of true random numbers in many applications. Rigorous statistical analysis of the output is often needed to have confidence in the algorithm.
Randomness needs to be tested. There are many practical measures of randomness based on frequency, discrete transforms, and complexity, or a mixture of these. These include
tests by Kak [9], Phillips [21], Yuen [23], Hopkins [8], Beth and Dai [2], Mund [19], and
Marsaglia and Zaman [14]. A PRNG can be started from an arbitrary starting state, using
a seed state. It will always produce the same sequence thereafter when initialized with that
state. The maximum length of the sequence before it begins to repeat is determined by
the size of the state, measured in bits. However, since the length of the maximum period
potentially doubles with each bit of ’state’ added, it is easy to build PRNGs with periods
long enough for any practical application.
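As a small illustration of seeding (a sketch using Python's built-in random module; the seed value is arbitrary), two generators initialized with the same seed state reproduce exactly the same sequence:

    import random

    # Two PRNG instances initialized with the same seed state ...
    a = random.Random(12345)
    b = random.Random(12345)

    # ... produce exactly the same sequence of outputs.
    print([a.random() for _ in range(3)])
    print([b.random() for _ in range(3)])  # identical to the line above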
If a PRNG's internal state contains n bits, its period can be no longer than 2^n results. For some PRNGs the period length can be calculated without walking through the whole period. Linear feedback shift registers (LFSRs) are usually chosen to have periods of exactly 2^n − 1. Linear congruential generators have periods that can be calculated by factoring. Mixes (no restrictions) have periods of about 2^{n/2} on average, usually after walking through a nonrepeating starting sequence. Mixes that are reversible (permutations) have periods of about 2^{n−1} on average, and the period will always include the original internal state [12]. Although PRNGs will repeat their results after they reach the end of their period, a repeated result does not imply that the end of the period has been reached.
It is an open question, and one central to the theory and practice of cryptography,
whether there is any way to distinguish the output of a high-quality PRNG from a truly
random sequence without knowing the algorithm(s) used and the state with which it was
initialized. The security of most cryptographic algorithms and protocols using PRNGs
is based on the assumption that it is infeasible to distinguish use of a suitable PRNG
from a random sequence. The simplest examples of this dependency are stream ciphers,
which (most often) work by exclusive-ORing (XORing) the plaintext of a message with the output of a PRNG, producing the ciphertext. The design of cryptographically adequate PRNGs is
extremely difficult.
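As an illustration of the stream-cipher idea, here is a toy Python sketch; the keystream below comes from an ordinary, non-cryptographic PRNG and must not be used for real security:

    import random

    def xor_keystream(data: bytes, seed: int) -> bytes:
        """XOR each byte of `data` with a PRNG-derived keystream byte."""
        prng = random.Random(seed)                      # NOT cryptographically secure
        keystream = bytes(prng.randrange(256) for _ in data)
        return bytes(d ^ k for d, k in zip(data, keystream))

    ciphertext = xor_keystream(b"attack at dawn", seed=42)
    plaintext = xor_keystream(ciphertext, seed=42)      # XOR with the same keystream decrypts
    assert plaintext == b"attack at dawn"

Because both sides regenerate the same keystream from the shared seed, XORing twice recovers the plaintext; the security therefore rests entirely on how unpredictable the keystream is.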
2 Methods to generate random numbers
1. Physical methods: physical processes can generate "true" random numbers. According to the laws of quantum mechanics, true randomness and unpredictability can be obtained from essentially random atomic or subatomic phenomena. One example is the Atari 8-bit computer family, which used electronic noise from an analog circuit to generate true random numbers [17]. Others include radioactive decay, thermal noise [5], shot noise [7] and clock
drift [22]. Even lava lamps have been used by the Lavarand generator [18].
2. Computational methods: An early computer-based pseudo-random number generator
(PRNG), known as the middle-square method, was suggested by John von Neumann in
1946. It squares the seed, takes the middle digits of the resulting number as the "random number", and then uses this number as the seed for the next iteration. The problem with this method is that all sequences eventually repeat themselves (often after degenerating into a short cycle or reaching zero), so it has been supplanted by more elaborate generators.
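A minimal sketch of the middle-square idea, written in Python for illustration (four-digit numbers; the seed is arbitrary):

    def middle_square(seed: int, count: int, digits: int = 4):
        """Generate `count` numbers with von Neumann's middle-square method."""
        numbers = []
        x = seed
        for _ in range(count):
            squared = str(x * x).zfill(2 * digits)                     # pad so the middle is well defined
            mid = len(squared) // 2
            x = int(squared[mid - digits // 2 : mid + digits // 2])    # keep the middle digits
            numbers.append(x)
        return numbers

    print(middle_square(5735, 10))   # such sequences typically fall into a short cycle or reach 0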
Linear Congruential Generator: One of the most common pseudo-random number generators is the linear congruential generator, which uses integer constants and applies the recurrence

X_{n+1} = (a X_n + b) mod m

to generate integers, where X_n is the sequence of pseudo-random integer values, m > 0 is the "modulus", a is the "multiplier" with 0 < a < m, b is the "increment" with 0 ≤ b < m (the special case b = 0 gives a multiplicative congruential generator, of which the Park-Miller RNG [20] is an example), and X_0 is the "seed" or "start value" with 0 < X_0 < m. Many computer experiments require the generation of pseudorandom numbers between 0 and 1. To generate such numbers, we divide the numbers generated with a linear congruential generator by the modulus; that is, we use the numbers X_n / m.
The period of a general LCG is at most m, and for some choices of a much less than
that. The LCG will have a full period [11] if and only if:
1. b and m are relatively prime,
2. a − 1 is divisible by all prime factors of m,
3. a − 1 is a multiple of 4 if m is a multiple of 4.
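A minimal Python sketch of such a generator follows; the constants below (m = 2^32, a = 1664525, b = 1013904223) are a commonly cited full-period choice satisfying the three conditions above, used here purely for illustration:

    # Linear congruential generator: X_{n+1} = (a*X_n + b) mod m.
    M = 2**32
    A = 1664525
    B = 1013904223

    def lcg(seed: int, count: int):
        """Yield `count` pseudo-random floats in [0, 1) from an LCG."""
        x = seed % M
        for _ in range(count):
            x = (A * x + B) % M
            yield x / M          # divide by the modulus to map into [0, 1)

    print(list(lcg(seed=2008, count=5)))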
While LCGs are capable of producing decent pseudorandom numbers, the quality is extremely sensitive to the choice of the coefficients b, m, and a. There are other uniformly distrib-
uted generators similar to LCG, such as lagged Fibonacci generators, linear feedback shift
registers and generalised feedback shift registers. Recent instances of pseudo-random algo-
rithms include Blum Blum Shub [3], Fortuna [6], and the Mersenne twister [16].
Mersenne Twister: Makoto Matsumoto and Takuji Nishimura proposed a pseudorandom number generator called the Mersenne twister [16], which is based on a matrix linear recurrence over the finite binary field F_2 and generates very high-quality pseudorandom numbers quickly. The Mersenne Twister algorithm is a twisted generalised feedback shift register [15] in rational normal form (TGFSR(R)), with state bit reflection and tempering. It is
characterized by the following quantities:
w: word size (in number of bits)
n: degree of recurrence
m: middle word, or the number of parallel sequences, 1 ≤ m ≤ n
r: separation point of one word, or the number of bits of the lower bitmask, 0 ≤ r ≤
w−1
a: coefficients of the rational normal form twist matrix
b, c: TGFSR(R) tempering bitmasks
s, t: TGFSR(R) tempering bit shifts
u, l: additional Mersenne Twister tempering bit shifts
with the restriction that 2^{nw−r} − 1 is a Mersenne prime. This choice simplifies the primitivity test and the k-distribution test that are needed in the parameter search.
For a word x of width w bits, the generator is expressed by the recurrence relation

x_{k+n} := x_{k+m} ⊕ ((x_k^u | x_{k+1}^l) A),   k = 0, 1, . . .
with | as the bitwise or and ⊕ as the bitwise exclusive or (XOR), and x^u, x^l being x with the upper and lower bitmasks applied. The twist transformation A is defined in rational normal form as
A = R = \begin{pmatrix} 0 & I_{w-1} \\ a_{w-1} & (a_{w-2}, \ldots, a_0) \end{pmatrix}
with I_{w−1} as the (w − 1) × (w − 1) identity matrix (and, in contrast to normal matrix multiplication, bitwise XOR replaces addition). The rational normal form has the benefit that the multiplication xA can be efficiently expressed as
xA =
  x ≫ 1           if x_0 = 0,
  (x ≫ 1) ⊕ a     if x_0 = 1,

where x := x_k^u | x_{k+1}^l, k = 0, 1, . . .
In order to achieve the 2^{nw−r} − 1 theoretical upper limit of the period in a TGFSR, φ_B(t) must be a primitive polynomial, φ_B(t) being the characteristic polynomial of

B = \begin{pmatrix}
0 & I_w & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \cdots & I_w & 0 \\
0 & 0 & \cdots & 0 & I_{w-r} \\
S & 0 & \cdots & 0 & 0
\end{pmatrix},
\qquad
S = \begin{pmatrix} 0 & I_r \\ I_{w-r} & 0 \end{pmatrix} A
The twist transformation improves the classical GFSR with the following key properties:
1. The period reaches the theoretical upper limit 2^{nw−r} − 1 (except if initialized with 0).
2. Equidistribution in n dimensions (e.g., linear congruential generators can at best manage reasonable distribution in five dimensions).
As with the TGFSR(R), the Mersenne Twister is cascaded with a tempering transform to compensate for the reduced dimensionality of equidistribution (caused by the choice of A being in rational normal form), which is equivalent to the transformation A = R → A = T^{−1}RT with T invertible. The tempering is defined in the case of the Mersenne Twister as
y := x ⊕ (x ≫ u)
y := y ⊕ ((y ≪ s) & b)
y := y ⊕ ((y ≪ t) & c)
z := y ⊕ (y ≫ l)
with ≪, ≫ as the bitwise left and right shifts and & as the bitwise and. The first and last transforms are added in order to improve lower-bit equidistribution. From the properties of the TGFSR, s + t ≥ ⌊w/2⌋ − 1 is required to reach the upper bound of equidistribution for the upper bits.
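The following is a compact Python sketch of the algorithm, assuming the standard 32-bit MT19937 parameter set (w = 32, n = 624, m = 397, r = 31) from [16]; it favors readability over speed and is not a production implementation:

    class MT19937:
        """Minimal MT19937 sketch (standard 32-bit parameters)."""

        N, M = 624, 397                 # degree of recurrence n, middle word m
        MATRIX_A = 0x9908B0DF           # twist coefficient a
        UPPER_MASK = 0x80000000         # most significant (w - r) bits
        LOWER_MASK = 0x7FFFFFFF         # least significant r bits

        def __init__(self, seed: int = 5489):
            # Standard initialization of the n-word state from a single seed.
            self.mt = [0] * self.N
            self.mt[0] = seed & 0xFFFFFFFF
            for i in range(1, self.N):
                prev = self.mt[i - 1]
                self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
            self.index = self.N

        def _twist(self):
            # Recurrence: x_{k+n} = x_{k+m} XOR ((x_k^u | x_{k+1}^l) A)
            for i in range(self.N):
                x = (self.mt[i] & self.UPPER_MASK) | (self.mt[(i + 1) % self.N] & self.LOWER_MASK)
                xA = x >> 1
                if x & 1:               # lowest bit set: apply the twist coefficient a
                    xA ^= self.MATRIX_A
                self.mt[i] = self.mt[(i + self.M) % self.N] ^ xA
            self.index = 0

        def next_u32(self) -> int:
            if self.index >= self.N:
                self._twist()
            y = self.mt[self.index]
            self.index += 1
            # Tempering with (u, s, b, t, c, l) = (11, 7, 0x9D2C5680, 15, 0xEFC60000, 18)
            y ^= y >> 11
            y ^= (y << 7) & 0x9D2C5680
            y ^= (y << 15) & 0xEFC60000
            y ^= y >> 18
            return y & 0xFFFFFFFF

    rng = MT19937(seed=2008)
    print([rng.next_u32() for _ in range(3)])

Each call to next_u32 applies the tempering transform above to one word of the twisted state.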
Non-Uniform Generators: Numbers selected from a non-uniform probability distribution can be generated using a uniform-distribution PRNG and a function that relates the two distributions.
First, the cumulative distribution function F(b) of the target distribution f(b) is built:

F(b) = ∫_{−∞}^{b} f(b′) db′,
where 0 = F(−∞) ≤ F(b) ≤ F(∞) = 1. Taking a random number c from a uniform distribution on (0, 1) as the cumulative probability to "pass by", we set F(b) = c, so that b = F^{−1}(c) is a number randomly selected from the distribution f(b).
For example, applying the inverse of the cumulative Gaussian distribution (the inverse error function), erf^{−1}(x), to the output x of an ideal uniform PRNG with range (0, 1) would produce a sequence of positive values with a Gaussian distribution; however, when using practical number representations, the infinite "tails" of the distribution have to be truncated to finite values. Repetitive recalculation of erf^{−1}(x) should be avoided by means such as the ziggurat algorithm [13] for faster generation.
Similar considerations apply to generating other non-uniform distributions such as
Rayleigh (continuous) and Poisson (discrete) distributions.
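A small Python sketch of inverse-transform sampling, using the exponential distribution because its CDF inverts in closed form (the rate parameter lam and the seed are illustrative choices):

    import math
    import random

    def sample_exponential(lam: float, count: int, seed: int = 0):
        """Inverse-transform sampling: draw c ~ Uniform(0,1), return b = F^{-1}(c).

        For the exponential distribution, F(b) = 1 - exp(-lam*b),
        so F^{-1}(c) = -ln(1 - c) / lam.
        """
        prng = random.Random(seed)          # any uniform PRNG works here
        return [-math.log(1.0 - prng.random()) / lam for _ in range(count)]

    print(sample_exponential(lam=2.0, count=5))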
There are other methods of generating random numbers; we have listed only the most common PRNGs used in applications. Other methods are discussed in the references [1, 4, 10, 11].
3 Cryptography
An omnipresent use of unpredictable random numbers is in cryptography, which underlies security almost everywhere in modern communications (e.g., confidentiality, authentication, electronic commerce, etc.).
For example, if a user wants to employ an encryption algorithm, it is best to select the "key" at random. Such keys must be completely unguessable by anyone else, and the only practical way to manufacture them is to use random numbers. If this is not done properly, security can be compromised. A simple 32-bit linear congruential pseudo-random number generator, of the kind shipped with many programming languages, can produce at most about four billion distinct keys before it repeats itself; a suitably motivated adversary could simply test them all. Even if a more sophisticated random number generator is used, its seed might be guessed, and then the keys can be predicted.
(An example was famously discovered in an early release of Netscape Navigator, forcing
the authors to quickly find a source of "more random" random numbers.) Thus, for this application, truly random numbers are required.
Truly random numbers are absolutely required to be assured of the theoretical secu-
rity provided by the one-time pad – the only provably unbreakable encryption algorithm.
Furthermore, those random sequences cannot be reused and must never become available
to any attacker, which implies a continuously operable generator. The Venona project gives an example of what happens when these requirements are violated in the use of a one-time pad.
(The Venona project was a long-running and highly secret collaboration between intelli-
gence agencies of the United States and United Kingdom that involved the cryptanalysis
of messages sent by several intelligence agencies of the Soviet Union, mostly during World
War II.)
For cryptographic purposes, one normally assumes some upper limit on the work an
adversary can do (usually this limit is astronomically sized). If one has a pseudo-random
number generator whose output is "sufficiently difficult" to predict for an unknown seed
(such as a stream cipher), one can generate true random numbers to fill the seed and then
use the pseudo-random numbers in cryptographic applications. Such random number gen-
erators are called cryptographically secure pseudo-random number generators, and several
have been implemented (for example, the /dev/urandom device available on most Unixes, the Yarrow algorithm, and AT&T Bell Labs' "truerand"). As with all cryptographic software,
there are subtle issues beyond those discussed here, so care is certainly indicated in actual
practice. In any case, it is often impossible to avoid the need for true (i.e., hardware)
random number generators.
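As a usage sketch, Python's os.urandom and secrets module draw from the operating system's cryptographically secure generator (e.g., /dev/urandom on most Unixes); the key and token lengths below are illustrative parameters only:

    import os
    import secrets

    # 256 bits of key material from the OS cryptographically secure RNG.
    key = os.urandom(32)

    # The secrets module wraps the same kind of source with convenience helpers.
    token = secrets.token_hex(16)       # e.g. a session token
    choice = secrets.randbelow(10**6)   # an unpredictable integer in [0, 10^6)

    print(len(key), token, choice)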
Since a requirement in cryptography is unpredictability to an attacker, any published
random sequence is a poor choice, as are sequences such as the digits of an irrational number like ϕ (the golden ratio) or of transcendental numbers such as π or e. Put another way, in cryptography, random bit streams need to be not only random,
but also secret and hence unpredictable. Public or third-party sources of random values,
or random values computed from publicly observable phenomena (weather, sports game
results, stock prices), are almost never cryptographically acceptable, though often tempting
and too often used by the unwary. They permit attacks that should never be allowed.
Since most cryptographic applications require a few thousand bits at most, slow random number generators serve well, provided they are actually random. This use of random generators is
important; many informed observers believe every computer should have a way to generate
true random numbers.
4 Conclusion
Random numbers are useful for a variety of purposes, such as generating data encryption keys, simulating and modeling complex phenomena, and selecting random samples from larger data sets. They have also been used aesthetically, for example in literature and
music, and are of course ever popular for games and gambling. When discussing single
numbers, a random number is one that is drawn from a set of possible values, each of
which is equally probable, i.e., a uniform distribution. When discussing a sequence of
random numbers, each number drawn must be statistically independent of the others.
With the advent of computers, programmers recognized the need for a means of intro-
ducing randomness into a computer program. However, surprising as it may seem, it is
difficult to get a computer to do something by chance. A computer follows its instructions
blindly and is therefore completely predictable. (A computer that doesn’t follow its in-
structions in this manner is broken.) There are two main approaches to generating random
numbers using a computer: Pseudo-Random Number Generators (PRNGs) and True Ran-
dom Number Generators (TRNGs). The two approaches have quite different characteristics, and each has its own strengths and appropriate conditions of use.
Acknowledgements This overview draws on many articles from wikipedia.org and random.org (http://www.wikipedia.org and http://www.random.org). Thanks to Prof. Yiter Chen of KMU, Taiwan, for his kind invitation.
References
[1] Deborah J. Bennett. Randomness. Harvard University Press, 1998.
[2] T. Beth and Z-D. Dai. On the complexity of pseudo-random sequences. Advances in
Cryptology – EUROCRYPT, pages 533–543, 1989.
[3] Lenore Blum, Manuel Blum, and Michael Shub. A simple unpredictable pseudo-
random number generator. SIAM Journal on Computing, 15:364–383, 1986.
[4] Gregory Chaitin. Exploring Randomness. Springer-Verlag London, 2001.
[5] J. H. Constable. Investigation of environmental noise in small electrical conductors.
IEEE Transactions on Instrumentation and Measurement, 55(6):2045–2054, 2006.
[6] Niels Ferguson and Bruce Schneier. Practical Cryptography. Wiley, ISBN 0-471-22357-
3, 2003.
[7] G. Gomila and L. Reggiani. Anomalous crossover between thermal and shot noise in
macroscopic diffusive conductors. Physical Review B, 62(12):8068–8071, 2000.
[8] T. Hopkins. A revised algorithm for the spectral test. Applied Statistics, 32:328–335,
1983.
[9] S. Kak. Classification of random binary sequences using Walsh-Fourier analysis. Proceedings of Applications of Walsh Functions, pages 74–77, 1971.
[10] Olav Kallenberg. Random Measures, 4th Ed. Academic Press, 1986.
[11] D. E. Knuth. The Art of Computer Programming, 3rd Ed., volume 2, Seminumerical
Algorithms. Addison-Wesley, 1997.
[12] David J.C. MacKay. Information Theory, Inference, and Learning Algorithms. Cam-
bridge Univ. Press, 2003.
[13] G. Marsaglia and W. W. Tsang. The ziggurat method for generating random variables.
Journal of Statistical Software, 5(8), 2000.
[14] G. Marsaglia and A. Zaman. Monkey tests for random number generators. Computers
and Mathematics with Applications, 26(9):1–10, 1993.
[15] M. Matsumoto and Y. Kurita. Twisted GFSR generators. ACM Trans. Model. Comput. Simul., 2(3):179–194, 1992.
[16] M. Matsumoto and T. Nishimura. Mersenne twister: a 623-dimensionally equidistrib-
uted uniform pseudorandom number generator. ACM Trans. Model. Comput. Simul.,
8(1):3–30, 1998.
[17] David McIntosh. Random Atari: enhancing the number generator. ANTIC, 7(11):12, 1989.
[18] Tom McNichol. Totally random. Wired Magazine, 11(8):1, 2003.
[19] S. Mund. Ziv-lempel complexity for periodic sequences and its cryptographic appli-
cation. Advances in Cryptology – EUROCRYPT, pages 114–126, 1991.
[20] S.K. Park and K.W. Miller. Random number generators: Good ones are hard to find. Communications of the ACM, 31(10), 1988.
[21] J. Phillips. Autocorrelation function for a binary sequence. Applied Statistics, 21:100–
103, 1972.
[22] R. A. Serway, R. J. Beichner, and J. W. Jewett. Physics for Scientists and Engineers.
Saunders College Publishing, 2000.
[23] C. Yuen. Testing random number generators by Walsh transform. IEEE Transactions on Computers, C-26(4):329–333, 1977.