How Much Data Do You Need? A Pre-Asymptotic Metric For Fat-Tailedness
[Figure 2: κ1, κ1,30, and κ1,100 for the Student T, plotted against the tail exponent α over 1.0–4.0.]
Fig. 2. Watching the effect of the Generalized Central Limit Theorem: Pareto and Student T distributions, in the P class with tail exponent α, have κ converging to $2-(\alpha\,\mathbb{1}_{\alpha<2}+2\,\mathbb{1}_{\alpha\geq 2})$, the value for the Stable S class. We observe how slow the convergence is, even after 1000 summands. This discounts Mandelbrot's assertion that an infinite-variance Pareto can be subsumed into a stable distribution.

1) Criterion for "fat-tailedness": There are various ways to "define" fat tails and rank distributions according to each definition. In the narrow class of distributions having all moments finite, it is the kurtosis, which allows simple comparisons and measures departures from the Gaussian, which is used as a norm. For the power law class, it can be the tail exponent. One can also use extremal values, taking the probability of exceeding a maximum value, adjusted by the scale (as practiced in extreme value theory). For operational uses, practitioners' fat-tailedness is a degree of concentration, such as "how much of the statistical properties will be attributable to a single observation?", or, appropriately adjusted by the scale (or the mean dispersion), "how much is the total wealth of a country in the hands of the richest individual?"

Here we use the following criterion for our purpose, which maps to the measure of concentration in the previous paragraph: "How much will additional data (under such a probability distribution) help increase the stability of the observed mean?" The purpose is not entirely statistical: it can equally mean "How much will adding an additional security into my portfolio allocation (i.e., keeping the total constant) increase its stability?"

2) The metric: Our metric κ does the following:
• Allows the comparison of the behavior of n-summed variables with that of their limiting distribution, namely the Lévy α-Stable basin (of which the Gaussian is a special case).
• For statistical inference, allows assessing the "speed" of the law of large numbers, expressed in the change of the mean absolute error around the average thanks to the increase of sample size n.
• Allows comparative assessment of the "fat-tailedness" of two different univariate distributions, when both have finite first moment.
• Allows us to know ahead of time how many runs we need for a Monte Carlo simulation.

3) The state of statistical inference: The last point, the "speed", appears to have been ignored. For in the 9,400 pages of the Encyclopedia of Statistical Science [1], we were unable to find a single comment as to how long it takes to reach the asymptote, or how to deal with n summands that are large but perhaps not sufficiently so for the so-called "normal approximation". Further, the entry on statistical inference (authored by W. Hoeffding) explicitly brushes away the problem, stating:

"The exact distribution of a statistic is usually highly complicated and difficult to work with. Hence the need to approximate the exact distribution by a distribution of a simpler form whose properties are more transparent. The limit theorems of probability theory provide an important tool for such approximations. In particular, the classical central limit theorems state that the sum of a large number of independent random variables is approximately normally distributed under general conditions. In fact, the normal distribution plays a dominating role among the possible limit distributions. (...) Moreover, many statistics behave asymptotically like sums of independent random variables. All of this helps to explain the importance of the normal distribution as an asymptotic distribution."

Even social science discussions of the "law of small numbers" [2] assume Gaussian attributes as the norm. As to extreme value theory, the "functional law of small numbers" [3] concerns Poisson hitting with small probabilities; more generally, extreme value theory (while naturally equipped with the tools for fat tails) is concerned with the behavior of maxima, not averages.

Our motto here and elsewhere is "statistics is never standard". This metric aims at showing how standard is standard, and at measuring the exact departure from the standard from the standpoint of statistical significance.
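Before formalizing the metric, a small illustration of the "speed" point (ours, not from the original text): we watch the mean absolute error of the sample average shrink with n for a thin-tailed and a fat-tailed distribution. The samplers, seeds, and trial counts are arbitrary choices for the sketch.

```python
# Illustration (not from the paper): mean absolute error of the sample
# average, E|S_n/n - m|, for a thin-tailed vs. a fat-tailed distribution.
import numpy as np

rng = np.random.default_rng(1)
TRIALS = 20_000

def mae_of_average(sample, true_mean, n):
    """Monte Carlo estimate of E|S_n/n - m| for n i.i.d. summands."""
    x = sample((TRIALS, n))
    return np.mean(np.abs(x.mean(axis=1) - true_mean))

gaussian = lambda size: rng.normal(0.0, 1.0, size)    # mean 0
pareto3 = lambda size: 1.0 + rng.pareto(3.0, size)    # Pareto, alpha = 3, mean 3/2

for n in (30, 120, 480):
    print(n, mae_of_average(gaussian, 0.0, n),
             mae_of_average(pareto3, 1.5, n))
```

The Gaussian error shrinks like $n^{-1/2}$; the Pareto error improves more slowly, which is exactly what the κ metric below quantifies.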
II. THE METRIC

[Figure: values of κ1; the level for the Stable distribution with α = 1.2 is marked for reference.]

Definition 1 (the κ metric). Let $X_1, \ldots, X_n$ be i.i.d. random variables with finite mean, that is, $\mathbb{E}(X) < +\infty$. Let $S_n = X_1 + X_2 + \ldots + X_n$ be a partial sum. Let $M(n) = \mathbb{E}(|S_n - \mathbb{E}(S_n)|)$ be the expected mean absolute deviation from the mean for n summands. Define the "rate" of convergence for n additional summands starting with $n_0$:

$$\kappa_{n_0,n} = \min\left\{\kappa_{n_0,n} : \frac{M(n)}{M(n_0)} = \left(\frac{n}{n_0}\right)^{\frac{1}{2-\kappa_{n_0,n}}},\; n_0 = 1, 2, \ldots\right\}, \quad n > n_0 \geq 1,$$

hence

$$\kappa(n_0, n) = 2 - \frac{\log(n) - \log(n_0)}{\log\left(\frac{M(n)}{M(n_0)}\right)}. \tag{1}$$

Further, for the baseline values $n = n_0 + 1$, we use the shorthand $\kappa_{n_0}$.

We can also decompose $\kappa(n_0, n)$ in terms of "local" intermediate ones, similar to "local" interest rates, under the constraint

$$\kappa(n_0, n) = 2 - \frac{\log(n) - \log(n_0)}{\sum_{i=n_0}^{n-1} \frac{\log(i+1) - \log(i)}{2 - \kappa(i, i+1)}}. \tag{2}$$

Use of mean deviation: Note that we use as the measure of dispersion around the mean the mean absolute deviation, to stay in norm L1 in the absence of finite variance –actually, even in the presence of finite variance, under power law regimes, distributions deliver an unstable and uninformative second moment. Mean deviation proves far more robust there. (Mean absolute deviation can be shown to be more "efficient" except in the narrow case of kurtosis equal to 3 (the Gaussian); see a longer discussion in [4]; for other advantages, see [5].)

III. STABLE BASIN OF CONVERGENCE AS BENCHMARK

Definition 2 (the class P). The P class of power laws (regular variation) is defined for a r.v. X as follows:

$$\mathcal{P} = \left\{X : \mathbb{P}(X > x) \sim L(x)\, x^{-\alpha}\right\} \tag{3}$$

where ∼ means that the limit of the ratio of rhs to lhs goes to 1 as $x \to \infty$. $L : [x_{\min}, +\infty) \to (0, +\infty)$ is a slowly varying function, defined as $\lim_{x\to+\infty} \frac{L(kx)}{L(x)} = 1$ for any $k > 0$. The constant $\alpha > 0$.

Next we define the domain of attraction of the sum of identically distributed variables, in our case with identical parameters:

$$\tilde{\alpha} \triangleq \begin{cases} \alpha\,\mathbb{1}_{\alpha<2} + 2\,\mathbb{1}_{\alpha\geq 2} & \text{if } X \in \mathcal{P} \\ 2 & \text{otherwise.} \end{cases} \tag{5}$$

Further discussions of the class S are as follows.

A. Equivalence for stable distributions

For all $n_0$ and $n \geq 1$ in the Stable S class with $\tilde{\alpha} \geq 1$:

$$\kappa(n_0, n) = 2 - \tilde{\alpha},$$

simply from the property that

$$M(n) = n^{\frac{1}{\tilde{\alpha}}} M(1). \tag{6}$$

This simply shows that $\kappa_{n_0,n} = 0$ for the Gaussian.

The problem of the preasymptotics for n summands reduces to:
• What is the property of the distribution for $n_0 = 1$ (or starting from a standard, off-the-shelf distribution)?
• What is the property of the distribution for $n_0$ summands?
• How does $\kappa_n \to 2 - \tilde{\alpha}$, and at what rate?

B. Practical significance for sample sufficiency

Confidence intervals: As a simple heuristic, the higher κ, the more disproportionally insufficient the confidence interval. Any value of κ above .15 effectively indicates a high degree of unreliability of the "normal approximation". One can immediately doubt the results of numerous research papers in fat-tailed domains.

Computations of the sort done in Table II, for instance, allow us to compare various distributions under various parametrizations (comparing, for example, various Pareto distributions to the symmetric Student T).
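Definition 1 translates directly into a Monte Carlo estimator of κ(n0, n); the following sketch (our illustration; seeds and trial counts are arbitrary) estimates M(n) by simulation and applies Eq. (1).

```python
# Sketch (ours): Monte Carlo version of Definition 1 / Eq. (1).
import numpy as np

rng = np.random.default_rng(7)
TRIALS = 100_000

def M(sample, n):
    """E|S_n - E(S_n)| estimated by simulation."""
    s = sample((TRIALS, n)).sum(axis=1)
    return np.mean(np.abs(s - s.mean()))

def kappa(sample, n0, n):
    """kappa(n0, n) = 2 - (log n - log n0) / log(M(n)/M(n0))."""
    return 2.0 - (np.log(n) - np.log(n0)) / np.log(M(sample, n) / M(sample, n0))

student3 = lambda size: rng.standard_t(3, size)   # the "cubic" Student T
print(kappa(student3, 1, 2))     # about 0.29 (cf. Table I below)
print(kappa(student3, 1, 100))   # still about 0.15: a slow march toward the Gaussian 0
```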
TABLE I
KAPPA FOR 2 SUMMANDS, κ1.

Student T (α):
$$\kappa_1 = 2 - \frac{2\log(2)}{(4-2\alpha)\log(2) + \log(\pi) + 2\log\!\left(\Gamma\!\left(\alpha-\tfrac{1}{2}\right)\right) - 4\log\!\left(\Gamma\!\left(\tfrac{\alpha}{2}\right)\right)}$$

Exponential/Gamma:
$$\kappa_1 = 2 - \frac{\log(2)}{2\log(2)-1} \approx .21$$

Pareto (α):ᵃ
$$\kappa_1 = 2 - \frac{\log(2)}{\log\!\left(\alpha^{\alpha-1}(\alpha-1)^{2-\alpha}\int_0^{\infty}\alpha^{2}(y+2)^{-2\alpha-1}\left(B_{\frac{1}{y+2}}(-\alpha,1-\alpha)-B_{\frac{y+1}{y+2}}(-\alpha,1-\alpha)\right)dy\right)}$$

Normal (µ, σ) with switching variance σ²a w.p. p:ᵇ
$$\kappa_1 = 2 - \frac{\log(2)}{\log\left(\,\cdots\,\right)}$$

Lognormal (µ, σ):
$$\kappa_1 \approx 2 - \frac{\log(2)}{\log\!\left(\frac{2\,\operatorname{erf}\left(\frac{1}{2\sqrt{2}}\sqrt{\log\left(\frac{1}{2}\left(e^{\sigma^{2}}+1\right)\right)}\right)}{\operatorname{erf}\left(\frac{\sigma}{2\sqrt{2}}\right)}\right)}$$

a. $B_z(\cdot,\cdot)$ is the incomplete Beta function $B_z(a,b)=\int_0^z t^{a-1}(1-t)^{b-1}\,dt$; erf(·) is the error function $\operatorname{erf}(z)=\frac{2}{\sqrt{\pi}}\int_0^z e^{-t^{2}}\,dt$.
b. See comments and derivations in the appendix for switching both variance and mean, as it can produce negative values for kappa.
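The compact entries of Table I are easy to check numerically; a sketch (ours), using the exponential and lognormal expressions as written above:

```python
# Sketch (ours): numerical check of the two compact Table I entries.
import math

log, sqrt, exp, erf = math.log, math.sqrt, math.exp, math.erf

kappa1_exponential = 2 - log(2) / (2 * log(2) - 1)
print(kappa1_exponential)      # 0.2057... ~ .21

def kappa1_lognormal(sigma):
    num = 2 * erf(sqrt(log((exp(sigma**2) + 1) / 2)) / (2 * sqrt(2)))
    return 2 - log(2) / log(num / erf(sigma / (2 * sqrt(2))))

print(kappa1_lognormal(0.1))   # ~ 0.008: nearly Gaussian
print(kappa1_lognormal(4.0))   # ~ 0.99: close to the Cauchy-like limit of 1
```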
The minimum sample size required for a distribution ν to deliver the same mean absolute error of the sample average as a Gaussian benchmark g is:

$$n_{\min} = \inf\left\{ n_{\nu} : \mathbb{E}\left|\sum_{i=1}^{n_{\nu}}\frac{X_{\nu,i}-m_{\nu}}{n_{\nu}}\right| \leq \mathbb{E}\left|\sum_{i=1}^{n_{g}}\frac{X_{g,i}-m_{g}}{n_{g}}\right|,\ n_{\nu} > 0\right\}$$

which can be computed using $\kappa_n = 0$ for the Gaussian and backing out $\kappa_n$ for the target distribution with the simple approximation:

$$n_{\nu} = n_{g}^{\frac{1}{1-\kappa_{1,n_g}}} \approx n_{g}^{\frac{1}{1-\kappa_{1}}}, \quad n_{g} > 1. \tag{7}$$

The approximation is owed to the slowness of convergence. So, for example, a Student T with 3 degrees of freedom (α = 3) requires 120 observations to get the same drop in variance from averaging (hence confidence level) as the Gaussian with 30, that is, 4 times as much. The one-tailed Pareto with the same tail exponent α = 3 requires 543 observations to match a Gaussian sample of 30, 4.5 times more than the Student, which shows that 1) finiteness of variance is not an indication of fat-tailedness (in our statistical sense), 2) neither are tail exponents good indicators, and 3) the symmetric Student and the Pareto distribution are not equivalent, because of the "bell-shapedness" of the Student (coming from the slowly varying function) that dampens variations in the center of the distribution.

We can also elicit quite counterintuitive results. From Eq. 7, the "Pareto 80/20" in the popular mind, which maps to a tail exponent around α ≈ 1.14, requires more than $10^9$ more observations than the Gaussian.

IV. TECHNICAL CONSEQUENCES

A. Some oddities with asymmetric distributions

The stable distribution, when skewed, has the same κ index as a symmetric one (in other words, κ is invariant to the β parameter in Eq. 4, which is conserved under summation). But a one-tailed simple Pareto distribution is fatter-tailed (for our purpose here) than an equivalent symmetric one.

This is relevant because the stable is never really observed in practice and is used as some limiting mathematical object, while the Pareto is more commonly seen. The point is not well grasped in the literature. Consider the following use of the substitution of a stable for a Pareto. In Uchaikin and Zolotarev [8]:

"Mandelbrot called attention to the fact that the use of the extremal stable distributions (corresponding to β = 1) to describe empirical principles was preferable to the use of the Zipf-Pareto distributions for a number of reasons. It can be seen from many publications, both theoretical and applied, that Mandelbrot's ideas receive more and more wide recognition of experts. In this way, the hope arises to confirm empirically established principles in the framework of mathematical models and, at the same time, to clear up the mechanism of the formation of these principles."

These are not the same animals, even for a large number of summands.

B. Rate of convergence of a Student T distribution to the Gaussian basin

We show in the appendix –thanks to the explicit derivation of κ for the sum of Students with α = 3, the "cubic" commonly noticed in finance– that the rate of convergence of κ to 0 under summation is slow. The semi-closed form for the density of an n-summed cubic Student allows us to complement the result in Bouchaud and Potters [9] (see also [10]), which is as follows. Their approach is to separate the "Gaussian zone", where the density is approximated by that of a Gaussian, from a "power law zone" in the tails, which retains the original distribution with power-law decline. The "crossover" between the two moves right and left of the center at a rate of $\sqrt{n\log(n)}$ standard deviations, which is excruciatingly slow. Indeed, one can note that more summands fall at the center of the distribution, and fewer outside of it; hence the speed of convergence according to the central limit theorem will differ according to whether the density concerns the center or the tails.

Further investigations would concern the convergence of the Pareto to a Lévy-stable, which so far we have only obtained numerically.

C. The lognormal is neither thin nor fat tailed

Naively, as we can see in Figure II, at low values of the parameter σ the lognormal behaves like a Gaussian, and at high σ it appears to have the behavior of a Cauchy of sorts (a one-tailed Cauchy, rather a stable distribution with α = 1, β = 1), as κ gets closer and closer to 1. This gives us an idea about some aspects of the debates as to whether some variable is Pareto or lognormally distributed, such as, say, the debates about wealth [11], [12], [13]. Indeed, such debates can be irrelevant to the real world. As P. Cirillo [14] observed, many cases of Paretianity are effectively lognormal situations with high variance; the practical statistical consequences, however, are smaller than imagined.

D. Can kappa be negative?

Just as kurtosis for a mixed Gaussian (i.e., with stochastic mean, rather than stochastic volatility) can dip below 3 (or become "negative" when one uses the convention of reporting kurtosis in excess of the Gaussian's 3), the kappa metric can become negative when kurtosis is "negative". These situations require bimodality (i.e., a switching process between means under fixed variance, with modes far apart in terms of standard deviation). They do not appear to occur with unimodal distributions.

Details and derivations are presented in the appendix.

V. CONCLUSION AND CONSEQUENCES

To summarize, while the limit theorems (the law of large numbers and the central limit theorem) are concerned with the behavior as n → +∞, we are interested in finite and exact n, both small and large (and in the statistical and risk implications). We may draw a few operational consequences:
A PPENDIX
A. Cubic Student T (Gaussian Basin)
The Student T with 3 degrees of freedom is of special interest in the literature owing to its prevalence in finance [6]. It is often mistakenly approximated as a Gaussian owing to the finiteness of its variance. Asymptotically, we do end up with a Gaussian, but this tells us nothing about the rate of convergence. Mandelbrot and Taleb [25] remark that the cubic acts more like a power law in the distribution of the extremes, which we will elaborate on here thanks to an explicit PDF for the sum.
Let X be a random variable distributed with density p(x):

$$p(x) = \frac{6\sqrt{3}}{\pi\left(x^{2}+3\right)^{2}}, \quad x \in (-\infty, \infty) \tag{8}$$

Proposition 1. Let Y be a sum of $X_1, \ldots, X_n$, n identical copies of X. Let M(n) be the mean absolute deviation from the mean for n summands. The "rate" of convergence $\kappa_{1,n} = \left\{\kappa : \frac{M(n)}{M(1)} = n^{\frac{1}{2-\kappa}}\right\}$ is:

$$\kappa_{1,n} = 2 - \frac{\log(n)}{\log\left(e^{n} n^{-n}\,\Gamma(n+1, n) - 1\right)} \tag{9}$$

where Γ(·, ·) is the incomplete gamma function $\Gamma(a, z) = \int_{z}^{\infty} t^{a-1} e^{-t}\,dt$.
This follows since the mean deviation M(n) is

$$M(n) = \begin{cases} \frac{2\sqrt{3}}{\pi} & \text{for } n = 1 \\[4pt] \frac{2\sqrt{3}}{\pi}\left(e^{n} n^{-n}\,\Gamma(n+1, n) - 1\right) & \text{for } n > 1 \end{cases} \tag{10}$$
The derivations are as follows. For the pdf and the MAD we followed different routes. We have the characteristic function for n summands:

$$\varphi(\omega) = \left(1 + \sqrt{3}\,|\omega|\right)^{n} e^{-n\sqrt{3}\,|\omega|}$$

The pdf of Y is given by:

$$p(y) = \frac{1}{\pi}\int_{0}^{\infty}\left(1 + \sqrt{3}\,\omega\right)^{n} e^{-n\sqrt{3}\,\omega}\cos(\omega y)\,d\omega$$
After arduous integration we get the result in Eq. (10). Further, since the following result does not appear to be found in the literature, we have a useful side result: the PDF of Y can be written as

$$p(y) = \frac{e^{\,n-\frac{iy}{\sqrt{3}}}\left(e^{\frac{2iy}{\sqrt{3}}}\,E_{-n}\!\left(n+\frac{iy}{\sqrt{3}}\right)+E_{-n}\!\left(n-\frac{iy}{\sqrt{3}}\right)\right)}{2\sqrt{3}\,\pi} \tag{11}$$

where $E_{n}(z)$ is the exponential integral $E_{n}(z) = \int_{1}^{\infty} \frac{e^{-zt}}{t^{n}}\,dt$.
Note the following identities (from the updating of Abramowitz and Stegun [26]):

$$n^{-n-1}\,\Gamma(n+1, n) = E_{-n}(n) = \frac{(n-1)!}{n^{n}}\,e^{-n}\sum_{m=0}^{n}\frac{n^{m}}{m!}$$
As to the asymptotics, we have the following result (proposed by Michail Loulakis). Reexpressing Eq. (10):

$$M(n) = \frac{2\sqrt{3}\;n!}{\pi\,n^{n}}\sum_{m=0}^{n-1}\frac{n^{m}}{m!}$$

Further,

$$e^{-n}\sum_{m=0}^{n-1}\frac{n^{m}}{m!} = \frac{1}{2} + O\!\left(\frac{1}{\sqrt{n}}\right).$$
(From the behavior of the sum of Poisson variables as they converge to a Gaussian by the central limit theorem: $e^{-n}\sum_{m=0}^{n-1}\frac{n^{m}}{m!} = \mathbb{P}(X_n < n)$, where $X_n$ is a Poisson random variable with parameter n. Since the sum of n independent Poisson random variables with parameter 1 is Poisson with parameter n, the central limit theorem says the probability distribution of $Z_n = (X_n - n)/\sqrt{n}$ approaches a standard normal distribution. Thus $\mathbb{P}(X_n < n) = \mathbb{P}(Z_n < 0) \to 1/2$ as $n \to \infty$. For another approach, see [27] for a proof that $1 + \frac{n}{1!} + \frac{n^{2}}{2!} + \cdots + \frac{n^{n-1}}{(n-1)!} \sim \frac{e^{n}}{2}$.)
Using the property that

$$\lim_{n\to\infty}\frac{n!\,e^{n}}{n^{n}\sqrt{n}} = \sqrt{2\pi},$$

we get the following exact asymptotics:

$$\lim_{n\to\infty}\log(n)\,\kappa_{1,n} = \log\!\left(\frac{\pi^{2}}{4}\right);$$

thus κ goes to 0 (i.e., the average becomes Gaussian) at speed $\frac{1}{\log(n)}$, which is excruciatingly slow. In other words, even with $10^6$ summands, the behavior cannot be summarized as that of a Gaussian, an intuition often expressed by B. Mandelbrot [25].
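For a numerical view of this slowness, Eq. (9) can be evaluated directly with the regularized incomplete gamma function; a sketch of ours using SciPy:

```python
# Sketch (ours): kappa_{1,n} for the alpha = 3 Student, via Eq. (9).
import math
from scipy.special import gammaincc, gammaln  # gammaincc(a, x) = Gamma(a, x)/Gamma(a)

def kappa_cubic(n):
    # log of (M(n)/M(1) + 1) = e^n n^-n Gamma(n+1, n), computed in logs to avoid overflow
    log_g = gammaln(n + 1) + n - n * math.log(n) + math.log(gammaincc(n + 1, n))
    return 2 - math.log(n) / math.log(math.expm1(log_g))

for n in (2, 1000, 10**6):
    print(n, kappa_cubic(n))
# n = 2 gives ~0.29; even n = 10^6 gives ~0.06,
# consistent with the log(pi^2/4)/log(n) asymptotics above.
```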
B. Lognormal Sums
From the behavior of its cumulants for n summands, we can observe that a sum behaves like a Gaussian when σ is low, and as a lognormal when σ is high –and in both cases we know $\kappa_n$ explicitly. The lognormal (parametrized with µ and σ) doesn't have an explicit characteristic function, but we can get cumulants $K_i$ of all orders i by recursion; for our case of summed identical copies of the r.v. $X_i$, $K_i^n = K_i\!\left(\sum^{n} X_i\right) = n\,K_i(X_1)$.
Cumulants:

$$K_1^n = n\,e^{\mu+\frac{\sigma^{2}}{2}}, \qquad K_2^n = n\left(e^{\sigma^{2}}-1\right)e^{2\mu+\sigma^{2}}, \qquad K_3^n = n\left(e^{\sigma^{2}}-1\right)^{2}\left(e^{\sigma^{2}}+2\right)e^{3\mu+\frac{3\sigma^{2}}{2}}, \qquad K_4^n = \ldots$$

which allow us to compute:

$$\text{Skewness} = \frac{\left(e^{\sigma^{2}}+2\right)\sqrt{e^{\sigma^{2}}-1}}{\sqrt{n}} \qquad\text{and}\qquad \text{Kurtosis} = 3 + \frac{e^{2\sigma^{2}}\left(e^{\sigma^{2}}\left(e^{\sigma^{2}}+2\right)+3\right)-6}{n}.$$

We can immediately prove from the cumulants/moments that:

$$\lim_{n\to+\infty}\kappa_{1,n} = 0, \qquad \lim_{\sigma\to 0}\kappa_{1,n} = 0.$$
Further, define

$$\kappa^{*}_{1,n} = 2 - \frac{\log(n)}{\log\!\left(\frac{n\,\operatorname{erf}\left(\frac{1}{2\sqrt{2}}\sqrt{\log\left(\frac{n+e^{\sigma^{2}}-1}{n}\right)}\right)}{\operatorname{erf}\left(\frac{\sigma}{2\sqrt{2}}\right)}\right)}.$$
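A quick evaluation of $\kappa^{*}_{1,n}$ (as written above) shows how a high σ keeps the sum far from the Gaussian regime even for large n; a sketch of ours, with arbitrary parameter choices:

```python
# Sketch (ours): the lognormal kappa*_{1,n} at a high sigma.
import math

def kappa_star(n, sigma):
    num = n * math.erf(math.sqrt(math.log((n + math.exp(sigma**2) - 1) / n))
                       / (2 * math.sqrt(2)))
    den = math.erf(sigma / (2 * math.sqrt(2)))
    return 2 - math.log(n) / math.log(num / den)

for n in (2, 100, 10_000):
    print(n, kappa_star(n, 2.0))   # ~0.89, ~0.73, ~0.48: the LLN is slowed down
```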
1) Heuristic attempt: Among other heuristic approaches, we can see in two steps how 1) under high values of σ, $\kappa_{1,n} \to \kappa^{*}_{1,n}$, since the law of large numbers slows down, and 2) $\kappa^{*}_{1,n} \to 1$ as $\sigma \to \infty$.
2) Loulakis' Proof: The upper bound, namely that for high variance $\kappa_{1,n}$ approaches 1, has been shown formally by Michail Loulakis³, which we summarize as follows. We start with the identity $\mathbb{E}(|X-m|) = 2\int_{m}^{\infty}(x-m)f(x)\,dx = 2\int_{m}^{\infty}\bar{F}_X(t)\,dt$, where f(·) is the density, m is the mean, and $\bar{F}_X(\cdot)$ is the survival function. Further, $M(n) = 2\int_{nm}^{\infty}\bar{F}_{S_n}(x)\,dx$. Assume $\mu = -\frac{1}{2}\sigma^{2}$, i.e., $X = \exp\left(\sigma Z - \frac{\sigma^{2}}{2}\right)$, where Z is a standard normal variate (so the mean is 1). Let $S_n$ be the sum $X_1 + \ldots + X_n$; we get $M(n) = 2\int_{n}^{\infty}\mathbb{P}(S_n > t)\,dt$. Using the property of subexponentiality [28],

$$\mathbb{P}(S_n > t) \;\geq\; \mathbb{P}\!\left(\max_{0<i\leq n} X_i > t\right) \;\geq\; n\,\mathbb{P}(X_1 > t) - \binom{n}{2}\,\mathbb{P}(X_1 > t)^{2}.$$

Now $\mathbb{P}(X_1 > t) \xrightarrow{\sigma\to\infty} 1$ and the second term goes to 0 (using Hölder's inequality). Skipping steps, we get $\liminf_{\sigma\to\infty}\frac{M(n)}{M(1)} \geq n$, while at the same time we need to satisfy the bound $\frac{M(n)}{M(1)} \leq n$. So, for $\sigma \to \infty$, $\frac{M(n)}{M(1)} = n$, hence $\kappa_{1,n} \xrightarrow{\sigma\to\infty} 1$.
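The limiting behavior $M(n)/M(1) \to n$ can also be eyeballed by simulation; a rough sketch (ours; Monte Carlo error grows with σ, so this is only indicative):

```python
# Rough sketch (ours): M(n)/M(1) for exp(sigma*Z - sigma^2/2) creeps toward
# its upper bound n as sigma grows, as in the argument above.
import numpy as np

rng = np.random.default_rng(3)
TRIALS = 500_000

def M_ratio(sigma, n):
    z = rng.standard_normal((TRIALS, n))
    x = np.exp(sigma * z - sigma**2 / 2)       # lognormal with mean 1
    m_1 = np.mean(np.abs(x[:, 0] - 1.0))       # M(1)
    m_n = np.mean(np.abs(x.sum(axis=1) - n))   # M(n)
    return m_n / m_1

for sigma in (0.5, 2.0, 4.0):
    print(sigma, M_ratio(sigma, 10))   # the bound is n = 10
```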
3) Pearson Family approach for computation: For computational purposes, for the σ parameter not too large (below ≈ .3), we can use the Pearson family for computational convenience –although the lognormal does not belong to the Pearson class (the normal does, but we are close enough for computation). Intuitively, at low σ, the first four moments can be sufficient because of the absence of large deviations; not so at higher σ, for which conserving the lognormal would be the right method. The use of the Pearson class is practiced in some fields such as information/communication theory, where there is a rich literature: for summation of lognormal variates see Nie and Chen [29]; for Pearson IV, see [30], [31].

The Pearson family is defined for an appropriately scaled density f satisfying the following differential equation:

$$f'(x) = -\frac{a_{0} + a_{1}x}{b_{0} + b_{1}x + b_{2}x^{2}}\,f(x) \tag{12}$$

³ Review of this paper; Loulakis proposed a formal proof in place of the heuristic derivation.
We note that our parametrization of $a_0$, $b_2$, etc., determines the distribution within the Pearson class –which appears to be the Pearson IV. Finally, we get an expression of the mean deviation as a function of n, σ, and µ.

Let m be the mean. Diaconis and Zabell [32], using an old trick by De Moivre, and Suzuki [33], show that we can get an explicit mean absolute deviation. Using, again, the identity $\mathbb{E}(|X-m|) = 2\int_{m}^{\infty}(x-m)f(x)\,dx$ and integrating by parts,

$$\mathbb{E}(|X-m|) = \frac{2\left(b_{0} + b_{1}m + b_{2}m^{2}\right)}{a_{1} - 2b_{2}}\,f(m) \tag{13}$$

We use the cumulants of the n-summed lognormal to match the parameters. Setting $a_{1} = 1$ and $m = \frac{b_{1}-a_{0}}{1-2b_{2}}$, we get:

$$a_{0} = \frac{e^{\mu+\frac{\sigma^{2}}{2}}\left(e^{6\sigma^{2}} + 3e^{5\sigma^{2}} + (3-10n)e^{4\sigma^{2}} - (8n+1)e^{3\sigma^{2}} + 12(n-1)e^{2\sigma^{2}} + 6(n-1)e^{\sigma^{2}} - 12n^{2} + 12\right)}{2\left(6(n-1) + e^{2\sigma^{2}}\left(e^{\sigma^{2}}\left(5e^{\sigma^{2}}+4\right)-3\right)\right)}$$

$$b_{2} = \frac{e^{2\sigma^{2}}\left(e^{\sigma^{2}}-1\right)\left(2e^{\sigma^{2}}+3\right)}{2\left(6(n-1) + e^{2\sigma^{2}}\left(e^{\sigma^{2}}\left(5e^{\sigma^{2}}+4\right)-3\right)\right)}$$

$$b_{1} = \frac{e^{\mu+\frac{\sigma^{2}}{2}}\left(e^{\sigma^{2}}-1\right)\left(e^{\sigma^{2}}\left(e^{\sigma^{2}}\left(e^{\sigma^{2}}\left(e^{\sigma^{2}}\left(e^{\sigma^{2}}+4\right)+7-4n\right)+6-6n\right)+6(n-1)\right)+12(n-1)\right)}{2\left(6(n-1) + e^{2\sigma^{2}}\left(e^{\sigma^{2}}\left(5e^{\sigma^{2}}+4\right)-3\right)\right)}$$

$$b_{0} = -\frac{n\left(e^{\sigma^{2}}-1\right)e^{2\left(\mu+\sigma^{2}\right)}\left(e^{\sigma^{2}}\left(e^{3\sigma^{2}} - 2(n-1)e^{\sigma^{2}} - 3n + 3\right)+6(n-1)\right)}{2\left(6(n-1) + e^{2\sigma^{2}}\left(e^{\sigma^{2}}\left(5e^{\sigma^{2}}+4\right)-3\right)\right)}$$
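As a consistency check on the transcription of these coefficients, the Pearson mean identity $m = (b_1 - a_0)/(1 - 2b_2)$ must reproduce the mean $n\,e^{\mu+\sigma^{2}/2}$ of the n-summed lognormal. A sketch (ours):

```python
# Sketch (ours): numerical consistency check of the Pearson IV coefficients above.
import math

def pearson_coefficients(n, sigma, mu=0.0):
    """a0, b0, b1, b2 matched to the n-summed lognormal cumulants (a1 = 1)."""
    w = math.exp(sigma**2)
    D = 2 * (6*(n - 1) + w**2 * (w*(5*w + 4) - 3))
    e1 = math.exp(mu + sigma**2 / 2)
    a0 = e1 * (w**6 + 3*w**5 + (3 - 10*n)*w**4 - (8*n + 1)*w**3
               + 12*(n - 1)*w**2 + 6*(n - 1)*w - 12*n**2 + 12) / D
    b2 = w**2 * (w - 1) * (2*w + 3) / D
    b1 = e1 * (w - 1) * (w*(w*(w*(w*(w + 4) + 7 - 4*n) + 6 - 6*n)
               + 6*(n - 1)) + 12*(n - 1)) / D
    b0 = -n * (w - 1) * math.exp(2*(mu + sigma**2)) \
         * (w*(w**3 - 2*(n - 1)*w - 3*n + 3) + 6*(n - 1)) / D
    return a0, b0, b1, b2

n, sigma = 2, 0.25
a0, b0, b1, b2 = pearson_coefficients(n, sigma)
# both prints should agree (~2.0635 here)
print((b1 - a0) / (1 - 2*b2), n * math.exp(sigma**2 / 2))
```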
4) Polynomial expansions: Other methods, such as Gram-Charlier expansions (e.g., Schleher [34], Beaulieu [35]), proved less helpful in obtaining $\kappa_n$. At high values of σ, the approximations become unstable as one includes higher-order Hermite polynomials. See the reviews in Dufresne [36] and [37].
C. Exponential

The exponential is the "entry level" of fat tails, just at the border:

$$f(x) = \lambda e^{-\lambda x}, \quad x \geq 0.$$

For the sum $Z = X_1 + X_2 + \ldots + X_n$ we get, by recursive convolution (since $\int_{0}^{y} f(x)f(y-x)\,dx = \lambda^{2} y e^{-\lambda y}$),

$$f_n(z) = \frac{\lambda^{n} z^{n-1} e^{-\lambda z}}{(n-1)!}, \tag{14}$$

which is the gamma distribution. We get the mean deviation for n summands:

$$M(n) = \frac{2 e^{-n} n^{n}}{\lambda\,\Gamma(n)}, \tag{15}$$

hence:

$$\kappa_{1,n} = 2 - \frac{\log(n)}{n\log(n) - n - \log(\Gamma(n)) + 1}. \tag{16}$$

We can see that the asymptotic behavior is equally slow (similar to the Student), although the exponential distribution sits at the cusp of subexponentiality:

$$\lim_{n\to\infty}\log(n)\,\kappa_{1,n} = 4 - 2\log(2\pi).$$
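Eq. (16) is immediate to evaluate; a sketch (ours) reproducing the Table I value and the logarithmic slowness:

```python
# Sketch (ours): Eq. (16) for the exponential/gamma case.
import math

def kappa_exp(n):
    return 2 - math.log(n) / (n * math.log(n) - n - math.lgamma(n) + 1)

print(kappa_exp(2))        # 0.2057... ~ .21, matching Table I
print(kappa_exp(10**6))    # ~0.023: decays only like 1/log(n)
print((4 - 2 * math.log(2 * math.pi)) / math.log(10**6))   # asymptotic value, ~0.024
```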
D. Negative kappa

Consider the simple case of a Gaussian with switching means and variance: with probability $\frac{1}{2}$, $X \sim \mathcal{N}(\mu_1, \sigma_1)$ and, with probability $\frac{1}{2}$, $X \sim \mathcal{N}(\mu_2, \sigma_2)$. The kurtosis will be

$$\text{Kurtosis} = 3 - \frac{2\left((\mu_1-\mu_2)^{4} - 6\left(\sigma_1^{2}-\sigma_2^{2}\right)^{2}\right)}{\left((\mu_1-\mu_2)^{2} + 2\left(\sigma_1^{2}+\sigma_2^{2}\right)\right)^{2}} \tag{17}$$

As we can see, the kurtosis is a function of $d = \mu_1 - \mu_2$. For situations where $\sigma_1 = \sigma_2$ and $\mu_1 \neq \mu_2$, the kurtosis will be below that of the regular Gaussian, and our measure will naturally be negative. In fact, for the kurtosis to remain above 3 one needs

$$|d| \leq \sqrt[4]{6}\,\sqrt{\max(\sigma_1, \sigma_2)^{2} - \min(\sigma_1, \sigma_2)^{2}}\,;$$

beyond that, the stochasticity of the mean offsets the stochasticity of volatility.

These situations, with thinner tails than the Gaussian, are encountered in bimodal situations where $\mu_1$ and $\mu_2$ are separated; the effect becomes acute when they are separated by several standard deviations. Let $d = \mu_1 - \mu_2$ and $\sigma = \sigma_1 = \sigma_2$ (to achieve minimum kurtosis); then

$$\kappa_1 = 2 - \frac{\log(2)}{\log\left(\frac{\sqrt{\pi}\,d\,e^{\frac{d^{2}}{4\sigma^{2}}}\operatorname{erf}\left(\frac{d}{2\sigma}\right) + 2\sigma\left(e^{\frac{d^{2}}{4\sigma^{2}}}+1\right)}{\sqrt{\pi}\,d\,e^{\frac{d^{2}}{4\sigma^{2}}}\operatorname{erf}\left(\frac{d}{2\sqrt{2}\,\sigma}\right) + 2\sqrt{2}\,\sigma\,e^{\frac{d^{2}}{8\sigma^{2}}}}\right)} \tag{18}$$
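Eq. (18), as written via the ratio M(2)/M(1) above, indeed dips below zero once the modes separate; a sketch (ours) with arbitrary parameter choices:

```python
# Sketch (ours): Eq. (18) for the 50/50 mixture of N(mu1, sigma) and N(mu2, sigma),
# with d = mu1 - mu2. Kappa turns negative as the modes separate.
import math

def kappa1_mix(d, sigma):
    s2 = sigma**2
    a = math.exp(d**2 / (4 * s2))
    num = math.sqrt(math.pi) * d * a * math.erf(d / (2 * sigma)) + 2 * sigma * (a + 1)
    den = (math.sqrt(math.pi) * d * a * math.erf(d / (2 * math.sqrt(2) * sigma))
           + 2 * math.sqrt(2) * sigma * math.exp(d**2 / (8 * s2)))
    return 2 - math.log(2) / math.log(num / den)

print(kappa1_mix(0.0, 1.0))   # 0: plain Gaussian
print(kappa1_mix(0.5, 1.0))   # barely below 0
print(kappa1_mix(3.0, 1.0))   # ~ -0.43: markedly thinner-tailed than the Gaussian
```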
REFERENCES

[1] S. Kotz and N. Johnson, Encyclopedia of Statistical Sciences. Wiley, 2004.
[2] A. Tversky and D. Kahneman, "Belief in the law of small numbers," Psychological Bulletin, vol. 76, no. 2, p. 105, 1971.
[3] M. Falk, J. Hüsler, and R.-D. Reiss, Laws of Small Numbers: Extremes and Rare Events. Springer Science & Business Media, 2010.
[4] N. N. Taleb, Technical Incerto Vol 1: The Statistical Consequences of Fat Tails, Papers and Commentaries. Monograph, 2018.
[5] T. Pham-Gia and T. Hung, "The mean and median absolute deviations," Mathematical and Computer Modelling, vol. 34, no. 7-8, pp. 921–936, 2001.
[6] X. Gabaix, "Power laws in economics and finance," National Bureau of Economic Research, Tech. Rep., 2008.
[7] N. N. Taleb, "Finiteness of variance is irrelevant in the practice of quantitative finance," Complexity, vol. 14, no. 3, pp. 66–76, 2009.
[8] V. V. Uchaikin and V. M. Zolotarev, Chance and Stability: Stable Distributions and Their Applications. Walter de Gruyter, 1999.
[9] J.-P. Bouchaud and M. Potters, Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management. Cambridge University Press, 2003.
[10] D. Sornette, Critical Phenomena in Natural Sciences: Chaos, Fractals, Selforganization, and Disorder: Concepts and Tools. Springer, 2004.
[11] B. Mandelbrot, "The Pareto-Lévy law and the distribution of income," International Economic Review, vol. 1, no. 2, pp. 79–106, 1960.
[12] C. Dagum, "Inequality measures between income distributions with applications," Econometrica, vol. 48, no. 7, pp. 1791–1803, 1980.
[13] ——, Income Distribution Models. Wiley Online Library, 1983.
[14] P. Cirillo, "Are your data really Pareto distributed?" Physica A: Statistical Mechanics and its Applications, vol. 392, no. 23, pp. 5947–5962, 2013.
[15] H. Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952.
[16] H. Neth and G. Gigerenzer, "Heuristics: Tools for an uncertain world," Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, 2015.
[17] N. N. Taleb, Skin in the Game: Hidden Asymmetries in Daily Life. Penguin (London) and Random House (N.Y.), 2018.
[18] E. O. Thorp, "Optimal gambling systems for favorable games," Revue de l'Institut International de Statistique, pp. 273–293, 1969.
[19] N. N. Taleb, "Errors, robustness, and the fourth quadrant," International Journal of Forecasting, vol. 25, no. 4, pp. 744–759, 2009.
[20] S. Benartzi and R. H. Thaler, "Naive diversification strategies in defined contribution saving plans," American Economic Review, vol. 91, no. 1, pp. 79–98, 2001.
[21] S. Benartzi and R. Thaler, "Heuristics and biases in retirement savings behavior," Journal of Economic Perspectives, vol. 21, no. 3, pp. 81–104, 2007.
[22] H. Windcliff and P. P. Boyle, "The 1/n pension investment puzzle," North American Actuarial Journal, vol. 8, no. 3, pp. 32–45, 2004.
[23] V. DeMiguel, L. Garlappi, and R. Uppal, "Optimal versus naive diversification: How inefficient is the 1/n portfolio strategy?" The Review of Financial Studies, vol. 22, no. 5, pp. 1915–1953, 2007.
[24] S. Makridakis and N. Taleb, "Decision making and planning under low levels of predictability," 2009.
[25] B. B. Mandelbrot and N. N. Taleb, "Random jump, not random walk," 2010.
[26] "NIST Digital Library of Mathematical Functions," http://dlmf.nist.gov/, Release 1.0.19 of 2018-06-22; F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, and B. V. Saunders, eds. [Online]. Available: http://dlmf.nist.gov/
[27] D. J. Newman, A Problem Seminar. Springer Science & Business Media, 2012.
[28] E. Pitman, "Subexponential distribution functions," J. Austral. Math. Soc. Ser. A, vol. 29, no. 3, pp. 337–347, 1980.
[29] H. Nie and S. Chen, "Lognormal sum approximation with type IV Pearson distribution," IEEE Communications Letters, vol. 11, no. 10, 2007.
[30] S. Chen, H. Nie, and B. Ayers-Glassey, "Lognormal sum approximation with a variant of type IV Pearson distribution," IEEE Communications Letters, vol. 12, no. 9, 2008.
[31] M. Di Renzo, F. Graziosi, and F. Santucci, "Further results on the approximation of log-normal power sum via Pearson type IV distribution: A general formula for log-moments computation," IEEE Transactions on Communications, vol. 57, no. 4, 2009.
[32] P. Diaconis and S. Zabell, "Closed form summation for classical distributions: Variations on a theme of De Moivre," Statistical Science, pp. 284–302, 1991.
[33] G. Suzuki, "A consistent estimator for the mean deviation of the Pearson type distribution," Annals of the Institute of Statistical Mathematics, vol. 17, no. 1, pp. 271–285, 1965.
[34] D. Schleher, "Generalized Gram-Charlier series with application to the sum of log-normal variates," IEEE Transactions on Information Theory, vol. 23, no. 2, pp. 275–280, 1977.
[35] N. C. Beaulieu, A. A. Abu-Dayya, and P. J. McLane, "Estimating the distribution of a sum of independent lognormal random variables," IEEE Transactions on Communications, vol. 43, no. 12, p. 2869, 1995.
[36] D. Dufresne, "Sums of lognormals," in Proceedings of the 43rd Actuarial Research Conference. University of Regina, 2008.
[37] D. Dufresne et al., "The log-normal approximation in financial and other computations," Advances in Applied Probability, vol. 36, no. 3, pp. 747–773, 2004.