

How Much Data Do You Need? A Pre-asymptotic Metric for Fat-tailedness

Nassim Nicholas Taleb
Tandon School of Engineering, New York University
November 2018
Forthcoming, International Journal of Forecasting
arXiv:1802.05495v3 [stat.ME] 26 Nov 2018

Abstract—This paper presents an operational metric for univariate unimodal probability distributions with finite first moment, in [0, 1] where 0 is maximally thin-tailed (Gaussian) and 1 is maximally fat-tailed. It is based on "how much data one needs to make meaningful statements about a given dataset?"

Applications: Among others, it
• helps assess the sample size n needed for statistical significance outside the Gaussian,
• helps measure the speed of convergence to the Gaussian (or stable basin),
• allows practical comparisons across classes of fat-tailed distributions,
• allows the assessment of the number of securities needed in portfolio construction to achieve a certain level of risk-reduction from diversification,
• helps assess risks under various settings,
• helps understand some inconsistent attributes of the lognormal, depending on the parametrization of its variance.

The literature is rich for what concerns asymptotic behavior, but there is a large void for finite values of n, those needed for operational purposes.

Background: Conventional measures of fat-tailedness, namely 1) the tail index for the power law class, and 2) kurtosis for finite moment distributions, fail to apply to some distributions, and do not allow comparisons across classes and parametrizations, that is, between power laws outside the Levy-Stable basin, or power laws to distributions in other classes, or power laws for different numbers of summands. How can one compare a sum of 100 Student T distributed random variables with 3 degrees of freedom to one in a Levy-Stable or a Lognormal class? How can one compare a sum of 100 Student T with 3 degrees of freedom to a single Student T with 2 degrees of freedom?

We propose an operational and heuristic metric that allows us to compare n-summed independent variables under all distributions with finite first moment. The method is based on the rate of convergence of the law of large numbers for finite sums, n-summands specifically. We get either explicit expressions or simulation results and bounds for the lognormal, exponential, Pareto, and the Student T distributions in their various calibrations, in addition to the general Pearson classes.

Fig. 1. The intuition of what κ is measuring: how the mean deviation of the sum of identical copies of a r.v. Sn = X1 + X2 + ... + Xn grows as the sample increases, and how we can compare distributions from different classes preasymptotically. [Plot of E|Sn| against n for the Cauchy (κ=1), Pareto 1.14, cubic Student T, and Gaussian (κ=0), in decreasing order of fat-tailedness.]

I. INTRODUCTION AND DEFINITIONS

How can one compare a Pareto distribution with tail α = 2.1, that is, with finite variance, to a Gaussian? Asymptotically, these distributions in the regular variation class with finite second moment become Gaussian under summation, but preasymptotically we have no standard way of comparing them, given that metrics that depend on higher moments, such as kurtosis, cannot be of help. Nor can we easily compare an infinite variance Pareto distribution to its limiting α-Stable distribution (when both have the same tail index or tail exponent). Likewise, how can one compare the "fat-tailedness" of, say, a Student T with 3 degrees of freedom to that of a Levy-Stable with tail exponent of 1.95? Both distributions have a finite mean; of the two, only the first has a finite variance but, for a small number of summands, behaves more "fat-tailed" according to some operational criteria.¹

The author owes the most to the focused comments by Michail Loulakis who, in addition, provided the rigorous derivations for the limits of the κ for the Student T and lognormal distributions, as well as to the patience and wisdom of Spyros Makridakis. The paper was initially presented at Extremes and Risks in Higher Dimensions, Sept 12-16 2016, at the Lorentz Center, Leiden, and at Jim Gatheral's Festschrift at the Courant Institute, in October 2017. The author thanks Jean-Philippe Bouchaud, John Einmahl, Pasquale Cirillo, and others. Laurens de Haan suggested changing the name of the metric from "gamma" to "kappa" to avoid confusion. Additional thanks to Colman Humphrey, Michael Lawler, Daniel Dufresne and others for discussions and insights with derivations.

¹ By "fat tails" we are using the generic term used by finance practitioners to refer to thicker tails than the Gaussian, without reference to any particular class of distributions.


Fig. 2. Watching the effect of the Generalized Central Limit Theorem: for the Pareto and Student T distributions, in the P class with tail exponent α, κ converges to 2 − (α 1_{α<2} + 2·1_{α≥2}), i.e., to that of the Stable S class. We observe how slow the convergence is, even after 1000 summands. This discounts Mandelbrot's assertion that an infinite variance Pareto can be subsumed into a stable distribution. [Two panels, κ against α, for the Pareto (α) and the Student T (α), each showing the bound κ1 together with κ1,30, κ1,100, and κ1,1000.]

1) Criterion for "fat-tailedness": There are various ways to "define" fat tails and rank distributions according to each definition. In the narrow class of distributions having all moments finite, it is the kurtosis, which allows simple comparisons and measures departures from the Gaussian, which is used as a norm. For the power law class, it can be the tail exponent. One can also use extremal values, taking the probability of exceeding a maximum value, adjusted by the scale (as practiced in extreme value theory). For operational uses, practitioners' fat-tailedness is a degree of concentration, such as "how much of the statistical properties will be attributable to a single observation?", or, appropriately adjusted by the scale (or the mean dispersion), "how much is the total wealth of a country in the hands of the richest individual?"

Here we use the following criterion for our purpose, which maps to the measure of concentration in the previous paragraph: "How much will additional data (under such a probability distribution) help increase the stability of the observed mean?" The purpose is not entirely statistical: it can equally mean: "How much will adding an additional security into my portfolio allocation (i.e., keeping the total constant) increase its stability?"

Our metric differs from the asymptotic measures (particularly ones used in extreme value theory) in that it is fundamentally preasymptotic. Real life, and real world realizations, are outside the asymptote.

2) What does the metric do: The metric we propose, κ, does the following:
• Allows comparison of n-summed variables of different distributions for a given number of summands, or of the same distribution for different n, and assessment of the preasymptotic Central Limit properties of a given distribution.
• Provides a measure of the distance from the limiting distribution, namely the Lévy α-Stable basin (of which the Gaussian is a special case).
• For statistical inference, allows assessing the "speed" of the law of large numbers, expressed in the change of the mean absolute error around the average thanks to the increase of sample size n.
• Allows comparative assessment of the "fat-tailedness" of two different univariate distributions, when both have finite first moment.
• Allows us to know ahead of time how many runs we need for a Monte Carlo simulation.

3) The state of statistical inference: The last point, the "speed", appears to have been ignored. For in the 9,400 pages of the Encyclopedia of Statistical Sciences [1], we were unable to find a single comment as to how long it takes to reach the asymptote, or how to deal with n summands that are large but perhaps not sufficiently so for the so-called "normal approximation". Further, the entry on statistical inference (authored by W. Hoeffding) explicitly brushes away the problem, stating:

"The exact distribution of a statistic is usually highly complicated and difficult to work with. Hence the need to approximate the exact distribution by a distribution of a simpler form whose properties are more transparent. The limit theorems of probability theory provide an important tool for such approximations. In particular, the classical central limit theorems state that the sum of a large number of independent random variables is approximately normally distributed under general conditions. In fact, the normal distribution plays a dominating role among the possible limit distributions. (...) Moreover, many statistics behave asymptotically like sums of independent random variables. All of this helps to explain the importance of the normal distribution as an asymptotic distribution."

Even social science discussions of the "law of small numbers" [2] assume Gaussian attributes as the norm. As to extreme value theory, the "functional law of small numbers" [3] concerns Poisson hitting with small probabilities; more generally, extreme value theory (while naturally equipped with the tools for fat tails) is concerned with the behavior of maxima, not averages.
Our motto here and elsewhere is "statistics is never standard". This metric aims at showing how standard is standard, and at measuring the exact departure from the standard from the standpoint of statistical significance.

II. THE METRIC

Fig. 3. The lognormal distribution behaves like a Gaussian for low values of σ, but becomes rapidly equivalent to a power law. This illustrates why, operationally, the debate on whether the distribution of wealth is lognormal (Gibrat) or Pareto (Zipf) doesn't carry much significance. [κ1 of the lognormal as a function of σ, shown against the benchmarks Stable α=1.2, Stable α=1.7, Student T(3), and the Gaussian.]

Definition 1 (the κ metric). Let X1, ..., Xn be i.i.d. random variables with finite mean, that is, E(X) < +∞. Let Sn = X1 + X2 + ... + Xn be a partial sum. Let M(n) = E(|Sn − E(Sn)|) be the expected mean absolute deviation from the mean for n summands. Define the "rate" of convergence for n additional summands starting with n0:
\[
\kappa(n_0, n) = \min\left\{\kappa : \frac{\mathbb{M}(n)}{\mathbb{M}(n_0)} = \left(\frac{n}{n_0}\right)^{\frac{1}{2-\kappa}}\right\}, \qquad n > n_0 \ge 1,
\]
hence
\[
\kappa(n_0, n) = 2 - \frac{\log(n) - \log(n_0)}{\log\left(\frac{\mathbb{M}(n)}{\mathbb{M}(n_0)}\right)}. \tag{1}
\]
Further, for the baseline values n = n0 + 1, we use the shorthand κ_{n0}.

We can also decompose κ(n0, n) in terms of "local" intermediate ones, similar to "local" interest rates, under the constraint
\[
\kappa(n_0, n) = 2 - \frac{\log(n) - \log(n_0)}{\sum_{i=n_0}^{n-1} \frac{\log(i+1) - \log(i)}{2 - \kappa(i,\, i+1)}}. \tag{2}
\]
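As a quick illustration of Definition 1 (ours, a minimal sketch assuming Python with numpy; the helper name kappa_hat is illustrative), κ(n0, n) can be estimated by brute-force Monte Carlo, replacing M(n) with its sample counterpart:

```python
import numpy as np

def kappa_hat(sampler, n, n0=1, trials=200_000, rng=None):
    """Monte Carlo estimate of kappa(n0, n) per Eq. 1: M(k) is estimated
    as the mean absolute deviation of the k-summand partial sum."""
    rng = rng or np.random.default_rng(0)

    def M(k):
        s = sampler((trials, k), rng).sum(axis=1)
        return np.abs(s - s.mean()).mean()

    return 2 - (np.log(n) - np.log(n0)) / np.log(M(n) / M(n0))

# Cubic Student T: kappa_1 should come out near 0.29 (see Tables I and III)
print(kappa_hat(lambda shape, rng: rng.standard_t(3, shape), n=2))
# Gaussian: kappa is flat at 0
print(kappa_hat(lambda shape, rng: rng.standard_normal(shape), n=2))
```

The estimator inherits the very instability the metric is designed to expose: the fatter the tail, the more trials are needed for the sample M(n) itself to stabilize.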
Use of Mean Deviation: Note that for the measure of dispersion around the mean we use the mean absolute deviation, to stay in norm L1 in the absence of finite variance; actually, even in the presence of finite variance, under power law regimes, distributions deliver an unstable and uninformative second moment. Mean deviation proves far more robust there. (Mean absolute deviation can be shown to be more "efficient" except in the narrow case of kurtosis equal to 3 (the Gaussian); see a longer discussion in [4]; for other advantages, see [5].)

III. STABLE BASIN OF CONVERGENCE AS BENCHMARK

Definition 2 (the class P). The P class of power laws (regular variation) is defined for a r.v. X as follows:
\[
\mathcal{P} = \{X : \mathbb{P}(X > x) \sim L(x)\, x^{-\alpha}\} \tag{3}
\]
where ∼ means that the limit of the ratio of rhs to lhs goes to 1 as x → ∞. L : [x_min, +∞) → (0, +∞) is a slowly varying function, defined as lim_{x→+∞} L(kx)/L(x) = 1 for any k > 0. The constant α > 0.

Next we define the domain of attraction of the sum of identically distributed variables, in our case with identical parameters.

Definition 3 (stable S class). A random variable X follows a stable (or α-stable) distribution, symbolically X ∼ S(α̃, β, µ, σ), if its characteristic function χ(t) = E(e^{itX}) is of the form:
\[
\chi(t) =
\begin{cases}
e^{\, i\mu t - |t\sigma|^{\tilde\alpha}\left(1 - i\beta \tan\left(\frac{\pi\tilde\alpha}{2}\right)\operatorname{sgn}(t)\right)} & \tilde\alpha \ne 1 \\[6pt]
e^{\, it\left(\frac{2\beta\sigma\log(\sigma)}{\pi} + \mu\right) - |t\sigma|\left(1 + \frac{2 i \beta \operatorname{sgn}(t)\log(|t\sigma|)}{\pi}\right)} & \tilde\alpha = 1
\end{cases} \tag{4}
\]

Next, we define the corresponding stable α̃:
\[
\tilde\alpha \triangleq
\begin{cases}
\alpha\, \mathbf{1}_{\alpha<2} + 2\, \mathbf{1}_{\alpha\ge2} & \text{if } X \text{ is in } \mathcal{P} \\
2 & \text{otherwise.}
\end{cases} \tag{5}
\]
Further discussions of the class S are as follows.

A. Equivalence for stable distributions

For all n0 and n ≥ 1 in the Stable S class with α̃ ≥ 1:
\[
\kappa_{n_0, n} = 2 - \tilde\alpha,
\]
simply from the property that
\[
\mathbb{M}(n) = n^{\frac{1}{\tilde\alpha}}\, \mathbb{M}(1). \tag{6}
\]
This simply shows that κ_{n0,n} = 0 for the Gaussian.

The problem of the preasymptotics for n summands reduces to:
• What is the property of the distribution for n0 = 1 (or starting from a standard, off-the-shelf distribution)?
• What is the property of the distribution for n0 summands?
• How does κn → 2 − α̃, and at what rate? (A numerical check of Eq. 6 is sketched below.)
B. Practical significance for sample sufficiency

Confidence intervals: As a simple heuristic, the higher κ, the more disproportionally insufficient the confidence interval. Any value of κ above .15 effectively indicates a high degree of unreliability of the "normal approximation". One can immediately doubt the results of numerous research papers in fat-tailed domains.

Computations of the sort done in Table II, for instance, allow us to compare various distributions under various parametrizations (comparing various Pareto distributions to the symmetric Student T and, of course, the Gaussian, which has a flat kappa of 0).

TABLE I
KAPPA FOR 2 SUMMANDS, κ1

Student T (α):
\[
\kappa_1 = 2 - \frac{\log(2)}{\log\left(\frac{2^{2-\alpha}\sqrt{\pi}\;\Gamma\left(\alpha - \frac{1}{2}\right)}{\Gamma\left(\frac{\alpha}{2}\right)^{2}}\right)}
\]

Exponential/Gamma:
\[
\kappa_1 = 2 - \frac{\log(2)}{2\log(2) - 1} \approx .21
\]

Pareto (α):
\[
\kappa_1 = 2 - \frac{\log(2)}{\log\left(\mathbb{M}(2)/\mathbb{M}(1)\right)},
\]
where M(2)/M(1) has an explicit, if unwieldy, closed form as an integral involving the incomplete Beta functions B_{1/(y+2)}(−α, 1−α) and B_{(y+1)/(y+2)}(−α, 1−α).ᵃ

Normal (µ, σ) with variance switching to σ²a w.p. p: explicit closed form in a, p, and σ.ᵇ

Lognormal (µ, σ):
\[
\kappa_1 \approx 2 - \frac{\log(2)}{\log\left(\frac{2\,\operatorname{erf}\left(\frac{1}{2\sqrt{2}}\sqrt{\log\left(\frac{e^{\sigma^2}+1}{2}\right)}\right)}{\operatorname{erf}\left(\frac{\sigma}{2\sqrt{2}}\right)}\right)}
\]

ᵃ B_z(a, b) is the incomplete Beta function B_z(a, b) = ∫₀^z t^{a−1}(1 − t)^{b−1} dt; erf(.) is the error function erf(z) = (2/√π)∫₀^z e^{−t²} dt.
ᵇ See comments and derivations in the appendix for switching both variance and mean, as it can produce negative values for kappa.

TABLE II
MAIN RESULTS

Distribution — κn
Exponential/Gamma — Explicit.
Lognormal (µ, σ) — No explicit κn, but explicit lower and higher bounds (low or high σ or n). Approximated with Pearson IV for σ in between.
Pareto (α) (constant) — Explicit for κ2 (lower bound for all α).
Student T (α) (slowly varying function) — Explicit for κ1, α = 3.

TABLE III
COMPARING PARETO TO STUDENT T (SAME TAIL EXPONENT α)

α | Pareto κ1 | Pareto κ1,30 | Pareto κ1,100 | Student κ1 | Student κ1,30 | Student κ1,100
1.25 | 0.829 | 0.787 | 0.771 | 0.792 | 0.765 | 0.756
1.5 | 0.724 | 0.65 | 0.631 | 0.647 | 0.609 | 0.587
1.75 | 0.65 | 0.556 | 0.53 | 0.543 | 0.483 | 0.451
2. | 0.594 | 0.484 | 0.449 | 0.465 | 0.387 | 0.352
2.25 | 0.551 | 0.431 | 0.388 | 0.406 | 0.316 | 0.282
2.5 | 0.517 | 0.386 | 0.341 | 0.359 | 0.256 | 0.227
2.75 | 0.488 | 0.356 | 0.307 | 0.321 | 0.224 | 0.189
3. | 0.465 | 0.3246 | 0.281 | 0.29 | 0.191 | 0.159
3.25 | 0.445 | 0.305 | 0.258 | 0.265 | 0.167 | 0.138
3.5 | 0.428 | 0.284 | 0.235 | 0.243 | 0.149 | 0.121
3.75 | 0.413 | 0.263 | 0.222 | 0.225 | 0.13 | 0.10
4. | 0.4 | 0.2532 | 0.211 | 0.209 | 0.126 | 0.093

As we mentioned in the introduction, required sample size for statistical inference is driven by n, the number of summands. Yet the law of large numbers is often invoked in erroneous conditions; we need a rigorous sample size metric.

Many papers, when discussing financial matters (say [6]), use finite variance as a binary classification for fat-tailedness: power laws with a tail exponent greater than 2 are therefore classified as part of the "Gaussian basin", hence allowing the use of variance and other such metrics for financial applications. A much more natural boundary is finiteness of expectation for financial applications [7]. Our metric can thus be useful as follows:

Let X_{g,1}, X_{g,2}, ..., X_{g,n_g} be a sequence of Gaussian variables with mean µ and scale σ. Let X_{ν,1}, X_{ν,2}, ..., X_{ν,n_ν} be a sequence of some other variables scaled to be of the same M(1), namely M_ν(1) = M_g(1) = √(2/π) σ. We would be looking for values of n_ν corresponding to a given n_g.

κn is indicative of both the rate of convergence under the law of large numbers and, for κn → 0, of the rate of convergence of summands to the Gaussian under the central limit theorem, as illustrated in Figure 2.
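A row of Table III can be reproduced directly (again a sketch of ours, assuming numpy; the one-tailed Pareto is sampled by inverse transform):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 3.0

pareto = lambda shape, rng: (1 - rng.random(shape)) ** (-1 / alpha)  # P(X>x) = x^-alpha
student = lambda shape, rng: rng.standard_t(alpha, shape)

def kappa1(sampler, trials=400_000):
    def M(k):
        s = sampler((trials, k), rng).sum(axis=1)
        return np.abs(s - s.mean()).mean()
    return 2 - np.log(2) / np.log(M(2) / M(1))

print(kappa1(pareto), kappa1(student))  # ~0.465 and ~0.29: the alpha = 3 row
```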
\[
n_{\min} = \inf\left\{ n_\nu : \mathbb{E}\left(\left|\sum_{i=1}^{n_\nu}\frac{X_{\nu,i} - m_\nu}{n_\nu}\right|\right) \le \mathbb{E}\left(\left|\sum_{i=1}^{n_g}\frac{X_{g,i} - m_g}{n_g}\right|\right),\ n_\nu > 0 \right\},
\]
which can be computed using κn = 0 for the Gaussian and backing out κn for the target distribution with the simple approximation:
\[
n_\nu = n_g^{\left(1 - \kappa_{1,n_g}\right)^{-1}} \approx n_g^{\left(1 - \kappa_1\right)^{-1}}, \qquad n_g > 1. \tag{7}
\]
The approximation is owed to the slowness of convergence. So, for example, a Student T with 3 degrees of freedom (α = 3) requires 120 observations to get the same drop in variance from averaging (hence confidence level) as the Gaussian with 30, that is, 4 times as much. The one-tailed Pareto with the same tail exponent α = 3 requires 543 observations to match a Gaussian sample of 30, 4.5 times more than the Student, which shows that 1) finiteness of variance is not an indication of fat-tailedness (in our statistical sense), 2) neither are tail exponents good indicators, and 3) the symmetric Student and the Pareto distribution are not equivalent, because of the "bell-shapedness" of the Student (coming from the slowly varying function) that dampens variations in the center of the distribution.

We can also elicit quite counterintuitive results. From Eq. 7, the "Pareto 80/20" in the popular mind, which maps to a tail exponent around α ≈ 1.14, requires > 10^9 more observations than the Gaussian.
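Eq. 7 is a one-liner; the following sketch (ours) reproduces the sample-size equivalences just quoted, feeding in the κ1 values of Table III:

```python
def equivalent_sample_size(n_g, kappa):
    """Eq. 7: observations needed under a fat-tailed law to match the
    shrinkage in mean absolute error of a Gaussian sample of size n_g."""
    return n_g ** (1.0 / (1.0 - kappa))

print(equivalent_sample_size(30, 0.29))    # cubic Student T: ~120
print(equivalent_sample_size(30, 0.465))   # Pareto, alpha = 3: ~577 (543 exactly)
print(equivalent_sample_size(30, 0.829))   # Pareto, alpha = 1.25: ~4e8; the
# "80/20" (alpha ~ 1.14) is steeper still, hence the > 10^9 in the text
```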
IV. TECHNICAL CONSEQUENCES

A. Some oddities with asymmetric distributions

The stable distribution, when skewed, has the same κ index as a symmetric one (in other words, κ is invariant to the β parameter in Eq. 4, which is conserved under summation). But a one-tailed simple Pareto distribution is fatter tailed (for our purpose here) than an equivalent symmetric one.

This is relevant because the stable is never really observed in practice and is used as some limiting mathematical object, while the Pareto is more commonly seen. The point is not well grasped in the literature. Consider the following use of the substitution of a stable for a Pareto. In Uchaikin and Zolotarev [8]:

"Mandelbrot called attention to the fact that the use of the extremal stable distributions (corresponding to β = 1) to describe empirical principles was preferable to the use of the Zipf-Pareto distributions for a number of reasons. It can be seen from many publications, both theoretical and applied, that Mandelbrot's ideas receive more and more wide recognition of experts. In this way, the hope arises to confirm empirically established principles in the framework of mathematical models and, at the same time, to clear up the mechanism of the formation of these principles."

These are not the same animals, even for large numbers of summands.

B. Rate of convergence of a Student T distribution to the Gaussian basin

We show in the appendix, thanks to the explicit derivation of κ for the sum of Students with α = 3 (the "cubic" commonly noticed in finance), that the rate of convergence of κ to 0 under summation is slow. The semi-closed form for the density of an n-summed cubic Student allows us to complement the result in Bouchaud and Potters [9] (see also [10]), which is as follows. Their approach is to separate the "Gaussian zone", where the density is approximated by that of a Gaussian, from a "power law zone" in the tails, which retains the original distribution with power law decline. The "crossover" between the two moves right and left of the center at a rate of √(n log(n)) standard deviations, which is excruciatingly slow. Indeed, one can note that more summands fall at the center of the distribution, and fewer outside of it; hence the speed of convergence according to the central limit theorem will differ according to whether the density concerns the center or the tails.

Further investigations would concern the convergence of the Pareto to a Levy-Stable, which so far we have only obtained numerically.

C. The lognormal is neither thin nor fat tailed

Naively, as we can see in Figure 3, at low values of the parameter σ the lognormal behaves like a Gaussian, and at high σ it appears to have the behavior of a Cauchy of sorts (a one-tailed Cauchy, rather a stable distribution with α = 1, β = 1), as κ gets closer and closer to 1. This gives us an idea about some aspects of the debates as to whether some variable is Pareto or lognormally distributed, such as, say, the debates about wealth [11], [12], [13]. Indeed, such debates can be irrelevant to the real world. As P. Cirillo [14] observed, many cases of Paretianity are effectively lognormal situations with high variance; the practical statistical consequences, however, are smaller than imagined.

D. Can kappa be negative?

Just as kurtosis for a mixed Gaussian (i.e., with stochastic mean, rather than stochastic volatility) can dip below 3 (or become "negative" when one uses the convention of measuring kurtosis as excess over the Gaussian's 3), the kappa metric can become negative when kurtosis is "negative". These situations require bimodality (i.e., a switching process between means under fixed variance, with modes far apart in terms of standard deviation). They do not appear to occur with unimodal distributions. Details and derivations are presented in the appendix.

V. CONCLUSION AND CONSEQUENCES

To summarize, while the limit theorems (the law of large numbers and the central limit theorem) are concerned with the behavior as n → +∞, we are interested in finite and exact n, both small and large (and their statistical and risk implications). We may draw a few operational consequences:
Fig. 4. In short, why the 1/n heuristic works in portfolio theory (and similar decision problems): it takes many, many more securities to get the same risk reduction as via portfolio allocation according to Markowitz. We assume for simplicity that the securities are independent, which they are not, something that compounds the effect. [Risk against the number of securities n: Markowitz allocation vs. established securities vs. speculative securities.]

A. Portfolio pseudo-stabilization

Our method can also naturally and immediately apply to portfolio construction and the effect of diversification, since adding a security to a portfolio has the same "stabilizing" effect as adding an additional observation for the purpose of statistical significance. "How much data do you need?" translates into "How many securities do you need?". Clearly, the Markowitz allocation method in modern finance [15] (which seems not to have been used by Markowitz himself for his own portfolio [16]) applies only for κ near 0; people use convex heuristics, otherwise they will underestimate tail risks and "blow up" the way the famed portfolio-theory oriented hedge fund Long Term Capital Management did in 1998 [17], [18].

We mentioned earlier that a Pareto distribution close to the "80/20" requires up to 10^9 more observations than a Gaussian; consider that the risk of a portfolio under such a distribution would be underestimated by at least 8 orders of magnitude if one uses modern portfolio criteria. Following such reasoning, one simply needs broader portfolios.

It has also been noted that there is practically no financial security that is not fatter tailed than the Gaussian, from the simple criterion of kurtosis [19], meaning Markowitz portfolio allocation is never the best solution. It happens that agents wisely apply a noisy approximation to the 1/n heuristic, which has been classified as one of those biases by behavioral scientists but has in fact been debunked as false (a false bias is one in which, while the observed phenomenon is there, it does not constitute a "bias" in the bad sense of the word; rather it is the researcher who is mistaken, owing to using the wrong tools, instead of the decision-maker). This tendency to "overdiversify" has been deemed a departure from optimal investment behavior by Benartzi and Thaler [20], explained in [21]: "when faced with n options, divide assets evenly across the options. We have dubbed this heuristic the '1/n rule.'" However, broadening one's diversification is effectively at least as optimal as standard allocation (see the critique by Windcliff and Boyle [22] and [23]). In short, an equally weighted portfolio outperforms the SP500 across a broad range of metrics. But even the latter two papers didn't conceive of the full effect and properties of fat tails, which we can see here with some precision. Fig. 4 shows the effect for securities compared to Markowitz.

This false bias is one among many examples of policy makers "nudging" people into the wrong rationality [17], driving them to increase their portfolio risk manyfold.

A few more comments on financial portfolio risks. The SP500 has a κ of around .2, but one needs to take into account that it is itself a basket of n = 500 securities, albeit unweighted and consisting of correlated members, overweighing stable stocks. Single stocks have kappas between .3 and .7, meaning a policy of "overdiversification" is a must; a back-of-the-envelope translation via Eq. 7 is sketched below.
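(A rough translation of the previous point into numbers, ours, reusing Eq. 7 under the same independence simplification as in Fig. 4:)

```python
def securities_needed(n_markowitz, kappa):
    """Independent securities required to match the risk reduction a
    Markowitz-style (Gaussian, kappa = 0) allocation gets from n_markowitz."""
    return n_markowitz ** (1.0 / (1.0 - kappa))

for kappa in (0.3, 0.5, 0.7):   # the single-stock range quoted above
    print(kappa, round(securities_needed(30, kappa)))
# 0.3 -> ~130, 0.5 -> 900, 0.7 -> ~84,000: "overdiversification" is a must
```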
Likewise the metric gives us some guidance in the treatment of data for forecasting, by establishing sample sufficiency, to state such matters as how many years of data we need before stating whether climate conditions "have changed"; see [24].

B. Other aspects of statistical inference

So far we have considered only univariate distributions. For higher dimensions, a potential area of investigation is an equivalent approach to the multivariate distribution of fat-tailed variables, the sampling of which is not captured by the Marchenko-Pastur (or Wishart) distributions. As in our situation, adding variables doesn't easily remove noise from random matrices.

C. Final comment

As we said earlier, "statistics is never standard"; however, there are heuristic methods to figure out where and by how much we depart from the standard.

APPENDIX

A. Cubic Student T (Gaussian Basin)

The Student T with 3 degrees of freedom is of special interest in the literature owing to its prevalence in finance [6]. It is often mistakenly approximated to be Gaussian owing to the finiteness of its variance. Asymptotically, we end up with a Gaussian, but this doesn't tell us anything about the rate of convergence. Mandelbrot and Taleb [25] remark that the cubic acts more like a power law in the distribution of the extremes, which we will elaborate on here thanks to an explicit PDF for the sum.

Let X be a random variable distributed with density p(x):
\[
p(x) = \frac{6\sqrt{3}}{\pi \left(x^2 + 3\right)^2}, \qquad x \in (-\infty, \infty). \tag{8}
\]

Proposition 1. Let Y be a sum of X1, ..., Xn, n identical copies of X. Let M(n) be the mean absolute deviation from the mean for n summands. The "rate" of convergence κ_{1,n} = min{κ : M(n)/M(1) = n^{1/(2−κ)}} is:
\[
\kappa_{1,n} = 2 - \frac{\log(n)}{\log\left(e^n n^{-n}\, \Gamma(n+1, n) - 1\right)} \tag{9}
\]
where Γ(., .) is the incomplete gamma function Γ(a, z) = ∫_z^∞ t^{a−1} e^{−t} dt, since the mean deviation M(n) is
\[
\mathbb{M}(n) =
\begin{cases}
\frac{2\sqrt{3}}{\pi} & n = 1 \\[4pt]
\frac{2\sqrt{3}}{\pi}\left(e^n n^{-n}\, \Gamma(n+1, n) - 1\right) & n > 1.
\end{cases} \tag{10}
\]

The derivations are as follows. For the pdf and the MAD we followed different routes. We have the characteristic function for n summands:
\[
\varphi(\omega) = \left(1 + \sqrt{3}\,|\omega|\right)^n e^{-n\sqrt{3}\,|\omega|}.
\]
The pdf of Y is given by:
\[
p(y) = \frac{1}{\pi}\int_0^\infty \left(1 + \sqrt{3}\,\omega\right)^n e^{-n\sqrt{3}\,\omega} \cos(\omega y)\, d\omega.
\]
After arduous integration we get the result in Eq. 10. Further, since the following result does not appear to be found in the literature, we have a useful side result: the PDF of Y can be written as
\[
p(y) = \frac{e^{\,n - \frac{iy}{\sqrt{3}}}\left(e^{\frac{2iy}{\sqrt{3}}}\, E_{-n}\!\left(n + \frac{iy}{\sqrt{3}}\right) + E_{-n}\!\left(n - \frac{iy}{\sqrt{3}}\right)\right)}{2\sqrt{3}\,\pi} \tag{11}
\]
where E_n(.) is the exponential integral E_n(z) = ∫_1^∞ e^{−zt} t^{−n} dt.

Note the following identities (from the updating of Abramowitz and Stegun [26]):
\[
n^{-n-1}\,\Gamma(n+1, n) = E_{-n}(n) = e^{-n}\,\frac{(n-1)!}{n^n}\sum_{m=0}^{n}\frac{n^m}{m!}.
\]

As to the asymptotics, we have the following result (proposed by Michail Loulakis). Reexpressing Eq. 10:
\[
\mathbb{M}(n) = \frac{2\sqrt{3}\, n!}{\pi\, n^n}\sum_{m=0}^{n-1}\frac{n^m}{m!}.
\]
Further,
\[
e^{-n}\sum_{m=0}^{n-1}\frac{n^m}{m!} = \frac{1}{2} + O\!\left(\frac{1}{\sqrt{n}}\right).
\]
(From the behavior of the sum of Poisson variables as they converge to a Gaussian by the central limit theorem: e^{-n}\sum_{m=0}^{n-1} n^m/m! = P(X_n < n), where X_n is a Poisson random variable with parameter n. Since the sum of n independent Poisson random variables with parameter 1 is Poisson with parameter n, the central limit theorem says the probability distribution of Z_n = (X_n − n)/√n approaches a standard normal distribution. Thus P(X_n < n) = P(Z_n < 0) → 1/2 as n → ∞.² For another approach, see [27] for a proof that 1 + n/1! + n²/2! + ⋯ + n^{n−1}/(n−1)! ∼ e^n/2.)

² Robert Israel on Math Stack Exchange.

Using the property that \lim_{n\to\infty} \frac{n!\, e^n}{n^n \sqrt{n}} = \sqrt{2\pi}, we get the following exact asymptotics:


\[
\lim_{n\to\infty} \log(n)\,\kappa_{1,n} = \log\frac{\pi^2}{4},
\]
thus κ goes to 0 (i.e., the average becomes Gaussian) at speed 1/log(n), which is excruciatingly slow. In other words, even with 10^6 summands, the behavior cannot be summarized as that of a Gaussian, an intuition often expressed by B. Mandelbrot [25].
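Proposition 1 can be checked numerically (our sketch, assuming scipy; the ratio is computed in logs so large n does not overflow):

```python
import numpy as np
from scipy.special import gammaincc, gammaln

def M_ratio(n):
    """M(n)/M(1) = e^n n^-n Gamma(n+1, n) - 1 (Eq. 10), using
    Gamma(n+1, n) = gammaincc(n+1, n) * Gamma(n+1)."""
    log_t = n + gammaln(n + 1) + np.log(gammaincc(n + 1, n)) - n * np.log(n)
    return np.expm1(log_t)

kappa_cubic = lambda n: 2 - np.log(n) / np.log(M_ratio(n))   # Eq. 9

print(kappa_cubic(2))   # ~0.29, matching Table I
rng = np.random.default_rng(2)
s = rng.standard_t(3, size=(400_000, 10)).sum(axis=1)
print(np.abs(s - s.mean()).mean() * np.pi / (2 * np.sqrt(3)), M_ratio(10))
# The slow 1/log(n) decay: log(n) * kappa_{1,n} -> log(pi^2/4) ~ 0.903
for n in (10, 10**3, 10**6):
    print(n, np.log(n) * kappa_cubic(n))
```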

B. Lognormal Sums

From the behavior of its cumulants for n summands, we can observe that a sum behaves like a Gaussian when σ is low, and like a lognormal when σ is high, and in both cases we know κn explicitly.

The lognormal (parametrized with µ and σ) doesn't have an explicit characteristic function. But we can get cumulants K_i of all orders i by recursion, and for our case of summed identical copies of the r.v. X_i, K_i^n = K_i(Σⁿ X_i) = n K_i(X_1).

Cumulants:
\[
K_1^n = n\, e^{\mu + \frac{\sigma^2}{2}}, \qquad
K_2^n = n \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}, \qquad
K_3^n = n \left(e^{\sigma^2} - 1\right)^2 \left(e^{\sigma^2} + 2\right) e^{3\mu + \frac{3\sigma^2}{2}}, \qquad
K_4^n = \ldots
\]
which allow us to compute:
\[
\text{Skewness} = \frac{\left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1}}{\sqrt{n}}, \qquad
\text{Kurtosis} = 3 + \frac{e^{2\sigma^2}\left(e^{\sigma^2}\left(e^{\sigma^2} + 2\right) + 3\right) - 6}{n}.
\]
We can immediately prove from the cumulants/moments that
\[
\lim_{n\to+\infty} \kappa_{1,n} = 0, \qquad \lim_{\sigma\to0} \kappa_{1,n} = 0,
\]
and our bound on κ becomes explicit. Let κ*_{1,n} be the situation under which the sums of lognormals conserve the lognormal density, with the same first two moments. We have 0 ≤ κ*_{1,n} ≤ 1, with
\[
\kappa^*_{1,n} = 2 - \frac{\log(n)}{\log\left(\frac{n\,\operatorname{erf}\left(\frac{1}{2\sqrt{2}}\sqrt{\log\left(\frac{n + e^{\sigma^2} - 1}{n}\right)}\right)}{\operatorname{erf}\left(\frac{\sigma}{2\sqrt{2}}\right)}\right)}.
\]

1) Heuristic attempt: Among other heuristic approaches, we can see in two steps how 1) under high values of σ, κ_{1,n} → κ*_{1,n}, since the law of large numbers slows down, and 2) κ*_{1,n} → 1 as σ → ∞.

2) Loulakis' Proof: Proving the upper bound, that for high variance κ_{1,n} approaches 1, has been shown formally by Michail Loulakis³, which we summarize as follows. We start with the identity E(|X − m|) = 2∫_m^∞ (x − m) f(x) dx = 2∫_m^∞ F̄_X(t) dt, where f(.) is the density, m is the mean, and F̄_X(.) is the survival function. Further, M(n) = 2∫_{nm}^∞ F̄_{S_n}(x) dx. Assume µ = −σ²/2, that is, X = exp(σZ − σ²/2) where Z is a standard normal variate, so that m = 1. Let S_n be the sum X_1 + ... + X_n; we get M(n) = 2∫_n^∞ P(S_n > t) dt. Using the property of subexponentiality [28], P(S_n > t) ≥ P(max_{0<i≤n}(X_i) > t) ≥ n P(X_1 > t) − \binom{n}{2} P(X_1 > t)². Now P(X_1 > t) → 1 as σ → ∞, and the second term goes to 0 (using Hölder's inequality). Skipping steps, we get lim inf_{σ→∞} M(n)/M(1) ≥ n, while at the same time we need to satisfy the bound M(n)/M(1) ≤ n. So for σ → ∞, M(n)/M(1) → n, hence κ_{1,n} → 1.

³ Review of this paper; Loulakis proposed a formal proof in place of the heuristic derivation.
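The bound κ*_{1,n} is cheap to evaluate (our sketch, assuming scipy), and it exhibits the two regimes just described:

```python
import numpy as np
from scipy.special import erf

def kappa_star(n, sigma):
    """kappa*_{1,n}: the n-summed lognormal forced to stay lognormal
    with matched first two moments."""
    sigma_n = np.sqrt(np.log((n + np.exp(sigma**2) - 1) / n))
    ratio = n * erf(sigma_n / (2 * np.sqrt(2))) / erf(sigma / (2 * np.sqrt(2)))
    return 2 - np.log(n) / np.log(ratio)

for sigma in (0.1, 1.0, 2.0, 4.0):
    print(sigma, kappa_star(30, sigma))
# ~0 for small sigma (Gaussian behavior), approaching 1 as sigma grows
```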
3) Pearson Family approach for computation: For computational purposes, for the σ parameter not too large (below ≈ .3), we can use the Pearson family for computational convenience, although the lognormal does not belong to the Pearson class (the normal does, but we are close enough for computation). Intuitively, at low σ, the first four moments can be sufficient because of the absence of large deviations; not at higher σ, for which conserving the lognormal would be the right method. The use of the Pearson class is practiced in some fields such as information/communication theory, where there is a rich literature: for summation of lognormal variates see Nie and Chen [29], and for Pearson IV, [30], [31].

The Pearson family is defined for an appropriately scaled density f satisfying the following differential equation:
\[
f'(x) = -\frac{a_0 + a_1 x}{b_0 + b_1 x + b_2 x^2}\, f(x). \tag{12}
\]
We note that our parametrization of a0, b2, etc. determines the distribution within the Pearson class, which appears to be the Pearson IV. Finally we get an expression of the mean deviation as a function of n, σ, and µ.

Let m be the mean. Diaconis and Zabell [32], from an old trick by De Moivre, and Suzuki [33] show that we can get an explicit mean absolute deviation. Using, again, the identity E(|X − m|) = 2∫_m^∞ (x − m) f(x) dx and integrating by parts,
\[
\mathbb{E}(|X - m|) = \frac{2\left(b_0 + b_1 m + b_2 m^2\right)}{a_1 - 2 b_2}\, f(m). \tag{13}
\]
We use the cumulants of the n-summed lognormal to match the parameters. Setting a1 = 1 and m = (b1 − a0)/(1 − 2b2), the matching yields closed-form, if lengthy, expressions for a0, b0, b1, and b2 in terms of n, µ, and σ, all sharing the denominator 2(6(n − 1) + e^{2σ²}(e^{σ²}(5e^{σ²} + 4) − 3)); for instance
\[
b_2 = \frac{e^{2\sigma^2}\left(e^{\sigma^2} - 1\right)\left(2 e^{\sigma^2} + 3\right)}{2\left(6(n-1) + e^{2\sigma^2}\left(e^{\sigma^2}\left(5 e^{\sigma^2} + 4\right) - 3\right)\right)},
\]
with analogous (longer) expressions for a0, b1, and b0.
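Eq. 13 can be sanity-checked on the one Pearson member whose mean deviation is classical, the Gaussian, for which f'/f = −(x − µ)/σ², i.e., a0 = −µ, a1 = 1, b0 = σ², b1 = b2 = 0 (our sketch):

```python
import numpy as np

mu, sd = 0.7, 1.3
a0, a1, b0, b1, b2 = -mu, 1.0, sd**2, 0.0, 0.0
m = (b1 - a0) / (1 - 2 * b2)                 # recovers the mean mu
f_m = 1.0 / (sd * np.sqrt(2 * np.pi))        # Gaussian density at its mean
mad = 2 * (b0 + b1 * m + b2 * m**2) / (a1 - 2 * b2) * f_m   # Eq. 13
print(mad, sd * np.sqrt(2 / np.pi))          # both equal sigma * sqrt(2/pi)
```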

4) Polynomial expansions: Other methods, such as Gram-Charlier expansions (Schleher [34], Beaulieu [35]), proved less helpful in obtaining κn. At high values of σ, the approximations become unstable as one includes higher order Hermite polynomials. See the reviews in Dufresne [36] and [37].

C. Exponential

The exponential is the "entry level" fat-tailed distribution, sitting just at the border:
\[
f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0.
\]
By convolution, for the sum Z = X1 + X2 + ... + Xn we get, by recursion (since f_2(y) = ∫_0^y f(x) f(y − x) dx = λ² y e^{−λy}):
\[
f_n(z) = \frac{\lambda^n z^{n-1} e^{-\lambda z}}{(n-1)!}, \tag{14}
\]
which is the gamma distribution. We get the mean deviation for n summands:
\[
\mathbb{M}(n) = \frac{2 e^{-n} n^n}{\lambda\, \Gamma(n)}, \tag{15}
\]
hence:
\[
\kappa_{1,n} = 2 - \frac{\log(n)}{n \log(n) - n - \log(\Gamma(n)) + 1}. \tag{16}
\]
We can see the asymptotic behavior is equally slow (similar to the Student), although the exponential distribution sits at the cusp of subexponentiality:
\[
\lim_{n\to\infty} \log(n)\, \kappa_{1,n} = 4 - 2\log(2\pi).
\]
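Numerically (our sketch, assuming scipy for log Γ), Eq. 16 and its limit read:

```python
import numpy as np
from scipy.special import gammaln

kappa_exp = lambda n: 2 - np.log(n) / (n * np.log(n) - n - gammaln(n) + 1)  # Eq. 16

for n in (2, 30, 10**3, 10**6):
    print(n, kappa_exp(n), np.log(n) * kappa_exp(n))
# kappa_1 = kappa_exp(2) ~ 0.21 (Table I);
# log(n) * kappa_{1,n} -> 4 - 2*log(2*pi) ~ 0.324
```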

D. Negative kappa

Consider the simple case of a Gaussian with switching means and variance: with probability 1/2, X ∼ N(µ1, σ1) and, with probability 1/2, X ∼ N(µ2, σ2). The kurtosis will be
\[
\text{Kurtosis} = 3 - \frac{2\left((\mu_1 - \mu_2)^4 - 6\left(\sigma_1^2 - \sigma_2^2\right)^2\right)}{\left((\mu_1 - \mu_2)^2 + 2\left(\sigma_1^2 + \sigma_2^2\right)\right)^2}. \tag{17}
\]
As we see, the kurtosis is a function of d = µ1 − µ2. For situations where σ1 = σ2 and µ1 ≠ µ2, the kurtosis will be below that of the regular Gaussian, and our measure will naturally be negative. In fact, for the kurtosis to remain above 3 we need
\[
|d| \le \sqrt[4]{6}\,\sqrt{\max(\sigma_1, \sigma_2)^2 - \min(\sigma_1, \sigma_2)^2};
\]
beyond that, the stochasticity of the mean offsets the stochasticity of volatility.

These situations with thinner tails than the Gaussian are encountered in bimodal situations where µ1 and µ2 are separated; the effect becomes acute when they are separated by several standard deviations. Let d = µ1 − µ2 and σ = σ1 = σ2 (to achieve minimum kurtosis). Writing the mean deviations explicitly (centering the mixture at mean 0),
\[
\mathbb{M}(1) = \sqrt{\frac{2}{\pi}}\,\sigma\, e^{-\frac{d^2}{8\sigma^2}} + \frac{d}{2}\,\operatorname{erf}\!\left(\frac{d}{2\sqrt{2}\,\sigma}\right), \qquad
\mathbb{M}(2) = \frac{\sigma}{\sqrt{\pi}}\left(1 + e^{-\frac{d^2}{4\sigma^2}}\right) + \frac{d}{2}\,\operatorname{erf}\!\left(\frac{d}{2\sigma}\right),
\]
hence
\[
\kappa_1 = 2 - \frac{\log(2)}{\log\left(\mathbb{M}(2)/\mathbb{M}(1)\right)}, \tag{18}
\]
which we see is negative for wide values of µ1 − µ2.
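Evaluating Eq. 18 (our sketch, assuming scipy) shows the sign flip as the modes separate:

```python
import numpy as np
from scipy.special import erf

def kappa1_switching_mean(d, sigma):
    """kappa_1 for the equal-probability mix of N(+d/2, sigma) and
    N(-d/2, sigma), from the closed-form M(1), M(2) of Eq. 18."""
    M1 = np.sqrt(2 / np.pi) * sigma * np.exp(-d**2 / (8 * sigma**2)) \
         + (d / 2) * erf(d / (2 * np.sqrt(2) * sigma))
    M2 = sigma / np.sqrt(np.pi) * (1 + np.exp(-d**2 / (4 * sigma**2))) \
         + (d / 2) * erf(d / (2 * sigma))
    return 2 - np.log(2) / np.log(M2 / M1)

for d in (0.5, 2.0, 4.0, 8.0):
    print(d, kappa1_switching_mean(d, 1.0))  # negative once the modes separate
```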



REFERENCES

[1] S. Kotz and N. Johnson, Encyclopedia of Statistical Sciences. Wiley, 2004.
[2] A. Tversky and D. Kahneman, "Belief in the law of small numbers," Psychological Bulletin, vol. 76, no. 2, p. 105, 1971.
[3] M. Falk, J. Hüsler, and R.-D. Reiss, Laws of Small Numbers: Extremes and Rare Events. Springer Science & Business Media, 2010.
[4] N. N. Taleb, Technical Incerto Vol 1: The Statistical Consequences of Fat Tails, Papers and Commentaries. Monograph, 2018.
[5] T. Pham-Gia and T. Hung, "The mean and median absolute deviations," Mathematical and Computer Modelling, vol. 34, no. 7-8, pp. 921–936, 2001.
[6] X. Gabaix, "Power laws in economics and finance," National Bureau of Economic Research, Tech. Rep., 2008.
[7] N. N. Taleb, "Finiteness of variance is irrelevant in the practice of quantitative finance," Complexity, vol. 14, no. 3, pp. 66–76, 2009.
[8] V. V. Uchaikin and V. M. Zolotarev, Chance and stability: stable
distributions and their applications. Walter de Gruyter, 1999.
[9] J.-P. Bouchaud and M. Potters, Theory of financial risk and derivative
pricing: from statistical physics to risk management. Cambridge
University Press, 2003.
[10] D. Sornette, Critical phenomena in natural sciences: chaos, fractals,
selforganization, and disorder: concepts and tools. Springer, 2004.
[11] B. Mandelbrot, “The pareto-levy law and the distribution of income,”
International Economic Review, vol. 1, no. 2, pp. 79–106, 1960.
[12] C. Dagum, “Inequality measures between income distributions with
applications,” Econometrica, vol. 48, no. 7, pp. 1791–1803, 1980.
[13] ——, Income distribution models. Wiley Online Library, 1983.
[14] P. Cirillo, “Are your data really pareto distributed?” Physica A: Statis-
tical Mechanics and its Applications, vol. 392, no. 23, pp. 5947–5962,
2013.
[15] H. Markowitz, “Portfolio selection*,” The journal of finance, vol. 7,
no. 1, pp. 77–91, 1952.
[16] H. Neth and G. Gigerenzer, “Heuristics: Tools for an uncertain world,”
Emerging trends in the social and behavioral sciences: An Interdisci-
plinary, Searchable, and Linkable Resource, 2015.
[17] N. N. Taleb, Skin in the Game: Hidden Asymmetries in Daily Life.
Penguin (London) and Random House (N.Y.), 2018.
[18] E. O. Thorp, “Optimal gambling systems for favorable games,” Revue
de l’Institut International de Statistique, pp. 273–293, 1969.
[19] N. N. Taleb, “Errors, robustness, and the fourth quadrant,” International
Journal of Forecasting, vol. 25, no. 4, pp. 744–759, 2009.
[20] S. Benartzi and R. H. Thaler, “Naive diversification strategies in defined
contribution saving plans,” American economic review, vol. 91, no. 1,
pp. 79–98, 2001.
[21] S. Benartzi and R. Thaler, “Heuristics and biases in retirement savings
behavior,” Journal of Economic perspectives, vol. 21, no. 3, pp. 81–104,
2007.
[22] H. Windcliff and P. P. Boyle, “The 1/n pension investment puzzle,” North
American Actuarial Journal, vol. 8, no. 3, pp. 32–45, 2004.
[23] V. DeMiguel, L. Garlappi, and R. Uppal, “Optimal versus naive diver-
sification: How inefficient is the 1/n portfolio strategy?” The review of
Financial studies, vol. 22, no. 5, pp. 1915–1953, 2007.
[24] S. Makridakis and N. Taleb, “Decision making and planning under low
levels of predictability,” 2009.
[25] B. B. Mandelbrot and N. N. Taleb, “Random jump, not random walk,”
2010.
[26] “NIST Digital Library of Mathematical Functions,” http://dlmf.nist.gov/,
Release 1.0.19 of 2018-06-22, f. W. J. Olver, A. B. Olde Daalhuis,
D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller
and B. V. Saunders, eds. [Online]. Available: http://dlmf.nist.gov/
[27] D. J. Newman, A problem seminar. Springer Science & Business Media,
2012.
[28] E. Pitman, “Subexponential distribution functions,” J. Austral. Math.
Soc. Ser. A, vol. 29, no. 3, pp. 337–347, 1980.
[29] H. Nie and S. Chen, "Lognormal sum approximation with type IV Pearson distribution," IEEE Communications Letters, vol. 11, no. 10, 2007.
[30] S. Chen, H. Nie, and B. Ayers-Glassey, "Lognormal sum approximation with a variant of type IV Pearson distribution," IEEE Communications Letters, vol. 12, no. 9, 2008.
[31] M. Di Renzo, F. Graziosi, and F. Santucci, "Further results on the approximation of log-normal power sum via Pearson type IV distribution: a general formula for log-moments computation," IEEE Transactions on Communications, vol. 57, no. 4, 2009.
[32] P. Diaconis and S. Zabell, "Closed form summation for classical distributions: variations on a theme of De Moivre," Statistical Science, pp. 284–302, 1991.
[33] G. Suzuki, "A consistent estimator for the mean deviation of the Pearson type distribution," Annals of the Institute of Statistical Mathematics, vol. 17, no. 1, pp. 271–285, 1965.
[34] D. Schleher, "Generalized Gram-Charlier series with application to the sum of log-normal variates," IEEE Transactions on Information Theory, vol. 23, no. 2, pp. 275–280, 1977.
[35] N. C. Beaulieu, A. A. Abu-Dayya, and P. J. McLane, "Estimating the distribution of a sum of independent lognormal random variables," IEEE Transactions on Communications, vol. 43, no. 12, p. 2869, 1995.
[36] D. Dufresne, "Sums of lognormals," in Proceedings of the 43rd Actuarial Research Conference. University of Regina, 2008.
[37] D. Dufresne et al., "The log-normal approximation in financial and other computations," Advances in Applied Probability, vol. 36, no. 3, pp. 747–773, 2004.
