Hierarchical Clustering-Based Asset Allocation
Downloaded from https://jpm.iijournals.com by SOMRUP CHAKRABORTY on May 18, 2019. Copyright 2017 Pageant Media Ltd.

Thomas Raffinot

Thomas Raffinot is the Head of Quantitative Macro Strategy at Silex-IP in Paris, France.
traffinot@gmail.com

The Journal of Portfolio Management, Multi-Asset Special Issue 2018

Nobel Prize winner Harry Markowitz described diversification, with its ability to enhance portfolio returns while reducing risk, as the “only free lunch” in investing (Markowitz [1952]). Yet, diversifying a portfolio in real life is easier said than done.

Investors are aware of the benefits of diversification but form portfolios without giving proper consideration to the correlations (Goetzmann and Kumar [2008]). Moreover, modern and complex portfolio optimization methods are optimal in sample but often provide rather poor out-of-sample forecast performance. For instance, DeMiguel et al. [2009] demonstrate that the equal-weighted allocation, which gives the same importance to each asset, beats the entire set of commonly used portfolio optimization techniques. In fact, optimized portfolios depend on expected returns and risks, but even small estimation errors can result in large deviations from optimal allocations in an optimizer’s result (Michaud [1989]).

To overcome this issue, academics and practitioners have developed risk-based portfolio optimization techniques (minimum variance, equal-risk contribution, risk budgeting, etc.) that do not rely on return forecasts (Roncalli [2013]). However, these still require the inversion of a positive-definite covariance matrix, which leads to errors of such magnitude that they entirely offset the benefits of diversification (López de Prado [2016b]).

Exploring a new way of capital allocation, López de Prado [2016a] introduces a portfolio diversification technique called hierarchical risk parity (HRP). One of the main advantages of HRP is that it can compute a portfolio on an ill-conditioned or even a singular covariance matrix. Lau et al. [2017] apply HRP to different cross-asset universes consisting of many tradable risk premia indexes and confirm that HRP delivers superior risk-adjusted returns. Alipour et al. [2016] propose a quantum-inspired version of HRP, which outperforms HRP and thus other conventional methods.

The starting point of HRP is that a correlation matrix is too complex to be properly analyzed and understood. If you have N assets of interest, there are $\frac{1}{2}N(N-1)$ pairwise correlations among them, and that number grows quickly. For example, there are as many as 4,950 correlation coefficients between stocks of the FTSE 100 and 124,750 between stocks of the S&P 500. More importantly, correlation matrices lack the notion of hierarchy. Actually, Nobel Prize laureate Herbert Simon has argued that complex systems can be arranged in a natural hierarchy comprising nested substructures (Simon [1962]). But a correlation matrix makes no differentiation between assets. Yet, some
assets seem closer substitutes of one another, while others seem complementary to one another. This lack of hierarchical structure allows weights to vary freely in unintended ways (López de Prado [2016a]).

To simplify the analysis of the relationships between this large group of relative prices, López de Prado [2016a] applies a correlation-network method known as the “minimum spanning tree (MST).”¹ Its main principle is easy to understand: the heart of correlation analysis is choosing which correlations really matter; in other words, choosing which links in the network are important and removing the rest, keeping N − 1 links.

Graph theory is linked to unsupervised machine learning. For instance, the MST is strictly related to a hierarchical clustering algorithm named single linkage (Tumminello et al. [2010]). Hierarchical clustering refers to the formation of a recursive clustering, suggested by the data, not defined a priori. The objective is to build a binary tree of the data that successively merges similar groups of points. Hierarchical clustering is thus another way to filter correlations.

Finally, there are many generalizations of the MST. The planar maximally filtered graph (Tumminello et al. [2005]) is a recent and prominent one. It is associated with a hierarchical clustering method, the directed bubble hierarchical tree (DBHT) (Musmeci et al. [2015]).

Building upon López de Prado [2016a] and Simon [1962], this article exploits the notion of hierarchy. Different hierarchical clustering methods are presented and tested, namely, single linkage, complete linkage, average linkage, Ward’s method, and DBHT. Once the assets are hierarchically clustered, a simple and efficient capital allocation within and across clusters of assets at multiple hierarchical levels is computed.

The out-of-sample performances of hierarchical clustering-based portfolios and risk-based portfolios are evaluated across three empirical datasets, which differ in terms of number of assets and composition of the universe (S&P sectors, multi-assets, and individual stocks). To avoid data snooping, which occurs when a given set of data is used more than once for purposes of inference or model selection, the comparison of profit measures is assessed using the bootstrap-based model confidence set procedure proposed by Hansen et al. [2011]. It prevents strategies that perform well by luck from being considered effective.

The findings of this article can be summarized as follows: hierarchical clustering-based portfolios are robust, truly diversified, and achieve statistically better risk-adjusted performances than commonly used portfolio optimization techniques. Among clustering methods, there is no clear winner. DBHT-based portfolios produce slightly superior risk-adjusted returns, but average-linkage-based portfolios are clearly more robust.

HIERARCHICAL CLUSTERING AND ASSET ALLOCATION

Notion of Hierarchy

Nobel Prize winner Herbert Simon has argued that complex systems, such as financial markets, have a structure and are usually organized in a hierarchical manner, with separate and separable substructures (Simon [1962]). The hierarchical structure of interactions among elements strongly affects the dynamics of complex systems. The need for a quantitative description of hierarchies to model complex systems is thus straightforward (Anderson [1972]).

López de Prado [2016a] points out that correlation matrices lack the notion of hierarchy, which allows weights to vary freely in unintended ways. He provides a concrete example to highlight the interest of the notion of hierarchy for asset allocation:

“Stocks could be grouped in terms of liquidity, size, industry, and region, where stocks within a given group compete for allocations. In deciding the allocation to a large publicly-traded U.S. financial stock like J.P. Morgan, we will consider adding or reducing the allocation to another large publicly-traded U.S. bank like Goldman Sachs, rather than a small community bank in Switzerland, or a real estate holding in the Caribbean.” (López de Prado [2016a])

To sum up, a correlation matrix makes no differentiation between assets. Yet, some assets seem to be closer substitutes of one another, while others seem to be complementary to one another.

Hierarchical Clustering

The purpose of cluster analysis is to place entities into groups, or clusters, suggested by the data, not defined a priori, such that entities in a given cluster
tend to be similar to each other and entities in different clusters tend to be dissimilar.

Hierarchical clustering refers to the formation of a recursive clustering. The objective is to build a binary tree of the data that successively merges similar groups of points. The tree-based representation of the observations is called a dendrogram. Visualizing this tree provides a useful summary of the data.

Hierarchical clustering requires a suitable distance measure. The following distance is used (Mantegna [1999]):

$$D_{i,j} = \sqrt{2\,(1 - \rho_{i,j})} \qquad (1)$$

where $D_{i,j}$ is the correlation-distance index between the ith and jth asset, and $\rho_{i,j}$ is the respective Pearson’s correlation coefficient.

Four agglomerative clustering variants are tested in this study, namely, single linkage (SL), complete linkage (CL), average linkage (AL), and Ward’s method (WM).

An agglomerative clustering starts with every observation representing a singleton cluster and then combines the clusters sequentially, reducing the number of clusters at each step until only one cluster is left. At each of the N − 1 steps, the closest two (least dissimilar) clusters are merged into a single cluster, producing one less cluster at the next higher level. Therefore, a measure of dissimilarity between two clusters must be defined, and different definitions of the distance between clusters can produce radically different dendrograms. The clustering variants are described below:

• SL: the distance between two clusters is the minimum of the distance between any two points in the clusters. For clusters $C_i$, $C_j$,

$$d_{C_i,C_j} = \min_{x,y}\{D(x,y) \mid x \in C_i,\ y \in C_j\} \qquad (2)$$

This method is relatively simple and can handle nonelliptical shapes. Nevertheless, it is sensitive to outliers and can result in a problem called chaining, whereby clusters end up being long and straggly. The SL algorithm is strictly related to the one that provides an MST. However, the MST retains some information that the SL dendrogram throws away.

• CL: the distance between two clusters is the maximum of the distance between any two points in the clusters. For clusters $C_i$, $C_j$,

$$d_{C_i,C_j} = \max_{x,y}\{D(x,y) \mid x \in C_i,\ y \in C_j\} \qquad (3)$$

This method tends to produce compact clusters of similar size but is quite sensitive to outliers.

• AL: the distance between two clusters is the average of the distance between any two points in the clusters. For clusters $C_i$, $C_j$,

$$d_{C_i,C_j} = \operatorname{mean}_{x,y}\{D(x,y) \mid x \in C_i,\ y \in C_j\} \qquad (4)$$

This is considered to be a fairly robust method.

• WM (Ward [1963]): the distance between two clusters is the increase of the squared error that results when two clusters are merged. For clusters $C_i$, $C_j$ with sizes $m_i$, $m_j$, respectively,

$$d_{C_i,C_j} = \frac{m_i m_j}{m_i + m_j}\,\lVert c_i - c_j \rVert^2 \qquad (5)$$

where $c_i$, $c_j$ are the centroids for the clusters. This method is biased toward globular clusters but less susceptible to noise and outliers. It is one of the most popular methods.

To determine the number of clusters, we employ the Gap index (Tibshirani et al. [2001]). It compares the logarithm of the empirical within-cluster dissimilarity and the corresponding one for uniformly distributed data, which is a distribution with no obvious clustering.

The last approach differs completely from the agglomerative one: the idea of DBHT is to use the hierarchy hidden in the topology of a planar maximally filtered graph (PMFG) (Tumminello et al. [2005]). The PMFG network keeps the hierarchical structure of the MST network but contains a greater amount of information by connecting N nodes (assets) with 3(N − 2) edges. The basic elements of a PMFG are three-cliques (subgraphs made of three nodes all reciprocally connected). For a detailed introduction of MST and PMFG, see Tumminello et al. [2005] and Aste et al. [2010].

Musmeci et al. [2015] explain that the DBHT exploits this topological structure, and in particular, the distinction between separating and nonseparating three-cliques, to identify a clustering partition of all nodes
in the PMFG. First of all, the clusters are identified by means of topological considerations on the planar graph, then the hierarchy is constructed both between clusters and within clusters. Therefore, the difference involves both the kind of information exploited and the methodological approach. Note that the “optimal” number of clusters is determined during the process (see Song et al. [2012] for more on this subject).

Exhibit 1. Asset Allocation Weights: A Small Example (dendrogram not reproduced)
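
As a minimal sketch of the agglomerative variants described in this section, the correlation distance of Equation (1) can be passed to SciPy's `linkage` routine with the four linkage rules. The simulated returns, the two-factor structure, and the fixed cut at two clusters are assumptions made for illustration only; the article selects the number of clusters with the Gap index, which is not reproduced here, and DBHT is not covered by this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical data: 500 daily returns for 6 assets driven by two independent
# factors, so assets 0-2 and assets 3-5 form two obvious groups.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loadings = np.repeat(np.eye(2), 3, axis=0)            # shape (6, 2)
returns = factors @ loadings.T + 0.5 * rng.normal(size=(500, 6))

# Equation (1): D_ij = sqrt(2 * (1 - rho_ij)); clip guards against tiny
# negative values caused by floating-point rounding on the diagonal.
corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(2.0 * np.clip(1.0 - corr, 0.0, None))
condensed = squareform(dist, checks=False)            # upper triangle as a vector

# One dendrogram per linkage rule (SL, CL, AL, WM). Each linkage matrix has
# N - 1 rows, one per merge step.
methods = ("single", "complete", "average", "ward")
trees = {m: linkage(condensed, method=m) for m in methods}

# Illustrative assumption: cut every tree into two clusters.
labels = {m: fcluster(trees[m], t=2, criterion="maxclust") for m in methods}
for m in methods:
    print(m, labels[m])
```

On data with such a clean block structure, all four rules should recover the same two groups; differences between linkage rules show up on noisier correlation matrices.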

Asset Allocation Weights

Once the clusters have been determined, the capital should be efficiently allocated both within and across groups. Indeed, a compromise between diversification across all investments and diversification across clusters of investments at multiple hierarchical levels has to be found. Because asset allocation within and across clusters can be based on the same or different methodologies, there are countless options.

The chosen weighting scheme attempts to stay very simple and focuses not only on the clusterings but also on the entire hierarchies associated with those clusterings. The principle is to find a diversified weighting by distributing capital equally to each cluster hierarchy, so that many correlated assets receive the same total allocation as a single uncorrelated one. Then, within a cluster, an equal-weighted allocation is computed.

For example, Exhibit 1 illustrates a small dendrogram with five assets and three clusters. The first cluster is made up of Assets 1 and 2; Asset 5 constitutes the second cluster, and the third cluster consists of Assets 3 and 4. Based on the hierarchical clustering weighting, the weight for cluster 1 is 0.5 ($\frac{1}{2} = 0.5$) and the weights for clusters 2 and 3 are 0.25 ($\frac{0.5}{2} = 0.25$). Because there are two assets in Cluster 1, the final weights for Assets 1 and 2 are $\frac{0.5}{2} = 0.25$. Asset 5 would have a weight of $\frac{0.25}{1} = 0.25$. Finally, Assets 3 and 4 would get a weight of $\frac{0.25}{2} = 0.125$.

This weighting scheme should guarantee the diversification and the robustness of the portfolio. For instance, because we consider at least two clusters, the weights are constrained: $\forall i: 0 \le w_i \le 0.5$. Moreover, if clusters are lasting, the weights should be very stable. Finally, neither expected returns nor risk measures are required, thereby making the method more robust.

The desire to exploit the nested clusters or, in other words, the notion of hierarchy explains why clustering methods such as K-means or K-medoids have not been tested. Indeed, these algorithms provide a single set of clusters with no particular organization or structure within them.²

RISK-BUDGETING APPROACH

This section briefly describes risk-budgeting portfolios. Refer to Roncalli [2013] for a detailed exposition of this approach. In a risk-budgeting approach, the investor only chooses the distribution of risk between the assets of the portfolio, without any consideration of returns, thereby partially dealing with the issues of traditional portfolio optimization methods.

Notations and Definitions

Consider a portfolio invested in N assets with portfolio weights vector $w = (w_1, w_2, \ldots, w_N)'$. Returns are assumed to be arithmetic: $r_{t,i} = (p_{t,i} - p_{t-1,i})/p_{t-1,i} = p_{t,i}/p_{t-1,i} - 1$. The portfolio return at time t is thus

$$r_{P,t} = \sum_{i=1}^{N} w_i r_{t,i} \qquad (6)$$
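
To tie this notation back to the Exhibit 1 example, the sketch below reproduces the hierarchical weights (0.25, 0.25, 0.125, 0.125, 0.25) and then applies Equation (6) to one period of returns. The nested tuple/list encoding of the dendrogram, the function name `hierarchical_weights`, and the price values are hypothetical; the encoding assumes, as the stated weights imply, that the top split separates cluster 1 from clusters 2 and 3.

```python
import numpy as np

def hierarchical_weights(node, capital=1.0, weights=None):
    """Split capital equally across branches; equal-weight inside a cluster.

    A tuple is an internal node of the dendrogram, a list is a cluster of asset ids.
    """
    if weights is None:
        weights = {}
    if isinstance(node, list):
        for asset in node:
            weights[asset] = capital / len(node)
    else:
        for child in node:
            hierarchical_weights(child, capital / len(node), weights)
    return weights

# Hypothetical encoding of Exhibit 1: cluster 1 = {1, 2}, cluster 2 = {5},
# cluster 3 = {3, 4}; clusters 2 and 3 share one branch of the top split.
tree = ([1, 2], ([5], [3, 4]))
w = hierarchical_weights(tree)
print(w)  # {1: 0.25, 2: 0.25, 5: 0.25, 3: 0.125, 4: 0.125}

# Equation (6) on made-up prices for Assets 1..5 at t-1 and t.
prices = np.array([[100.0, 50.0, 20.0, 10.0, 80.0],
                   [101.0, 50.0, 21.0, 10.0, 76.0]])
r = prices[1] / prices[0] - 1.0              # arithmetic returns r_{t,i}
w_vec = np.array([w[i] for i in range(1, 6)])
r_p = float(w_vec @ r)                       # portfolio return r_{P,t}
print(r_p)
```

Because capital is halved at every split, no asset can receive more than 0.5 as soon as there are at least two clusters, which matches the weight constraint stated in the text.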

Let $\sigma_i^2$ be the variance of asset i, $\sigma_{ij}$ be the covariance between assets i and j, and $\Sigma$ be the covariance matrix.

The volatility is defined as the risk of the portfolio:

$$R_w = \sigma_w = \sqrt{w' \Sigma w} \qquad (7)$$

and μ is the expected return:

$$\mu = E(r_P) = \sum_{i=1}^{N} w_i E(r_i) \qquad (8)$$

Risk-Budgeting Portfolios

In a risk-budgeting portfolio, the risk contribution from each component is equal to the budget of risk defined by the portfolio manager.

Because the risk measure is coherent and convex, the Euler decomposition is verified:

$$R_w = \sum_{i=1}^{N} w_i \frac{\partial R_w}{\partial w_i} \qquad (9)$$

With the volatility as the risk measure, the risk contribution of the ith asset becomes

$$RC_{w_i} = w_i \frac{(\Sigma w)_i}{\sqrt{w' \Sigma w}} \qquad (10)$$

A long-only, fully invested risk-budgeting portfolio is defined as follows (Roncalli [2013]):

$$\begin{cases} RC_{w_i} = b_i R_w \\ b_i > 0 \\ \sum_{i=1}^{N} b_i = 1 \\ w_i \ge 0 \\ \sum_{i=1}^{N} w_i = 1 \end{cases} \qquad (11)$$

Once a set of risk budgets is defined, the weights of the portfolio are computed so that the risk contributions match the risk budgets.

In this article, four risk-budgeting portfolios are considered:³

• The minimum-variance (MV) portfolio is a risk-budgeting portfolio in which the risk budget is equal to the weight of the asset:

$$b_i = w_i \qquad (12)$$

• The most diversified portfolio (MDP) (Choueifaty et al. [2013]) is a risk-budgeting portfolio in which the risk budgets are linked to the product of the weight of the asset and its volatility:

$$b_i = \frac{w_i \sigma_i}{\sum_{i=1}^{N} w_i \sigma_i} \qquad (13)$$

• The equal-risk-contribution portfolio (ERC) (Maillard et al. [2010]) is a risk-budgeting portfolio in which the risk contribution from each asset is made equal:

$$b_i = \frac{1}{N} \qquad (14)$$

• The inverse-variance (IVRB) risk-budgeting portfolio defines risk budgets as follows:

$$b_i = \frac{\sigma_i^{-2}}{\sum_{i=1}^{N} \sigma_i^{-2}} \qquad (15)$$

We use the cyclical coordinate descent (CCD) algorithm for solving high-dimensional risk parity problems (Griveau-Billion et al. [2013]) to estimate the risk-based models.

INVESTMENT STRATEGIES COMPARISON

Portfolios are updated on a daily basis via a 252-day rolling-window approach, with no forward-looking biases. This approach differs from the traditional one, in which portfolios are rebalanced on a more realistic monthly basis.⁴ Nevertheless, the main objective of this article is not to create a real investment strategy but to compare asset allocation methods. The daily rebalancing framework should help highlight the strengths and weaknesses of the different approaches, especially the robustness.
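
The rolling-window protocol can be sketched as follows. This is an illustration under stated assumptions rather than the article's implementation: `rolling_backtest` and `inverse_variance` are hypothetical names, the returns are simulated, and the inverse-variance weighting is a plain heuristic used as a stand-in allocator (it is not the IVRB risk-budgeting portfolio, which the article solves with CCD). The point is only that the weights applied on day t are estimated from the 252 days strictly before t.

```python
import numpy as np

def rolling_backtest(returns, allocate, window=252):
    """Daily rebalancing with a trailing estimation window and no look-ahead."""
    n_obs, n_assets = returns.shape
    weights = np.empty((n_obs - window, n_assets))
    oos = np.empty(n_obs - window)
    for k, t in enumerate(range(window, n_obs)):
        past = returns[t - window:t]      # rows t-window .. t-1 only
        weights[k] = allocate(past)
        oos[k] = weights[k] @ returns[t]  # out-of-sample return on day t
    return weights, oos

def inverse_variance(past):
    """Stand-in allocator: weights proportional to 1 / sample variance."""
    inv = 1.0 / past.var(axis=0, ddof=1)
    return inv / inv.sum()

# Simulated daily returns for four hypothetical assets.
rng = np.random.default_rng(1)
returns = rng.normal(0.0003, 0.01, size=(300, 4))
weights, oos = rolling_backtest(returns, inverse_variance)
print(weights.shape, oos.shape)  # (48, 4) (48,)
```

Any allocation rule with the same signature (a window of past returns in, a weight vector out) can be swapped in for `inverse_variance`, which is what makes the comparison across methods like for like.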



Datasets

The out-of-sample performances of models are evaluated across three very disparate datasets. The three considered datasets differ in terms of asset composition and number of assets:⁵

• The S&P sectors dataset consists of daily returns on 10 value-weighted industry portfolios formed by using the Global Industry Classification Standard (GICS) developed by Standard & Poor’s. The 10 industries considered are energy, material, industrials, consumer-discretionary, consumer-staples, healthcare, financials, information-technology, telecommunications, and utilities. The data span from January 1995 to August 2016.

• The multi-assets dataset consists of asset classes exhibiting different risk–return characteristics (in local currencies): S&P 500 (U.S. large cap), Russell 2000 (U.S. small-cap), Euro Stoxx 50 (EA large cap), Euro Stoxx Small Cap (EA small-cap), FTSE 100 (U.K. large cap), FTSE Small Cap (U.K. small-cap), France 2-year bonds, France 5-year bonds, France 10-year bonds, France 30-year bonds, U.S. 2-year bonds, U.S. 5-year bonds, U.S. 10-year bonds, U.S. 30-year bonds, MSCI Emerging Markets (dollars), and gold (dollars). We chose France over Germany for data availability reasons. A difficult decision was made for fixed-income indexes: coupons are not reinvested, because rates are low and are expected to stay low for a long time. This implies that performances in the future will not come from coupons. Because our aim is to build portfolios that will perform and not ones that have performed, we prefer this solution. As a consequence, no dividends are reinvested. The data span from February 1989 to August 2016.

• Individual stocks with sufficiently long historical data from the current S&P 500 compose the last dataset. That gives us 357 series to work with. The objective is to get “real” correlations between stocks. Obviously, this dataset does not incorporate information on delistings. Because there is a strong survivor bias, comparisons with the S&P 500 are meaningless. Nevertheless, comparisons between different models are meaningful. The data span from January 1996 to August 2016.

Although more data history would have been desirable, the different periods cover a number of different market regimes and shocks to the financial markets and the world economy, including the “dot-com” bubble, the Great Recession, and the 1994 and 1998 bond market crashes considered in the multi-asset dataset.

Comparison Measures

Given the time series of daily out-of-sample returns generated by each strategy in each dataset, several comparison criteria are computed:

• The adjusted Sharpe ratio (ASR) (Pezier and White [2008])⁶ explicitly adjusts for skewness and kurtosis by incorporating a penalty factor for negative skewness and excess kurtosis:

$$ASR = SR\left[1 + \left(\frac{\mu_3}{6}\right) SR - \left(\frac{\mu_4 - 3}{24}\right) SR^2\right] \qquad (16)$$

where $\mu_3$ and $\mu_4$ are the skewness and kurtosis of the returns distribution and SR denotes the traditional Sharpe ratio ($SR = \frac{\mu - r_f}{\sigma}$, where $r_f$ is the risk-free rate⁷).

• The certainty-equivalent return (CEQ) is the risk-free rate of return that the investor is willing to accept instead of undertaking the risky portfolio strategy. DeMiguel et al. [2009] define the CEQ as

$$CEQ = (\mu - r_f) - \frac{\gamma}{2}\sigma^2 \qquad (17)$$

where γ is the risk aversion. Results are reported for the case of γ = 1, but other values of the coefficient of risk aversion are also considered as a robustness check. More precisely, the employed definition of CEQ captures the level of expected utility of a mean–variance investor, which is approximately equal to the certainty-equivalent return for an investor with quadratic utility (DeMiguel et al. [2009]). It is the most important number to consider for building profitable portfolios (Levy [2016]).

• The max drawdown (MDD) is an indicator of permanent loss of capital. It measures the largest single drop from peak to bottom in the value of a portfolio. In brief, the MDD offers investors a worst-case scenario.

• The average turnover per rebalancing (TO) is

$$TO = \frac{1}{F}\sum_{t=2}^{F} \left| w_{i,t} - w_{i,t-1} \right| \qquad (18)$$

where F is the number of out-of-sample forecasts.

• The sum of squared portfolio weights (SSPW) used in Goetzmann and Kumar [2008] exhibits the underlying level of diversification in a portfolio and is defined as follows:

$$SSPW = \frac{1}{F}\sum_{t=2}^{F}\sum_{i=1}^{N} w_{i,t}^2 \qquad (19)$$

SSPW ranges from 0 to 1, where 1 represents the most concentrated portfolio.

No transaction costs or economic costs generated by the turnover are reported. Indeed, the study of transaction costs is difficult because investors face different fees and the same strategy can be implemented via futures, exchange-traded funds (ETFs), contracts for difference (CFDs), or cash. Moreover, taxes and the chosen rebalancing strategy influence costs. Nevertheless, high average turnover per rebalancing leads to expensive strategies.

Data Snooping

Data snooping occurs when the same dataset is employed more than once for inference and model selection. It leads to the possibility that any successful results may be spurious because they could be due to chance (White [2000]). In other words, looking long enough and hard enough at a given dataset will often reveal one or more forecasting models that look good but are in fact useless.

To avoid data snooping (White [2000]), we compute the model confidence set (MCS) procedure proposed by Hansen et al. [2011]. The MCS procedure is a model selection algorithm that filters a set of models from a given collection of models. The resulting set contains the best models with a probability that is no less than 1 − α, with α being the size of the test (see the appendix for a formal description).

An advantage of the test is that it does not necessarily select a single model; it instead acknowledges possible limitations in the data because the number of models in the set containing the best model will depend on how informative the data are. The MCS thus aims at finding the best model and all models that are indistinguishable from the best.

EMPIRICAL RESULTS

S&P Sectors

Exhibit 2 highlights the attractiveness of hierarchical-clustering-based portfolios, especially the DBHT-based model. It is the only model selected in the best models set $\hat{M}^{*}_{70\%}$ for both ASR and CEQ. This portfolio is diversified (SSPW = 0.122), but the average turnover per rebalancing is elevated in comparison with other models (TO = 3.45%).

The MV is included in $\hat{M}^{*}_{ASR-20\%}$, but its SSPW is by far the highest of all models: the portfolio is concentrated instead of being diversified.

Exhibit 2. Investment Strategies Comparison: S&P 500 Sectors (January 1996–August 2016) (table not reproduced)

Notes: This exhibit reports comparison criteria used to evaluate the quality of the models: the adjusted Sharpe ratio (ASR), the certainty-equivalent return (CEQ) in percent, the max drawdown (MDD) in percent, the average turnover per rebalancing (TO) in percent, and the sum of squared portfolio weights (SSPW). EW is the equal-weight allocation, MV is the minimum-variance allocation, MDP is the most diversified portfolio allocation, ERC is the equal-risk-contribution allocation, IVRB is the inverse-variance risk budget allocation, SL is the single-linkage-based allocation, CL is the complete-linkage-based allocation, AL is the average-linkage-based allocation, WM is the Ward’s-method-based allocation, DBHT is the directed bubble hierarchical tree-based allocation. a and b indicate the model is in the set of best models $\hat{M}^{*}_{20\%}$ and $\hat{M}^{*}_{70\%}$, respectively.
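
The comparison criteria reported in the exhibits (Equations (16) to (19), plus the max drawdown) can be sketched in a few lines of NumPy. The function names, the sample-moment conventions, and the toy inputs are assumptions for illustration. In particular, `turnover` sums absolute weight changes across assets at each rebalancing, which is one reading of Equation (18), and the drawdown is measured on compounded wealth starting from 1.

```python
import numpy as np

def sharpe(r, rf=0.0):
    return (r.mean() - rf) / r.std(ddof=1)

def adjusted_sharpe(r, rf=0.0):
    """Equation (16), with mu_3 and mu_4 as sample skewness and kurtosis."""
    sr = sharpe(r, rf)
    z = (r - r.mean()) / r.std(ddof=0)
    skew, kurt = np.mean(z**3), np.mean(z**4)
    return sr * (1.0 + (skew / 6.0) * sr - ((kurt - 3.0) / 24.0) * sr**2)

def ceq(r, rf=0.0, gamma=1.0):
    """Equation (17): mean excess return minus gamma/2 times the variance."""
    return (r.mean() - rf) - 0.5 * gamma * r.var(ddof=1)

def max_drawdown(r):
    """Largest peak-to-bottom drop of compounded wealth (initial wealth = 1)."""
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peak))

def turnover(W):
    """Equation (18), assuming absolute changes are summed across assets."""
    return float(np.abs(np.diff(W, axis=0)).sum() / W.shape[0])

def sspw(W):
    """Equation (19): squared weights summed over assets, t = 2..F, over F."""
    return float((W[1:] ** 2).sum() / W.shape[0])

# Toy inputs: four daily returns and five rebalancing dates for four assets.
r = np.array([0.0, 0.02, 0.0, 0.02])
W = np.full((5, 4), 0.25)
print(adjusted_sharpe(r), ceq(r), max_drawdown(np.array([-0.1, 0.05])))
print(turnover(W), sspw(W))
```

With constant equal weights the turnover is zero, and the SSPW of 0.2 sits slightly below 1/N = 0.25 only because Equation (19) sums from t = 2 while dividing by F.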



CL belongs to $\hat{M}^{*}_{CEQ-20\%}$. The portfolio is diversified (SSPW = 0.114), and the turnover is quite low (TO = 0.817%).

Multi-Assets Dataset

Exhibit 3 paints a contrasting picture: risk-based portfolios achieve impressive ASR along with low CEQ. For instance, IVRB constitutes the best models set $\hat{M}^{*}_{ASR-70\%}$. To highlight the interest of the MCS, it is important to note that IVRB does not obtain the highest ASR, yet it is the best model. Moreover, MDP and ERC are selected in $\hat{M}^{*}_{ASR-20\%}$. That said, risk-based portfolios attain very low CEQ, especially IVRB (CEQ = 0.951). Above all, they do not produce diversified portfolios. This implies that portfolios are invested almost solely in bonds, thereby being very exposed to shocks from this asset class. This is not the aim of diversified portfolios.

Hierarchical clustering-based portfolios do not face the same problems. AL, SL, and DBHT compose $\hat{M}^{*}_{CEQ-70\%}$, while delivering reasonably good ASR. All portfolios are diversified, and the average turnover per rebalancing is low for SL and AL. Again, DBHT’s average turnover per rebalancing is elevated in comparison with other models.

Exhibit 3. Investment Strategies Comparison: Multi-Assets, February 1989–August 2016 (table not reproduced)

Individual Stocks

Exhibit 4 illustrates that hierarchical clustering-based portfolios outperform risk-based portfolios. Indeed, DBHT is the only model selected in the best models set $\hat{M}^{*}_{ASR-70\%}$, and the best models set $\hat{M}^{*}_{CEQ-70\%}$ consists of a single model: AL. Both portfolios are diversified.

Exhibit 4. Investment Strategies Comparison: Individual Stocks, January 1996–August 2016 (table not reproduced)

Notes to Exhibits 3 and 4: These exhibits report comparison criteria used to evaluate the quality of the models: the adjusted Sharpe ratio (ASR), the certainty-equivalent return (CEQ) in percent, the max drawdown (MDD) in percent, the average turnover per rebalancing (TO) in percent, and the sum of squared portfolio weights (SSPW). EW is the equal-weight allocation, MV is the minimum-variance allocation, MDP is the most diversified portfolio allocation, ERC is the equal-risk-contribution allocation, IVRB is the inverse-variance risk budget allocation, SL is the single-linkage-based allocation, CL is the complete-linkage-based allocation, AL is the average-linkage-based allocation, WM is the Ward’s-method-based allocation, DBHT is the directed bubble hierarchical tree–based allocation. a and b indicate the model is in the set of best models $\hat{M}^{*}_{20\%}$ and $\hat{M}^{*}_{70\%}$, respectively.

The main drawback is the surprisingly elevated average turnover per rebalancing. This point needs to be further investigated, in particular the impact of the criteria employed to select the number of clusters and
the use of “shrinkage” to improve the estimation of the correlation matrix (see Ledoit and Wolf [2004], Ledoit and Wolf [2014], and Gerber et al. [2015]).

CONCLUSION

Diversification is often spoken of as the only free lunch in investing. Yet, truly diversifying a portfolio is easier said than done. For instance, modern portfolio optimization techniques often fail to outperform a basic equal-weighted allocation (DeMiguel et al. [2009]).

Building upon the fundamental notion of hierarchy (Simon [1962]), López de Prado [2016a] introduces a new portfolio diversification technique called hierarchical risk parity, which uses graph theory and machine learning techniques.

Exploiting the same basic idea in a different way, we propose a hierarchical clustering-based asset allocation. Classical and more modern hierarchical clustering methods are tested, namely, single linkage, complete linkage, average linkage, Ward’s method, and the directed bubble hierarchical tree. Once the assets are hierarchically clustered, a simple and efficient capital allocation within and across clusters of investments at multiple hierarchical levels is computed. The main principle is to find a diversified weighting by distributing capital equally to each cluster hierarchy, so that many correlated assets receive the same total allocation as a single uncorrelated one.

The out-of-sample performances of hierarchical-clustering-based portfolios and more traditional risk-based portfolios are evaluated across three empirical datasets, which differ in terms of number of assets and composition of the universe (S&P sectors, multi-assets, and individual stocks). To prevent strategies that perform by luck from being considered effective, we assess the comparison of profit measures using the bootstrap-based model confidence set procedure (Hansen et al. [2011]).

winner: DBHT-based portfolios attain slightly superior risk-adjusted returns, but AL-based portfolios are clearly more robust.

Last but not least, this article opens the door for further research. Testing other clustering methods and investigating typical machine learning issues, such as the choice of the distance measure and the criteria used to select the number of clusters, come naturally to mind. Above all, improving the estimation of the correlation matrix seems to be the most important priority. Potential improvements may come from the use of “shrinkage.”

Appendix

MODEL CONFIDENCE SET

Define a set $M^0$ that contains the set of models under evaluation indexed by $i = 0, \ldots, m_0$. Let $d_{i,j,t}$ denote the loss differential between two models by

$$d_{i,j,t} = L_{i,t} - L_{j,t}, \quad \forall i, j \in M^0 \qquad \text{(A-1)}$$

L is the loss calculated from some loss function for each evaluation point $t = 1, \ldots, F$. The set of superior models is defined as

$$M^{*} = \{ i \in M^0 : E[d_{i,j,t}] \le 0 \ \ \forall j \in M^0 \} \qquad \text{(A-2)}$$

The MCS uses a sequential testing procedure to determine $M^{*}$. The null hypothesis being tested is

$$\begin{cases} H_{0,M} : E[d_{i,j,t}] = 0 \ \ \forall i, j \in M, \text{ where } M \text{ is a subset of } M^0 \\ H_{A,M} : E[d_{i,j,t}] \ne 0 \ \text{ for some } i, j \in M \end{cases} \qquad \text{(A-3)}$$

When the equivalence test rejects the null hypothesis, at least one model in the set M is considered inferior and the model that contributes the most to the rejection of the null is eliminated from the set M. This procedure is repeated until the null is accepted and the remaining models in M now equal $\hat{M}^{*}_{1-\alpha}$.
According to Hansen et al. [2011], the following two
The empirical results point out that hierarchical-
statistics can be used for the sequential testing of the null
clustering-based portfolios are truly diversified and hypothesis:
achieve statistically better risk-adjusted performances,
as measured by the the adjusted Sharpe ratio (Pezier di , j di
and White [2008]) and by the certainty-equivalent ti , j = and ti = (A-4)
 (d i , j )
var  (d i )
var
return on all datasets. The only exception concerns
the multi-assets dataset in which risk-based portfolios
produce impressive ASR along with ridiculously low where m is the number of models in M, di = (m − 1)−1 Σ j ∈M di , j,
is the simple loss of the ith model relative to the averages losses
CEQ. Among clustering methods, there is no clear
across models in the set M, and di , j = (m )−1 Σ mt =1di , j ,t measures the

Multi-A sset Special Issue 2018 The Journal of Portfolio M anagement    97


relative sample loss between the ith and jth models. Because the Goetzmann, W.N., and A. Kumar. “Equity Portfolio
distribution of the test statistic depends on unknown parame- Diversification.” Review of Finance, Vol. 12, No. 3 (2008),
ters, a bootstrap procedure is used to estimate the distribution. pp. 433-463.
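The sequential elimination described in this appendix can be illustrated with a short Python sketch. It is only a simplified reading of the procedure, not the implementation used in the article: the function name `model_confidence_set` and its parameters are hypothetical, the test statistic is taken to be the maximum of the $t_i$ statistics in (A-4), and an i.i.d. bootstrap over evaluation points is assumed for brevity, whereas Hansen et al. [2011] rely on a block bootstrap to preserve time dependence.

```python
import numpy as np


def model_confidence_set(losses, alpha=0.2, n_boot=500, seed=0):
    """Simplified MCS sketch (hypothetical helper, not the article's code).

    losses : array of shape (F, m0); losses[t, i] is the loss L_{i,t}.
    Returns the indices of the models kept in the estimated set.
    Assumption: i.i.d. resampling of evaluation points (no block structure).
    """
    losses = np.asarray(losses, dtype=float)
    n_obs, n_models = losses.shape
    rng = np.random.default_rng(seed)
    # Resampled time indices, reused at every elimination step.
    boot_idx = rng.integers(0, n_obs, size=(n_boot, n_obs))
    keep = list(range(n_models))

    while len(keep) > 1:
        m = len(keep)
        sub = losses[:, keep]                       # shape (F, m)
        mean_loss = sub.mean(axis=0)                # average loss per model
        # d_bar_i = (m - 1)^{-1} * sum_j d_bar_{i,j}, using d_bar_{i,i} = 0.
        d_bar = (m * mean_loss - mean_loss.sum()) / (m - 1)

        # Bootstrap replications of d_bar_i, centred on the sample value.
        boot_mean = sub[boot_idx].mean(axis=1)      # shape (n_boot, m)
        boot_d = (m * boot_mean
                  - boot_mean.sum(axis=1, keepdims=True)) / (m - 1)
        centred = boot_d - d_bar
        var_hat = (centred ** 2).mean(axis=0)       # bootstrap variance of d_bar_i
        se = np.sqrt(np.maximum(var_hat, np.finfo(float).tiny))

        t_stat = d_bar / se                         # t_i of equation (A-4)
        t_max = t_stat.max()
        # Bootstrap p-value of the equivalence test based on max_i t_i.
        p_value = np.mean((centred / se).max(axis=1) >= t_max)
        if p_value >= alpha:
            break                                   # null not rejected: stop
        # Eliminate the model contributing most to the rejection.
        del keep[int(np.argmax(t_stat))]
    return keep
```

Lower losses are treated as better, in line with (A-2). The `losses` array has to be supplied by the user, and the default values of `alpha` and `n_boot` are arbitrary illustrations rather than the settings used in the article.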

ENDNOTES

1. Since the seminal work of Mantegna [1999], correlation networks have been extensively used in econophysics as tools to filter, visualize, and analyze financial market data.
2. The results of applying K-means or K-medoids clustering algorithms depend on the choice for the number of clusters to be searched and a starting configuration assignment. In contrast, hierarchical clustering methods do not require such specifications.
3. We consider five if the equal-weighted portfolio is seen as a risk-budgeting portfolio.
4. For investors, the choice of the rebalancing strategy is crucial. The periodic rebalancing is not optimal, and other options should be investigated (Sun et al. [2006]).
5. Data are available from the author upon request.
6. Similar to the adjusted Sharpe ratio, the modified Sharpe ratio uses modified VaR adjusted for skewness and kurtosis as a risk measure.
7. A risk-free interest rate of zero is assumed when calculating the ASR and CEQ.

REFERENCES

Alipour, E., C. Adolphs, A. Zaribafiyan, and M. Rounds. “Quantum-Inspired Hierarchical Risk Parity.” Unpublished paper, 2016.

Anderson, P.W. “More Is Different.” Science, Vol. 177, No. 4047 (1972), pp. 393-396.

Aste, T., W. Shaw, and T. Di Matteo. “Correlation Structure and Dynamics in Volatile Markets.” New Journal of Physics, Vol. 12, No. 8 (2010), pp. 5-9.

Choueifaty, Y., T. Froidure, and J. Reynier. “Properties of the Most Diversified Portfolio.” Journal of Investment Strategies, Vol. 2, No. 2 (2013), pp. 44-70.

DeMiguel, V., L. Garlappi, and R. Uppal. “Optimal Versus Naive Diversification: How Inefficient Is the 1/N Portfolio Strategy?” The Review of Financial Studies, Vol. 22, No. 5 (2009), pp. 1915-1953.

Gerber, S., H. Markowitz, and P. Pujara. “Enhancing Multi-Asset Portfolio Construction Under Modern Portfolio Theory with a Robust Co-Movement Measure.” Unpublished paper, 2015.

Goetzmann, W.N., and A. Kumar. “Equity Portfolio Diversification.” Review of Finance, Vol. 12, No. 3 (2008), pp. 433-463.

Griveau-Billion, T., J.C. Richard, and T. Roncalli. “A Fast Algorithm for Computing High-Dimensional Risk Parity Portfolios.” Unpublished paper, 2013.

Hansen, P., A. Lunde, and J. Nason. “The Model Confidence Set.” Econometrica, Vol. 79, No. 2 (2011), pp. 453-497.

Lau, A., M. Kolanovic, T. Lee, and R. Krishnamachari. “Cross Asset Portfolios of Tradable Risk Premia Indices.” Global Quantitative and Derivatives Strategy, JP Morgan, 2017.

Ledoit, O., and M. Wolf. “Honey, I Shrunk the Sample Covariance Matrix.” The Journal of Portfolio Management, Vol. 30, No. 4 (2004), pp. 110-119.

——. “Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks.” Unpublished paper, 2014.

Levy, M. “Measuring Portfolio Performance: Sharpe, Alpha, or the Geometric Mean?” Unpublished paper, 2016.

López de Prado, M. “Building Diversified Portfolios That Outperform Out of Sample.” The Journal of Portfolio Management, Vol. 42, No. 4 (2016a), pp. 59-69.

——. “Mathematics and Economics: A Reality Check.” The Journal of Portfolio Management, Vol. 43, No. 1 (2016b), pp. 5-8.

Maillard, S., T. Roncalli, and J. Teiletche. “The Properties of Equally Weighted Risk Contribution Portfolios.” The Journal of Portfolio Management, Vol. 36, No. 4 (2010), pp. 60-70.

Mantegna, R.N. “Hierarchical Structure in Financial Markets.” The European Physical Journal B: Condensed Matter and Complex Systems, Vol. 11, No. 1 (1999), pp. 193-197.

Markowitz, H. “Portfolio Selection.” The Journal of Finance, Vol. 7, No. 1 (1952), pp. 77-91.

Michaud, R. “The Markowitz Optimization Enigma: Is ‘Optimized’ Optimal?” Financial Analysts Journal, Vol. 45, No. 1 (1989), pp. 31-42.

Musmeci, N., T. Aste, and T. Di Matteo. “Relation between Financial Market Structure and the Real Economy: Comparison Between Clustering Methods.” PLoS One, Vol. 10, No. 3 (2015), pp. 201-210.

Pezier, J., and A. White. “The Relative Merits of Alternative Investments in Passive Portfolios.” The Journal of Alternative Investments, Vol. 10, No. 4 (2008), pp. 37-39.

Roncalli, T. Introduction to Risk Parity and Budgeting. Boca Raton, FL: Chapman & Hall, 2013.

Simon, H.A. “The Architecture of Complexity.” Proceedings of the American Philosophical Society, Vol. 106, No. 6 (1962), pp. 467-482.

Song, W.M., T. Di Matteo, and T. Aste. “Hierarchical Information Clustering by Means of Topologically Embedded Graphs.” PLoS One, Vol. 7, No. 3 (2012), pp. 41-50.

Sun, W., A. Fan, L.W. Chen, T. Schouwenaars, and M. Albota. “Optimal Rebalancing for Institutional Portfolios.” The Journal of Portfolio Management, Vol. 32, No. 2 (2006), pp. 33-43.

Tibshirani, R., G. Walther, and T. Hastie. “Estimating the Number of Clusters in a Data Set via the Gap Statistic.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 63, No. 2 (2001), pp. 411-423.

Tumminello, M., T. Aste, T. Di Matteo, and R. Mantegna. “A Tool for Filtering Information in Complex Systems.” Proceedings of the National Academy of Sciences of the United States of America, Vol. 102, No. 30 (2005), pp. 11-23.

Tumminello, M., F. Lillo, and R. Mantegna. “Correlation, Hierarchies, and Networks in Financial Markets.” Journal of Economic Behavior & Organization, Vol. 75, No. 1 (2010), pp. 40-58.

Ward, J.H. “Hierarchical Grouping to Optimize an Objective Function.” Journal of the American Statistical Association, Vol. 58, No. 301 (1963), pp. 236-244.

White, H. “A Reality Check for Data Snooping.” Econometrica, Vol. 68, No. 5 (2000), pp. 1097-1126.

To order reprints of this article, please contact David Rowe at drowe@iijournals.com or 212-224-3045.