Hierarchical Clustering-Based Asset Allocation
Downloaded from https://jpm.iijournals.com by SOMRUP CHAKRABORTY on May 18, 2019. Copyright 2017 Pageant Media Ltd.
Thomas Raffinot
Thomas Raffinot is the Head of Quantitative Macro Strategy at Silex-IP in Paris, France.
traffinot@gmail.com

Nobel Prize winner Harry Markowitz described diversification, with its ability to enhance portfolio returns while reducing risk, as the “only free lunch” in investing (Markowitz [1952]). Yet, diversifying a portfolio in real life is easier said than done.

Investors are aware of the benefits of diversification but form portfolios without giving proper consideration to the correlations (Goetzmann and Kumar [2008]). Moreover, modern and complex portfolio optimization methods are optimal in sample but often provide rather poor out-of-sample forecast performance. For instance, DeMiguel et al. [2009] demonstrate that the equal-weighted allocation, which gives the same importance to each asset, beats the entire set of commonly used portfolio optimization techniques. In fact, optimized portfolios depend on expected returns and risks, but even small estimation errors can result in large deviations from optimal allocations in an optimizer’s result (Michaud [1989]).

To overcome this issue, academics and practitioners have developed risk-based portfolio optimization techniques (minimum variance, equal-risk contribution, risk budgeting, etc.) that do not rely on return forecasts (Roncalli [2013]). However, these still require the inversion of a positive-definite covariance matrix, which leads to errors of such magnitude that they entirely offset the benefits of diversification (López de Prado [2016b]).

Exploring a new way of capital allocation, López de Prado [2016a] introduces a portfolio diversification technique called hierarchical risk parity (HRP). One of the main advantages of HRP is in computing a portfolio on an ill-degenerated or even a singular covariance matrix. Lau et al. [2017] apply HRP to different cross-asset universes consisting of many tradable risk premia indexes and confirm that HRP delivers superior risk-adjusted returns. Alipour et al. [2016] propose a quantum-inspired version of HRP, which outperforms HRP and thus other conventional methods.

The starting point of HRP is that a correlation matrix is too complex to be properly analyzed and understood. If you have N assets of interest, there are $\frac{1}{2}N(N-1)$ pairwise correlations among them and that number grows quickly. For example, there are as many as 4,950 correlation coefficients between stocks of the FTSE 100 and 124,750 between stocks of the S&P 500. More importantly, correlation matrices lack the notion of hierarchy. Actually, Nobel Prize laureate Herbert Simon has argued that complex systems can be arranged in a natural hierarchy comprising nested substructures (Simon [1962]). But, a correlation matrix makes no differentiation between assets. Yet, some
90 Hierarchical Clustering-Based Asset A llocation Multi-A sset Special Issue 2018
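The pairwise-correlation counts quoted in the introduction follow directly from the formula $\frac{1}{2}N(N-1)$. The short check below reproduces them; the function name is illustrative and not taken from the article.

```python
def pairwise_correlations(n_assets: int) -> int:
    """Number of distinct off-diagonal correlations among n assets: N(N-1)/2."""
    return n_assets * (n_assets - 1) // 2


# Universe sizes as quoted in the text (100 and 500 constituents).
for name, n in [("FTSE 100", 100), ("S&P 500", 500)]:
    print(f"{name}: {pairwise_correlations(n):,} pairwise correlations")
```

The count grows quadratically in N, which is the sense in which the text says the number "grows quickly".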
tend to be similar to each other and entities in different clusters tend to be dissimilar.

Hierarchical clustering refers to the formation of a recursive clustering. The objective is to build a binary tree of the data that successively merges similar groups of points. The tree-based representation of the observations is called a dendrogram. Visualizing this tree provides a useful summary of the data.

Hierarchical clustering requires a suitable distance measure. The following distance is used (Mantegna [1999]):

$$D_{i,j} = \sqrt{2(1 - \rho_{i,j})} \tag{1}$$

where $D_{i,j}$ is the correlation-distance index between the ith and jth asset, and $\rho_{i,j}$ is the respective Pearson’s correlation coefficient.

Four agglomerative clustering variants are tested in this study, namely single linkage (SL), complete linkage (CL), average linkage (AL), and Ward’s method (WM).

An agglomerative clustering starts with every observation representing a singleton cluster and then combines the clusters sequentially, reducing the number of clusters at each step until only one cluster is left. At each of the N − 1 steps, the closest two (least dissimilar) clusters are merged into a single cluster, producing one less cluster at the next higher level. Therefore, a measure of dissimilarity between two clusters must be defined, and different definitions of the distance between clusters can produce radically different dendrograms. The clustering variants are described below:

• SL: the distance between two clusters is the minimum of the distance between any two points in the clusters. For clusters $C_i$, $C_j$,

$$d_{C_i,C_j} = \min_{x,y}\{D(x,y) \mid x \in C_i,\ y \in C_j\} \tag{2}$$

This method is relatively simple and can handle nonelliptical shapes. Nevertheless, it is sensitive to outliers and can result in a problem called chaining, whereby clusters end up being long and straggly. The SL algorithm is closely related to the one that provides a minimum spanning tree (MST). However, the MST retains some information that the SL dendrogram throws away.

• CL: the distance between two clusters is the maximum of the distance between any two points in the clusters. For clusters $C_i$, $C_j$,

$$d_{C_i,C_j} = \max_{x,y}\{D(x,y) \mid x \in C_i,\ y \in C_j\} \tag{3}$$

This method tends to produce compact clusters of similar size but is quite sensitive to outliers.

• AL: the distance between two clusters is the average of the distance between any two points in the clusters. For clusters $C_i$, $C_j$,

$$d_{C_i,C_j} = \operatorname{mean}_{x,y}\{D(x,y) \mid x \in C_i,\ y \in C_j\} \tag{4}$$

This is considered to be a fairly robust method.

• WM (Ward [1963]): the distance between two clusters is the increase of the squared error that results when two clusters are merged. For clusters $C_i$, $C_j$ with sizes $m_i$, $m_j$, respectively,

$$d_{C_i,C_j} = \frac{m_i m_j}{m_i + m_j}\,\lVert c_i - c_j \rVert^2 \tag{5}$$

where $c_i$, $c_j$ are the centroids for the clusters.

This method is biased toward globular clusters but less susceptible to noise and outliers. It is one of the most popular methods.

To determine the number of clusters, we employ the Gap index (Tibshirani et al. [2001]). It compares the logarithm of the empirical within-cluster dissimilarity and the corresponding one for uniformly distributed data, which is a distribution with no obvious clustering.

The last approach differs completely from the agglomerative one: the idea of DBHT is to use the hierarchy hidden in the topology of a planar maximally filtered graph (PMFG) (Tumminello et al. [2005]).

The PMFG network keeps the hierarchical structure of the MST network but contains a greater amount of information by connecting N nodes (assets) with $3(N-2)$ edges. The basic elements of a PMFG are three-cliques (subgraphs made of three nodes all reciprocally connected). For a detailed introduction of MST and PMFG, see Tumminello et al. [2005] and Aste et al. [2010].

Musmeci et al. [2015] explain that the DBHT exploits this topological structure, and in particular, the distinction between separating and nonseparating three-cliques, to identify a clustering partition of all nodes
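To make the linkage definitions concrete, the following is a minimal pure-Python sketch of agglomerative clustering under the SL, CL, and AL rules, using the correlation distance of Mantegna [1999]. The correlation matrix, the function names, and the naive merge search are illustrative assumptions, not the implementation used in the article; Ward's method is left out because it needs cluster centroids rather than pairwise distances alone.

```python
import math


def correlation_distance(rho: float) -> float:
    """Correlation distance D = sqrt(2 * (1 - rho)); guarded against rounding."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def cluster_distance(dist, ci, cj, linkage):
    """Distance between clusters ci and cj (lists of asset indices)."""
    pairs = [dist[x][y] for x in ci for y in cj]
    if linkage == "single":      # minimum over all cross-cluster pairs
        return min(pairs)
    if linkage == "complete":    # maximum over all cross-cluster pairs
        return max(pairs)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(dist, linkage):
    """Naive agglomerative clustering; returns the N - 1 merges in order."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(dist, clusters[a], clusters[b], linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical correlation matrix: assets 0-1 and 2-3 form two tight pairs.
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
dist = [[correlation_distance(r) for r in row] for row in corr]

for linkage in ("single", "complete", "average"):
    print(linkage)
    for left, right, d in agglomerate(dist, linkage):
        print(f"  merge {left} + {right} at distance {d:.4f}")
```

On this toy matrix all three rules merge the same pairs and differ only in the height of the final merge. On noisier data the dendrograms can diverge substantially, as the text notes.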
Let $\sigma_i^2$ be the variance of asset $i$, $\sigma_{ij}$ be the covariance between assets $i$ and $j$, and $\Sigma$ be the covariance matrix.

The volatility is defined as the risk of the portfolio:

$$R_w = \sigma_w = \sqrt{w' \Sigma w} \tag{7}$$

$$\begin{cases} RC_{w_i} = b_i\, RC_w \\ b_i > 0 \\ \sum_{i=1}^{N} b_i = 1 \\ w_i \ge 0 \\ \sum_{i=1}^{N} w_i = 1 \end{cases} \tag{11}$$

Once a set of risk budgets is defined, the weights of the portfolio are computed so that the risk contributions match the risk budgets.

In this article, four risk-budgeting portfolios are considered:³

• The minimum-variance (MV) portfolio is a risk-budgeting portfolio in which the risk budget is equal to the weight of the asset:

$$b_i = w_i \tag{12}$$

INVESTMENT STRATEGIES COMPARISON

Portfolios are updated on a daily basis via a 252-day rolling-window approach, with no forward-looking biases. This approach differs from the traditional one, in which portfolios are rebalanced on a more realistic monthly basis.⁴ Nevertheless, the main objective of this article is not to create a real investment strategy but to compare asset allocation methods. The daily rebalancing framework should help highlight the strengths and weaknesses of the different approaches, especially the robustness.
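As a numerical illustration of Equation (7), the sketch below computes the portfolio volatility for a two-asset example and splits it into per-asset risk contributions. The decomposition $RC_i = w_i (\Sigma w)_i / \sigma_w$ is the standard Euler decomposition and is assumed here, since the corresponding equations are not reproduced in this excerpt. The covariance matrix and weights are hypothetical.

```python
import math


def portfolio_volatility(weights, cov):
    """sigma_w = sqrt(w' Sigma w), as in Equation (7)."""
    n = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return math.sqrt(variance)


def risk_contributions(weights, cov):
    """Assumed Euler decomposition: RC_i = w_i * (Sigma w)_i / sigma_w."""
    n = len(weights)
    sigma = portfolio_volatility(weights, cov)
    marginal = [
        sum(cov[i][j] * weights[j] for j in range(n)) / sigma for i in range(n)
    ]
    return [weights[i] * marginal[i] for i in range(n)]


# Hypothetical inputs: volatilities of 20% and 10%, correlation of 0.3.
cov = [[0.04, 0.006], [0.006, 0.01]]
w = [0.5, 0.5]

sigma_w = portfolio_volatility(w, cov)
rc = risk_contributions(w, cov)
print(f"portfolio volatility: {sigma_w:.4f}")
print("risk contributions:", [round(x, 4) for x in rc])
print("shares of total risk:", [round(x / sigma_w, 4) for x in rc])
```

Under this decomposition the contributions sum to the portfolio volatility, so dividing each by $\sigma_w$ gives shares that can be compared with a set of risk budgets $b_i$.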
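The 252-day rolling-window scheme described under INVESTMENT STRATEGIES COMPARISON can be sketched as an index generator. This is a minimal illustration under the assumption that the weights applied on day t are estimated from the 252 observations strictly before t, which is one way to avoid forward-looking bias. The function name and tuple layout are hypothetical.

```python
def rolling_windows(n_obs: int, window: int = 252):
    """Yield (start, end, t): observations [start, end) are used to set weights for day t.

    The estimation slice ends at t (exclusive), so day t itself is never
    used to compute the weights applied on day t.
    """
    for t in range(window, n_obs):
        yield t - window, t, t


# With 255 daily observations there are three rebalancing dates.
for start, end, t in rolling_windows(255):
    print(f"estimate on [{start}, {end}) -> rebalance on day {t}")
```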
• The average turnover per rebalancing (TO) is

$$TO = \frac{1}{F} \sum_{t=2}^{F} |w_{i,t} - w_{i,t-1}| \tag{18}$$

Data Snooping

models in the set containing the best model will depend on how informative the data are. The MCS thus aims at finding the best model and all models that are indistinguishable from the best.
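The turnover measure in Equation (18) can be illustrated with a short sketch. Two assumptions are made here that the excerpt does not state explicitly: the absolute weight changes are summed across assets at each date, and the comparison is between successive target weights, ignoring price drift between rebalancings. The weight history is hypothetical.

```python
def average_turnover(weight_history):
    """Average turnover per rebalancing over F dates, following Equation (18).

    weight_history is a list of F weight vectors, one per rebalancing date.
    Assumption: absolute changes are summed over assets at each date.
    """
    n_dates = len(weight_history)
    total = 0.0
    for t in range(1, n_dates):
        previous, current = weight_history[t - 1], weight_history[t]
        total += sum(abs(c - p) for c, p in zip(current, previous))
    return total / n_dates


# Hypothetical history: one reallocation of 10% from asset 2 to asset 1.
history = [[0.5, 0.5], [0.6, 0.4], [0.6, 0.4]]
print(f"average turnover: {average_turnover(history):.4f}")
```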
Exhibit 3
Investment Strategies Comparison: S&P 500 Sectors, February 1989–August 2016

Exhibit 4
Investment Strategies Comparison: Individual Stocks, January 1996–August 2016

Notes (Exhibits 3 and 4): Each exhibit reports the comparison criteria used to evaluate the quality of the models: the adjusted Sharpe ratio (ASR), the certainty-equivalent return (CEQ) in percent, the max drawdown (MDD) in percent, the average turnover per rebalancing (TO) in percent, and the sum of squared portfolio weights (SSPW). EW is the equal-weight allocation, MV is the minimum-variance allocation, MDP is the most diversified portfolio allocation, ERC is the equal-risk-contribution allocation, IVRB is the inverse-volatility risk budget allocation, SL is the single-linkage-based allocation, CL is the complete-linkage-based allocation, AL is the average-linkage-based allocation, WM is the Ward’s-method-based allocation, DBHT is the directed bubble hierarchical tree-based allocation. The markers a and b indicate the model is in the set of best models $\hat{M}^*_{20\%}$ and $\hat{M}^*_{70\%}$, respectively.
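The clustering-based allocations compared in Exhibits 3 and 4 rest on the principle, stated in the conclusion, of distributing capital equally across the cluster hierarchy. The sketch below illustrates that principle on a hand-written nested tree. It is a simplified illustration under the assumption of an equal split at every node; the tree, the asset names, and the function are hypothetical and do not reproduce the article's full procedure (dendrogram construction and selection of the number of clusters).

```python
def allocate(tree, capital=1.0):
    """Split capital equally among the children of every node of a nested tree.

    A tree is either an asset name (str) or a tuple/list of sub-trees.
    Returns a dict mapping asset name to weight.
    """
    if isinstance(tree, str):
        return {tree: capital}
    weights = {}
    share = capital / len(tree)
    for child in tree:
        for asset, weight in allocate(child, share).items():
            weights[asset] = weights.get(asset, 0.0) + weight
    return weights


# Hypothetical hierarchy: three correlated assets in one cluster, one asset alone.
tree = (("A", "B", "C"), "D")
weights = allocate(tree)
for asset, weight in sorted(weights.items()):
    print(f"{asset}: {weight:.4f}")
```

In this toy case the cluster of three correlated assets receives half of the capital in total, the same as the single uncorrelated asset D, which is the behavior the principle aims for.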
the use of “shrinkage” to improve the estimation of the correlation matrix (see Ledoit and Wolf [2004], Ledoit and Wolf [2014], and Gerber et al. [2015]).

CONCLUSION

Diversification is often spoken of as the only free lunch in investing. Yet, truly diversifying a portfolio is easier said than done. For instance, modern portfolio optimization techniques often fail to outperform a basic equal-weighted allocation (DeMiguel et al. [2009]).

Building upon the fundamental notion of hierarchy (Simon [1962]), López de Prado [2016a] introduces a new portfolio diversification technique called hierarchical risk parity, which uses graph theory and machine learning techniques.

Exploiting the same basic idea in a different way, we propose a hierarchical clustering-based asset allocation. Classical and more modern hierarchical clustering methods are tested, namely, single linkage, complete linkage, average linkage, Ward’s method, and the directed bubble hierarchical tree. Once the assets are hierarchically clustered, a simple and efficient capital allocation within and across clusters of investments at multiple hierarchical levels is computed. The main principle is to find a diversified weighting by distributing capital equally to each cluster hierarchy, so that many correlated assets receive the same total allocation as a single uncorrelated one.

The out-of-sample performances of hierarchical-clustering-based portfolios and more traditional risk-based portfolios are evaluated across three empirical datasets, which differ in terms of number of assets and composition of the universe (S&P sectors, multi-assets, and individual stocks). To prevent strategies that perform by luck from being considered effective, we assess the comparison of profit measures using the bootstrap-based model confidence set procedure (Hansen et al. [2011]).

The empirical results point out that hierarchical-clustering-based portfolios are truly diversified and achieve statistically better risk-adjusted performances, as measured by the adjusted Sharpe ratio (Pezier and White [2008]) and by the certainty-equivalent return on all datasets. The only exception concerns the multi-assets dataset in which risk-based portfolios produce impressive ASR along with ridiculously low CEQ. Among clustering methods, there is no clear winner: DBHT-based portfolios attain slightly superior risk-adjusted returns, but AL-based portfolios are clearly more robust.

Last but not least, this article opens the door for further research. Testing other clustering methods and investigating typical machine learning issues, such as the choice of the distance measure and the criteria used to select the number of clusters, come naturally to mind. Above all, improving the estimation of the correlation matrix seems to be the most important priority. Potential improvements may come from the use of “shrinkage.”

Appendix

MODEL CONFIDENCE SET

Define a set $M^0$ that contains the set of models under evaluation indexed by $i = 0, \ldots, m_0$. Let $d_{i,j,t}$ denote the loss differential between two models by

$$d_{i,j,t} = L_{i,t} - L_{j,t}, \quad \forall i,j \in M^0 \tag{A-1}$$

$L$ is the loss calculated from some loss function for each evaluation point $t = 1, \ldots, F$. The set of superior models is defined as

$$M^* = \{i \in M^0 : E[d_{i,j,t}] \le 0 \ \ \forall j \in M^0\} \tag{A-2}$$

The MCS uses a sequential testing procedure to determine $M^*$. The null hypothesis being tested is

$$\begin{aligned} H_{0,M} &: E[d_{i,j,t}] = 0 \quad \forall i,j \in M \\ H_{A,M} &: E[d_{i,j,t}] \ne 0 \quad \text{for some } i,j \in M \end{aligned} \tag{A-3}$$

where $M$ is a subset of $M^0$.

When the equivalence test rejects the null hypothesis, at least one model in the set $M$ is considered inferior and the model that contributes the most to the rejection of the null is eliminated from the set $M$. This procedure is repeated until the null is accepted and the remaining models in $M$ now equal $\hat{M}^*_{1-\alpha}$.

According to Hansen et al. [2011], the following two statistics can be used for the sequential testing of the null hypothesis:

$$t_{i,j} = \frac{\bar{d}_{i,j}}{\sqrt{\widehat{\mathrm{var}}(\bar{d}_{i,j})}} \quad \text{and} \quad t_i = \frac{\bar{d}_i}{\sqrt{\widehat{\mathrm{var}}(\bar{d}_i)}} \tag{A-4}$$

where $m$ is the number of models in $M$, $\bar{d}_i = (m-1)^{-1} \sum_{j \in M} \bar{d}_{i,j}$ is the loss of the ith model relative to the average loss across models in the set $M$, and $\bar{d}_{i,j} = F^{-1} \sum_{t=1}^{F} d_{i,j,t}$ measures the
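The averaged quantities that enter the statistics in (A-4) can be sketched numerically. The example below computes only the numerators, that is, the average loss differential between each pair of models over the F evaluation points and each model's loss relative to the others. The variance terms, which the MCS procedure obtains by bootstrap, are deliberately omitted, so this is not a full MCS implementation. The loss matrix and function names are hypothetical.

```python
def average_loss_differentials(losses):
    """d_bar[i][j]: mean over evaluation points of L_{i,t} - L_{j,t}.

    losses[i][t] is the loss of model i at evaluation point t.
    """
    n_models = len(losses)
    n_points = len(losses[0])
    return [
        [
            sum(losses[i][t] - losses[j][t] for t in range(n_points)) / n_points
            for j in range(n_models)
        ]
        for i in range(n_models)
    ]


def relative_losses(d_bar):
    """d_bar_i = (m - 1)^{-1} * sum_j d_bar[i][j]; the j = i term is zero."""
    m = len(d_bar)
    return [sum(row) / (m - 1) for row in d_bar]


# Hypothetical losses for three models over three evaluation points.
losses = [
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0],
]
d_bar = average_loss_differentials(losses)
d_bar_i = relative_losses(d_bar)
print("relative losses:", d_bar_i)
print("largest relative loss: model", d_bar_i.index(max(d_bar_i)))
```

In the actual procedure the elimination decision is based on the standardized statistics, not on these raw averages, so the model flagged here is only indicative.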
ENDNOTES

¹ Since the seminal work of Mantegna [1999], correlation networks have been extensively used in econophysics as tools to filter, visualize, and analyze financial market data.

² The results of applying K-means or K-medoids clustering algorithms depend on the choice for the number of clusters to be searched and a starting configuration assignment. In contrast, hierarchical clustering methods do not require such specifications.

³ We consider five if the equal-weighted portfolio is seen as a risk-budgeting portfolio.

⁴ For investors, the choice of the rebalancing strategy is crucial. The periodic rebalancing is not optimal, and other options should be investigated (Sun et al. [2006]).

⁵ Data are available from the author upon request.

⁶ Similar to the adjusted Sharpe ratio, the modified Sharpe ratio uses modified VaR adjusted for skewness and kurtosis as a risk measure.

⁷ A risk-free interest rate of zero is assumed when calculating the ASR and CEQ.

REFERENCES

Hansen, P., A. Lunde, and J. Nason. “The Model Confidence Set.” Econometrica, Vol. 79, No. 2 (2011), pp. 453-497.

Lau, A., M. Kolanovic, T. Lee, and R. Krishnamachari. “Cross Asset Portfolios of Tradable Risk Premia Indices.” Global Quantitative and Derivatives Strategy, JP Morgan, 2017.

Ledoit, O., and M. Wolf. “Honey, I Shrunk the Sample Covariance Matrix.” The Journal of Portfolio Management, Vol. 30, No. 4 (2004), pp. 110-119.

Ledoit, O., and M. Wolf. “Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks.” Unpublished paper, 2014.

Levy, M. “Measuring Portfolio Performance: Sharpe, Alpha, or the Geometric Mean?” Unpublished paper, 2016.
Pezier, J., and A. White. “The Relative Merits of Alternative Investments in Passive Portfolios.” The Journal of Alternative Investments, Vol. 10, No. 4 (2008), pp. 37-39.

Roncalli, T. Introduction to Risk Parity and Budgeting. Boca Raton, FL: Chapman & Hall, 2013.

Simon, H.A. “The Architecture of Complexity.” Proceedings of the American Philosophical Society, Vol. 106, No. 6 (1962), pp. 467-482.

Song, W.M., T. Di Matteo, and T. Aste. “Hierarchical Information Clustering by Means of Topologically Embedded Graphs.” PLoS One, Vol. 7, No. 3 (2012), pp. 41-50.

Sun, W., A. Fan, L.W. Chen, T. Schouwenaars, and M. Albota. “Optimal Rebalancing for Institutional Portfolios.” The Journal of Portfolio Management, Vol. 32, No. 2 (2006), pp. 33-43.

Tibshirani, R., G. Walther, and T. Hastie. “Estimating the Number of Clusters in a Data Set via the Gap Statistic.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 63, No. 2 (2001), pp. 411-423.

Tumminello, M., T. Aste, T. Di Matteo, and R. Mantegna. “A Tool for Filtering Information in Complex Systems.” Proceedings of the National Academy of Sciences of the United States of America, Vol. 102, No. 30 (2005), pp. 11-23.

Tumminello, M., F. Lillo, and R. Mantegna. “Correlation, Hierarchies, and Networks in Financial Markets.” Journal of Economic Behavior & Organization, Vol. 75, No. 1 (2010), pp. 40-58.

Ward, J.H. “Hierarchical Grouping to Optimize an Objective Function.” Journal of the American Statistical Association, Vol. 58, No. 301 (1963), pp. 236-244.

White, H. “A Reality Check for Data Snooping.” Econometrica, Vol. 68, No. 5 (2000), pp. 1097-1126.