0% found this document useful (0 votes)
53 views14 pages

SUROP Report

This document discusses using random matrix theory to analyze stock market data and model correlation matrices. It presents the following key points: 1) Random matrix theory, specifically the Marchenko-Pastur law, can be used to test if stock market data is purely random by comparing the empirical eigenvalue distribution to the predicted distribution. Analysis of S&P 500 data found many deviations, indicating the data is not random. 2) The author proposes a multi-layer structured correlation model constructed using mode and clustering analysis of the data to better model the underlying correlation matrix. 3) Preliminary analysis of market modes found the largest eigenvalue mode reflects overall market trends while some other modes are more localized, reflecting sector-specific movements.

Uploaded by

qwsx098
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
53 views14 pages

SUROP Report

This document discusses using random matrix theory to analyze stock market data and model correlation matrices. It presents the following key points: 1) Random matrix theory, specifically the Marchenko-Pastur law, can be used to test if stock market data is purely random by comparing the empirical eigenvalue distribution to the predicted distribution. Analysis of S&P 500 data found many deviations, indicating the data is not random. 2) The author proposes a multi-layer structured correlation model constructed using mode and clustering analysis of the data to better model the underlying correlation matrix. 3) Preliminary analysis of market modes found the largest eigenvalue mode reflects overall market trends while some other modes are more localized, reflecting sector-specific movements.

Uploaded by

qwsx098
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Seeking Gold in Sand: financial applications of Random

Matrix Theory in stock market data

Mike Shengbo Wang∗

Faculty of Mathematics & Department of Chemistry, University of Cambridge

November, 2016

Abstract

Covariance-correlation matrix estimation is central to modern portfolio theory; in this project,


we take a Random Matrix Theory based approach to compare a proposed multi-layer structured
correlation model, constructed using mode and clustering analyses, with the observed spectrum
of the empirical correlation matrix of S&P 500 stock market data. We will analyse the depen-
dence on layer depth of the constructed model, and obtain an accurate match between predicted
and observed empirical correlation matrix spectral distributions.

1 Introduction

Covariance-correlation matrix is of fundamental importance wherever large data sets are involved
and relations between many random variables are to be understood. In modern portfolio theory,
accurate estimation of covariance-correlation is crucial to risk management and asset allocation [1],
as correlations measure the tendency of collective movement of different stocks, and underpin the
interactions between them.
In this project we will consider a Random Matrix Theory based approach to testing a proposed
multi-layer structured correlation model, constructed with information extracted through mode and
clustering analyses.
The following section gives an overview of the data analysed and its processing. In Section 3 we
will introduce the Marčenko-Pastur law in Random Matrix Theory; in Sections 4 and 5 we will
consider the features of our data using mode and hierarchical clustering analyses; then in Section 7
we propose a multi-layered structure in our correlation model, with the effect of layer depth analysed
in Section 8; finally, in Section 10, we will briefly discuss possible developments to this project.
All computational work in this project has been carried out in MATLAB. The Live Scripts are
published on the author’s web-page [2].

This project is part of the Summer Undergraduate Research Opportunities Programme (SUROP), and is supported
by the Bridgwater Scheme.

1
2 Data Overview

The S&P 500 stock market data1 studied are stored as a matrix whose rows represent the trading
days and columns the different stocks. We first calculate the logarithmic returns for all consecutive
trading days for each stock,
pi pi − pi−1
log return = log (≈ ), i>1
pi−1 pi−1

where pi represent the price index of a stock on the i-th trading day.
This leaves us a matrix of T = 1258 rows of observations and P = 452 columns corresponding
to each individual stock. We will demean each column by subtracting the column average and
normalise the entries so that the total variance for any stock is one. The data matrix X : T × P is
now standardised, and the empirical (covariance-)correlation matrix is simply
1 T
E= X X. (1)
T

We will denote the underlying, or true, correlation matrix by C – this is the object that we will attempt
to model based on the spectrum of E.

3 Random Matrix Theory: The Marčenko-Pastur Law

Random Matrix Theory (RMT) was first introduced by John Wishart in 1928, who was the first di-
rector at the Statistical Laboratory at the University of Cambridge. It became a prominent field of
study when the physicist Eugene Wigner applied it to spacing of energy levels in nuclear physics
[3].

3.1 Statement of the Marčenko-Pastur law

The theory has a collection of universality laws, since it concerns the emergent behaviours of large
classes of random matrices in the asymptotic limit, that is to say, when the dimensions of the matrix
tend to infinity. One important instance of these is the Marčenko-Pastur law for a class of random
matrices called the Wishart ensemble, which include all correlation matrices:

Theorem 1 (The Marčenko-Pastur law). If X is a T × P random matrix whose entries are indepen-
dently identically distributed (i.i.d.) random variables (r.v.’s) with mean 0 and variance σ 2 < ∞, then
the eigenvalue density function (e.d.f) of matrix (1) is
p
1 (λ+ − λ)(λ − λ− )
f (λ) = 2
(2)
2πσ rλ

in the limit P, T → ∞ and P/T → r ∈ (0, 1), where λ± = σ 2 (1 ± r)2 .
1
See the author’s webpage [2].

2
histogram of observed eigenvalues
1.2
Marčenko-Pastur distribution

normalised eigenvalue density


1 0.015

0.8 0.01
market mode
0.005
0.6
0
20 40 60 80 100
0.4

0.2 signals

0
0 1 2 3 4 5 6 7 8
eigenvalue

Figure 1: The Marčenko-Pastur distribution does not match the empirical eigenvalue distribution of
the our S&P 500 stock market correlations. The eigenvalues lying outside the prediction range are
regarded as signals that our stock market data are not purely random.

3.2 Interpretation of the Marčenko-Pastur law

For our standardised data, variance σ 2 = 1. The key parameter of the Marčenko-Pastur distribution
is then the concentration r = P/T , which intuitively represents the abundance of observations
compared to the number of variables. An interpretation of the Marčenko-Pastur law in our context
is that if the logarithmic returns of our stocks are independently, identically distributed, i.e. totally
random, then regardless of the underlying distribution2 , the observed eigenvalue distribution of E is
governed by (2).
This result provides a natural test for the null hypothesis that the data are completely random: if we
plot the observed eigenvalue density function against the Marčenko-Pastur distribution, any eigen-
values that lie far out from the Marčenko-Pastur prediction can be regarded as signals, suggesting
that the data are not truly random. In Figure 1, we see that the Marčenko-Pastur distribution is far
from a match to our observed empirical eigenvalue distribution. There are many signals above and
below the edges of the Marčenko-Pastur prediction, all suggesting our S&P 500 stock market data
are not purely random.
However, this is not at all surprising. We know that in reality many stocks are related and the
market structure is entangled and complex. Our aim is to construct a better model for the underlying
correlation matrix C, and then use the techniques in RMT to derive a new prediction for the limiting
empirical eigenvalue distribution when P, T → ∞, which improves the detection for any new
signals.

4 Mode Analysis

The eigenvalue and eigenvector pairs of the empirical correlation matrix E are referred to as the
modes of the market. They give insight into the interactions between individual stocks as well as
market sectors.
One feature of these modes is the localisation of the eigenvector components, conveniently mea-
2
As long as its first and second moments are bounded, which is a reasonable assumption.

3
sured by the inverse participation ratio
P
X
IPR(v) = |ṽi |4
i=1

where ṽ is the vector v demeaned and normalised. In Figure 2 we have shown the components
of the market mode corresponding to the largest eigenvalue, the 9-th mode and the lowest mode
corresponding to the least eigenvalue. We see that the market mode has a low IPR, which means in
this mode all stocks move in a similar fashion, responding to the overall trend of the market (and
hence its name). The 8th mode is more localised, and we can differentiate the edges of the market
sectors. The lowest mode is highly localised, and the interaction between two companies in the
energy and industrial sectors are clearly visible.
These different types of modes suggest there are correlation interactions at the stock, sector and
market levels, so a layered model may be appropriate to capture such interactions.

4.1 Digression: uniformity of the market mode

It is observed that components of the market mode eigenvector are relatively uniform, and more
crucially, have the same sign. The proposition below may explain this.
Proposition 1. If A be a positive square matrix, then
1) it has a positive real eigenvalue λ1 with multiplicity 1 that has the largest magnitude of all its
eigenvalues;
2) it has a unique positive unit eigenvector and it corresponds to λ1 . All other eigenvectors must
have at least one negative or non-real component.

Proof. Let e1 be the unit eigenvector for the largest eigenvalue λ1 of A. Then
e1 = arg max xT Ax.
kxk=1

By reordering the basis we can without loss of generality (w.l.o.g.) assume that the first k compo-
nents of e1 are negative, and the rest are positive, where 1 < k < n and n is the dimension of matrix
A. Hence XX XX
eT1 Ae1 = (e1 )i Aij (e1 )j + (e1 )i Aij (e1 )j .
i≤k j≤k i>k j>k

However, switching the signs of (e1 )i≤k preserves the norm while increasing the sum above, so by
reductio ad absurdum, the non-zero components of e1 must be of the same sign. In fact, e1 cannot
have a zero component by the consistency condition Ae1 = λ1 e1 .
By orthogonality of eigenvectors belonging to different eigen-subspaces, the other eigenvectors
must have components of mixed signs.

5 Hierarchical Clustering Analysis

If we remove the market mode from the correlation matrix E, i.e. compute the ‘modified’ correlation
matrix
E 0 = E − λ1 v1 v1T (3)

4
0.08

0.07

0.06

0.05

0.04

0.03

0.02
Financials Materials
0.01 Energy IT Telecom
Consumer Staples Industrials Utilities
Consumer Discretionary Healthcare
0
0 50 100 150 200 250 300 350 400 450

(a) The market mode with eigenvalue 99.1 and IPR 3.98 × 10−5 .

Financials Industrials
Consumer Discretionary Healthcare IT
0.1 Consumer Staples
Energy

0.05

-0.05
Materials
Telecom
-0.1 Utilities

-0.15
0 50 100 150 200 250 300 350 400 450

(b) The 8th mode with eigenvalue 3.46 and IPR 6.51 × 10−3 .

Materials
Financials Utilities
0.4 Energy Telecom
Consumer Staples
Consumer Discretionary
0.2

-0.2
Healthcare
Industrials
IT
-0.4

-0.6
0 50 100 150 200 250 300 350 400 450

(c) The lowest mode with eigenvalue 0.0596 and IPR 0.149.

Figure 2: Plots of the eigenvector components of three different modes with their IPRs calculated.

5
ROST
RSH
BBBY
JWN
HD
BBY
COST
JCP
KSS
WMT
TGT
FDO
LTD
TJX
GPS
ANF
WAG
HCBK
CVS
1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

(a) Dendrogram (b) Minimum spanning tree

Figure 3: Two visualisations of the stock market structure and relations without the market mode.
The minimum spanning tree stock nodes are coloured by market sector.

where λ1 , v1 are the largest eigenvalue and eigenvector, then hierarchical clustering analysis may
reveal hidden market structure and relations under the overall market movement.
To perform clustering we must specify a distance measure; a natural choice is the dissimilarity dis-
tance, defined by
dij = 1 − corr(i, j) (4)
where corr(i, j) is the correlation between stocks i and j. Since correlations are reflexive and always
between -1 and 1, this meets the criteria of a metric 3 . It is also a convenient choice as dij is linear in
the correlations.
We will here adopt the average linkage method, which means the distance between two clusters I, J
is the average distance between all pairs of stocks from the two sector
1 X
DIJ = dij (5)
|I||J| i∈I,j∈J

where |·| is the order of a cluster set.


Performing the clustering the analysis in MATLAB generates a dendrogram shown in Figure 3,
alongside a minimal spanning tree. These two visualisations of the stock market structure comple-
ment each other. The MATLAB output provides information about clusters which we will use to
construct our multi-layer structured correlation model in the following section.

6 Re-classification of Market Sectors

Our clustering analysis has given us a new way of defining different market sectors based on the
average dissimilarity distances. We have found that if we do not remove the market mode, the aver-
age linkage method produces a large number of singleton clusters. To avoid this, we have removed
3
The triangle inequality can be easily checked.

6
the market mode before performing clustering. We then redo the mode analysis in Section 4 to
find whether this new classification of market sectors is satisfactory. The results are presented in
Figure 4.
We see that the new classification of the market sectors unfortunately does not have clear boundaries
as we have seen in Figure 2. This is due to the removal of the market mode: despite being reasonably
uniform, the market mode still contains a substantial amount of structural information about the
market, as we could already differentiate the sectors in the original market mode plot in Figure 2.

7 A Multi-layer Structured Correlation Model and Its Predic-


tions

There is a major caveat here: although we have removed the market mode in our clustering analysis,
to construct the multi-layer structured correlation model we will restore it. This is because, as could
be seen in Figure 4a, the market mode is not perfectly uniform and the boundaries of one particular
sector could already be distinguished.
This means that in this particular case the market mode contains structural information about the
stock market, and we need to keep it if we are to build an accurate correlation model. In fact, if
we did not, as computations have shown, the constructed correlation matrix might not be positive
semi-definite and arbitrary control of the negative entries would have to be implemented.

7.1 The construction of the correlation model

In the multi-layered model, the diagonal blocks of the correlation matrix model C represent the
correlations inside the sub-clusters in the lowest layer from the top of the hierarchy. These diagonal
blocks make up large diagonal blocks at a higher layer, and at each layer the off-diagonal blocks
represent the correlation interactions between intermediate clusters in that layer. All background
entries will be filled in with average correlations in that part of the layer, and the diagonal entries
will be set to unit.
For example, for a two-sector market toy model,
 
1 α1 · · · α1
.
 α1 1 . . . ..
 

 .
 . ... ... β 
 . α1


α · · · α 1
 
C= 1 1
(6)
 
1 α2 ··· α2

 
 ... .. 
 α2 1 . 

 β .. ... ...


 . α2 
α2 ··· α2 1

where diagonal blocks represent two clusters with respective average internal correlations α1,2 , and
the constant off-diagonal block entry β represents the average interaction correlation between the
two clusters.

7
0.08

0.07

0.06

0.05

0.04

0.03

0.02

0.01

0
0 50 100 150 200 250 300 350 400 450

(a) The market mode with eigenvalue 99.1 and IPR 3.98 × 10−5 .

0.15

0.1

0.05

-0.05

-0.1

-0.15
0 50 100 150 200 250 300 350 400 450

(b) The 8th mode with eigenvalue 3.46 and IPR 6.51 × 10−3 .

0.6

0.4

0.2

-0.2

-0.4

-0.6
0 50 100 150 200 250 300 350 400 450

(c) The lowest mode with eigenvalue 0.0596 and IPR 0.149.

Figure 4: Plots of the eigenvector components of the same three modes. The vertical coloured lines
define the boundaries of newly classified market sectors.

8
1
histogram of observed eigenvalues
50 0.9 1.2
Marčenko-Pastur distribution

normalised eigenvalue density


100 0.8 simulated analytic prediction
1
150 0.7
0.8
200 0.6

0.5 0.6
250
0.4
300
0.4
0.3
350
0.2 0.2
400
0.1
450 0
100 200 300 400 0 5 10 15
eigenvalue

(a) Heat-map (b) Simulated analytic prediction

Figure 5: The heat-map of a 50-layer correlation matrix model and its simulated analytic prediction
for the empirical correlation matrix spectrum.

7.2 New predictions based on the model

Now that we have a model from the underlying correlation matrix C with eigenvalues denoted
λ1 , . . . , λP , we will used techniques in RMT to derive a predicted limiting eigenvalue distribution
for the empirical correlation matrix E.
Let the limiting empirical eigenvalue density function (e.d.f.) of E be f (λ), then Marčenko and
Pastur have shown [4] that its Stieltjes transform, related by the transform pair
Z ∞
f (λ)
G(z) = dλ , f (λ) = lim Im G(λ + i), (7)
−∞ λ−z →0

must satisfy the integral equation


Z ∞
1 λν(λ)
− =z−r dλ , (8)
G(z) −∞ 1 + λG(z)
PQ
where the eigenvalue density function of the underlying correlation matrix is ν(λ) = j=1 pj δ(λ −
λj ) with pj ≡ nj /P , where nj are the multiplicity of the Q distinct eigenvalues λj .
This results in a polynomial equation of degree Q:
Q Q Q
Y X Y
[1 + zG(z)] [1 + λi G(z)] = rG(z) pi λi [1 + λj G(z)] . (9)
i=1 i=1 j6=i

Solving this polynomial equation would give the new predicted distribution for the empirical cor-
relation matrix spectrum, but in practice it is computationally costly and may suffer numerical in-
stability. Instead, simulations of the empirical correlation matrix by randomly generated Gaussian
data subject to correlation matrix C would suffice.
In Figure 5, we have shown the heat-map of a 30-layer correlation model constructed with informa-
tion from clustering analysis along with the simulated analytic prediction for the empirical correla-
tion matrix spectrum.

9
8 Dependence on Layer Depth of the Proposed Model

The key parameter of the proposed model that we could control is the layer depth, i.e. the number
of layers constructed. To increase the layer depth, we essentially divide the lowest layer: in this
process the original diagonal sub-blocks are split into two smaller diagonal sub-blocks, and new
off-diagonal blocks are created.

8.1 A fundamental structure of the proposed model

To understand this process in detail as well as its effect on the eigenvalues of the correlation model,
we consider the following matrices:
 
0 M1 B
M := Mm (1, α), M1,2 := Mm1,2 (1, α1,2 ) and M := (10)
B T M2
where m = m1 + m2 , and
···
 
x y y
... ... .. 
y .

Mn (x, y) ≡  . ...
... , B = β(1, . . . , 1)T (1, 1, . . . , 1) .
 .. y | {z } | {z }
m1 m2
y ··· y x
| {z }
n

The interpretation of these matrices is that M, M1,2 are all diagonal sub-blocks in the correlation
model C and they have the same general form of Mn (x, y) : n × n with diagonal entries x = 1 and
off-diagonal y = α, α1,2 . When layer division happens, the sub-block M becomes M 0 , and we can
view this change as a perturbation to entries α to α1,2 , β depending on its location, whether in M1,2
or in B (T ) .

8.2 Determining the characteristic equations of matrices (10)

The matrix Mn (x, y) can be reduced to a lower-triangular form by elementary operations:


x−λ−y 0 ... 0 y−x+λ
. .. ..
0 x − λ − y .. . .
det [Mn (x, y) − λI] = .. .. ..
. . . 0 y−x+λ
0 ... 0 x−λ−y y−x+λ
y ... y y x−λ
x−λ−y 0 ... 0 0
. .. ..
0 x − λ − y .. . .
= .. .. ..
. . . 0 0
0 ... 0 x−λ−y 0
y ... y y x − λ + (n − 1)y
= (x − λ − y)n−1 [x − λ + (n − 1)y] .

Hence the original diagonal sub-block M has an eigenvalue 1 − α of multiplicity m − 1 and a


non-degenerate eigenvalue 1 + (m − 1)α.

10
Using the identity for invertible matrix block V
S − T V −1 U T
    
S T I 0
= ,
U V −V −1 U I 0 V
we have
det(M 0 − λI) = det (M1 − λI) − B(M2 − λI)−1 B T det(M2 − λI)
 

for λ 6= 1 − α2 , 1 + (m2 − 1)α2 not an eigenvalue of M2 . Therefore


   
1
0 2
XX
−1  ..  
det(M − λI) = det (M1 − λI) − β (M2 − λI) 1 ··· 1  det(M2 − λI)

ij  . 
i j 1
= det [Mm1 (1 − γ, α1 − γ) − λI] det(M2 − λI)
= (1 − λ − α1 )m1 −1 [1 − λ − α1 + m1 (α1 − γ)] det(M2 − λI) (11)
where γ(λ) = β 2 i,j {(M2 − λI)−1 }ij .
P

Hence we see that for α1 6= α2 , by symmetry 1 ↔ 2, the characteristic equation of M 0 must be of


the form
0 = det(M 0 − λI) = (1 − λ − α1 )m1 −1 (1 − λ − α2 )m2 −1 p(λ) (12)
where p(λ) = 0 is a quadratic equation related to
1 − λ − α1 + m1 (α1 − γ) = 0. (13)
The eigenvalues 1 − α1,2 of M1,2 are still eigenvalues of M 0 with respective multiplicities m1,2 − 1,
and the remaining two eigenvalues of M 0 are roots of p(λ) = 0. To solve this we need to find γ, so
we must be able to invert M2 − λI to find γ(λ).
To this end, we turn to the Sherman-Morrison formula for help:
Theorem 2 (Sherman–Morrison). For an invertible matrix A and column vectors u, v of compatible
dimensions such that 1 + vT A−1 u 6= 0, the following formula holds:
T −1 −1 A−1 uvT A−1
(A + uv ) =A − .
1 + vT A−1 u

Proof. By direct verification.



By setting A = (1 − λ − α2 )I, u = v = α2 (1, . . . , 1)T , we have
| {z }
m2
  
1 ··· 1
1 α2  .. . . ..  .
(M2 − λI)−1 = I −

.
1 − λ − α2 1 − λ − α2 + m2 α2 . . 

1 ··· 1
Therefore
β2 m22 α2 β 2 m2
 
γ(λ) = m2 − =
1 − λ − α2 1 − λ − α2 + m2 α2 1 − λ − α2 + m2 α2
and equation (13) can be reduced to a more symmetric form in variable µ ≡ 1 − λ after rearranging,

q(µ) = µ2 +(m1 α1 +m2 α2 −α1 −α2 )µ+ m1 m2 α1 α2 − (m1 + m2 − 1)α1 α2 − β 2 m1 m2 = 0. (14)


 

Although in this derivation we have assumed λ 6= 1 − α2 , 1 + (m2 − 1)α2 , the characteristic equa-
tion (12) with p(λ) ≡ q(µ) is valid ∀λ because any divergence is offset by the det(M2 − λI) factor
in equation (11).

11
8.3 Interpretation of the solutions of the polynomial equation (14)

In the case α1 = α2 = β ≡ α, equation (14) has two roots λ1 = 1 + (m1 + m2 − 1)α and λ2 = 1 − α,
just as expected for eigenvalues of M since now M 0 = M . Here we note that λ2 coincides with the
other eigenvalues arising from the factor (1 − λ − α1 )m1 −1 (1 − λ − α2 )m2 −1 in the characteristic
polynomial (11).
When layer division takes place to increase the layer number, we may have α1 = α2 ≡ α 6= β so
that equation (14) is perturbed to

µ2 + (m − 2)αµ + (m1 − 1)(m2 − 1)α2 − β 2 m1 m2 = 0


 
(15)

with roots denoted by λ01,2 , where we recall m = m1 + m2 . By trace consideration of M 0 or the


properties of quadratic equations, we see that now λ01 + λ02 = λ1 + λ2 = 2 + (m − 2)α.
What this means is that in increasing the layer number by perturbing β = α in M to β  1 in
M 0 (as observed in the computed model of C), we have decreased the product of the roots λ1 λ2 to
λ01 λ02 while their sum must be kept the same. Intuitively and conclusively, this tells us more layers
in our model result in greater abundance of very large eigenvalues like λ01 and positive eigenvalues
λ02 really close to zero.
Indeed, in Figure 6 where we compare the predicted empirical spectral density functions of a 10-layer
correlation model and a 148-layer correlation model, we see that the latter is a closer match for the
small positive eigenvalues. In fact, the latter is an excellent match with the observed eigenvalues,
the best we have achieved!

1.2 1.2
1 1
normalised eigenvalue density
normalised eigenvalue density

1 1
0.5 0.5

0.8 0.8
0 0
0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3
0.6 0.6

0.4 0.4
histogram of observed eigenvalues histogram of observed eigenvalues
Marčenko-Pastur distribution Marčenko-Pastur distribution
0.2 0.2
simulated analytic prediction simulated analytic prediction

0 0
0 5 10 15 0 5 10 15
eigenvalue eigenvalue

(a) 10 layers (b) 148 layers

Figure 6: Comparison of the predicted empirical spectral density functions of a 10-layer correlation
model and a 148-layer one with the market mode.

9 Alternative Model with Prior Sectoring

We saw in Section 6 that the new classification of sectors was not robust, whereas in Section 4 we
saw at the market mode level the distinction between pre-assigned sectors was already obvious. We
wonder if this prior information on sectoring could be incorporate in our model. Here we propose
the following construction based on the observations.
We will assume the pre-assigned sectors have zero (little) correlations, whereas inside each sector

12
1.2
1

normalised eigenvalue density


1
0.5

0.8
0
0 0.5 1 1.5 2 2.5 3
0.6

0.4 histogram of observed eigenvalues


Marčenko-Pastur distribution
0.2 original proposed model
alternative proposed model
0
0 5 10 15
eigenvalue

Figure 7: Predicted empirical spectral density functions of the alternative model with prior sectoring,
superposed with the M-P law and the previous analytic prediction.

the stocks have equal mutual correlations. The underlying correlation matrix then takes the block-
diagonal form  
M1

 M2 

 . . 
 . 
Ms
where Mi ≡ Mmi as defined in equation (10), and s = 10 is the number of pre-assigned sectors (as
in Figure 2). The data are processed without the removal of the market mode.
We repeat the procedure as in Section 7.2 to plot the predicted empirical spectrum of the correlation
matrix in Figure 7.
We see that this model, even without detailed layer division, is a reasonable fit to the observed eigen-
value density, but it resembles in shape more of the M-P distribution than the observed distribution,
or the previous 148-layer model.

10 Summary and Further Developments

Through mode and clustering analyses, we have been able to construct a multi-layer structured
correlation model for the S&P 500 stock market. By analysing the dependence of the spectrum of
our model on the layer depth, we have shown analytically that increasing the number of layers
improves the match between (simulated) prediction of the empirical spectral density function with
observed eigenvalues of the empirical correlation matrix.
This results in a reliable estimation for the underlying correlation structure of the market analysed,
which may then have a positive impact on investment portfolios.
However, our study of the stock market correlations can be further developed by considering:
1) edge asymptotics;
Our correlation matrix is finite-dimensional, which means eigenvalues are expected to leak out of
the edges of the predicted distribution of the empirical correlation matrix spectrum, which is derived

13
in the asymptotic limit. This leakage effect could be studied using the Tracy-Widom law.
2) time evolution;
The underlying stock market data have been assumed to have a stationary distribution in time, and
this is unlikely a good assumption. We need to build the time parameter/variable t into our model.
3) fine-tuning.
We have performed the hierarchical clustering analysis with average linkage to build a binary tree
structure. The correlation model can be improved with tailored clustering method suited for the
stock market data.

Acknowledgements

Many thanks goes to my project supervisor, Dr Lucy Colwell, and her PhD student Chongli Qin at
the Department of Chemistry, University of Cambridge, whose guidance and help have been crucial
to this research project. I am also grateful for the generous support by the Bridgwater Scheme.

References
[1] S. Pafka, M. Potters, and I. Kondor. Exponential Weighting and Random-Matrix-Theory-Based
Filtering of Financial Covariance Matrices for Portfolio Optimization. arXiv:cond-mat/0402573,
February 2004.

[2] S. Wang. SUROP Project. http://sw664.user.srcf.net/SUROP%20Project%202016/SUROP.html,


2016. [On-line, updated and accessed 25/09/16].

[3] N. C. Snaith, P. J. Forrester, and J. J. M. Verbaarschot. Developments in Random Matrix Theory.


arXiv:cond-mat/0303207, March 2003.

[4] V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues in certain sets of random matrices.
Mathematics of the USSR-Sbornik, 1(4), 1967.

14

You might also like