


Seeking Gold in Sand: financial applications of Random Matrix Theory in stock market data

Mike Shengbo Wang∗

Faculty of Mathematics & Department of Chemistry, University of Cambridge

November, 2016

Abstract

Covariance-correlation matrix estimation is central to modern portfolio theory; in this project, we take a Random Matrix Theory based approach to compare a proposed multi-layer structured correlation model, constructed using mode and clustering analyses, with the observed spectrum of the empirical correlation matrix of S&P 500 stock market data. We will analyse the dependence on layer depth of the constructed model, and obtain an accurate match between predicted and observed empirical correlation matrix spectral distributions.

1 Introduction

The covariance-correlation matrix is of fundamental importance wherever large data sets are involved
and relations between many random variables are to be understood. In modern portfolio theory,
accurate estimation of covariance-correlation is crucial to risk management and asset allocation [1],
as correlations measure the tendency of collective movement of different stocks, and underpin the
interactions between them.
In this project we will consider a Random Matrix Theory based approach to testing a proposed
multi-layer structured correlation model, constructed with information extracted through mode and
clustering analyses.
The following section gives an overview of the data analysed and its processing. In Section 3 we will introduce the Marčenko-Pastur law in Random Matrix Theory; in Sections 4 and 5 we will consider the features of our data using mode and hierarchical clustering analyses, and in Section 6 we re-classify the market sectors accordingly; then in Section 7 we propose a multi-layered structure in our correlation model, with the effect of layer depth analysed in Section 8 and an alternative model with prior sectoring considered in Section 9; finally, in Section 10, we will briefly discuss possible developments to this project.
All computational work in this project has been carried out in MATLAB. The Live Scripts are
published on the author’s web-page [2].
∗ This project is part of the Summer Undergraduate Research Opportunities Programme (SUROP), and is supported by the Bridgwater Scheme.

2 Data Overview

The S&P 500 stock market data¹ studied are stored as a matrix whose rows represent the trading days and columns the different stocks. We first calculate the logarithmic returns for all consecutive trading days for each stock,
$$\text{log return} = \log\frac{p_i}{p_{i-1}} \left(\approx \frac{p_i - p_{i-1}}{p_{i-1}}\right), \qquad i > 1,$$
where $p_i$ represents the price index of a stock on the $i$-th trading day.
This leaves us with a matrix of T = 1258 rows of observations and P = 452 columns, one for each individual stock. We will demean each column by subtracting the column average and normalise the entries so that the total variance for any stock is one. The data matrix X, of size T × P, is now standardised, and the empirical (covariance-)correlation matrix is simply
$$E = \frac{1}{T} X^T X. \tag{1}$$

We will denote the underlying, or true, correlation matrix by C; this is the object that we will attempt to model based on the spectrum of E.
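The processing steps above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the MATLAB Live Scripts used for the project; the function names and the two short price series are hypothetical and are not taken from the data set.

```python
import math

def log_returns(prices):
    """Log returns log(p_i / p_{i-1}) for consecutive trading days."""
    return [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]

def standardise(column):
    """Demean a column and scale it to unit (population) variance."""
    t = len(column)
    mean = sum(column) / t
    centred = [x - mean for x in column]
    std = math.sqrt(sum(c * c for c in centred) / t)
    return [c / std for c in centred]

def empirical_correlation(columns):
    """E = X^T X / T, where `columns` holds the standardised columns of X."""
    t = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / t for cj in columns]
            for ci in columns]

# Hypothetical price series for two stocks (illustrative numbers only).
prices_a = [100.0, 101.0, 99.5, 102.0, 103.5]
prices_b = [50.0, 50.4, 49.9, 51.2, 51.0]

X_cols = [standardise(log_returns(p)) for p in (prices_a, prices_b)]
E = empirical_correlation(X_cols)
```

With this normalisation the diagonal entries of E equal one, so E is a correlation matrix in the usual sense.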

3 Random Matrix Theory: The Marčenko-Pastur Law

Random Matrix Theory (RMT) was first introduced in 1928 by John Wishart, who was the first director of the Statistical Laboratory at the University of Cambridge. It became a prominent field of study when the physicist Eugene Wigner applied it to the spacing of energy levels in nuclear physics [3].

3.1 Statement of the Marčenko-Pastur law

The theory has a collection of universality laws, since it concerns the emergent behaviours of large classes of random matrices in the asymptotic limit, that is to say, when the dimensions of the matrix tend to infinity. One important instance of these is the Marčenko-Pastur law for a class of random matrices called the Wishart ensemble, which includes all correlation matrices:

Theorem 1 (The Marčenko-Pastur law). If X is a T × P random matrix whose entries are independently identically distributed (i.i.d.) random variables (r.v.'s) with mean 0 and variance $\sigma^2 < \infty$, then the eigenvalue density function (e.d.f.) of matrix (1) is
$$f(\lambda) = \frac{1}{2\pi\sigma^2 r\lambda}\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)} \tag{2}$$
in the limit $P, T \to \infty$ and $P/T \to r \in (0, 1)$, where $\lambda_\pm = \sigma^2(1 \pm \sqrt{r})^2$.
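As a quick numerical check of equation (2), the sketch below evaluates the density and its edges for r = 452/1258, the concentration of our data, and integrates it with a simple midpoint rule. The function names are illustrative, and the density is taken to be zero outside the interval between the two edges.

```python
import math

def mp_edges(r, sigma2=1.0):
    """Lower and upper edges lambda_-, lambda_+ of the Marcenko-Pastur law."""
    lo = sigma2 * (1.0 - math.sqrt(r)) ** 2
    hi = sigma2 * (1.0 + math.sqrt(r)) ** 2
    return lo, hi

def mp_density(lam, r, sigma2=1.0):
    """Density (2); taken to be zero outside (lambda_-, lambda_+)."""
    lo, hi = mp_edges(r, sigma2)
    if lam <= lo or lam >= hi:
        return 0.0
    return math.sqrt((hi - lam) * (lam - lo)) / (2.0 * math.pi * sigma2 * r * lam)

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

r = 452 / 1258                      # P / T for the data set described above
lo, hi = mp_edges(r)                # roughly 0.16 and 2.56
total = integrate(lambda x: mp_density(x, r), lo, hi)
```

The integral should come out close to one, confirming that (2) is a normalised density for r in (0, 1).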
¹ See the author's webpage [2].

[Figure 1: normalised eigenvalue density plotted against eigenvalue, showing the histogram of observed eigenvalues and the Marčenko-Pastur distribution; annotations mark the market mode and the signals.]

Figure 1: The Marčenko-Pastur distribution does not match the empirical eigenvalue distribution of our S&P 500 stock market correlations. The eigenvalues lying outside the prediction range are regarded as signals that our stock market data are not purely random.

3.2 Interpretation of the Marčenko-Pastur law

For our standardised data, the variance is $\sigma^2 = 1$. The key parameter of the Marčenko-Pastur distribution is then the concentration $r = P/T$, which intuitively represents the abundance of observations compared to the number of variables. An interpretation of the Marčenko-Pastur law in our context is that if the logarithmic returns of our stocks are independent and identically distributed, i.e. totally random, then regardless of the underlying distribution², the observed eigenvalue distribution of E is governed by (2).
This result provides a natural test for the null hypothesis that the data are completely random: if we plot the observed eigenvalue density function against the Marčenko-Pastur distribution, any eigenvalues that lie far out from the Marčenko-Pastur prediction can be regarded as signals, suggesting that the data are not truly random. In Figure 1, we see that the Marčenko-Pastur distribution is far from a match to our observed empirical eigenvalue distribution. There are many signals above and below the edges of the Marčenko-Pastur prediction, all suggesting our S&P 500 stock market data are not purely random.
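The test can be illustrated with a small sketch that sorts eigenvalues into those below, inside and above the Marčenko-Pastur range. The eigenvalues 99.1 and 3.46 are the ones quoted in the captions of Figure 2; the other three values are made-up placeholders, and the helper names are hypothetical. For finite P and T, eigenvalues lying only slightly beyond an edge should not be read as signals without further care.

```python
import math

def mp_edges(r):
    """Marcenko-Pastur edges for unit variance."""
    return (1.0 - math.sqrt(r)) ** 2, (1.0 + math.sqrt(r)) ** 2

def split_signals(eigenvalues, r):
    """Partition eigenvalues into (below, bulk, above) relative to the edges."""
    lo, hi = mp_edges(r)
    below = [x for x in eigenvalues if x < lo]
    bulk = [x for x in eigenvalues if lo <= x <= hi]
    above = [x for x in eigenvalues if x > hi]
    return below, bulk, above

r = 452 / 1258
# 99.1 and 3.46 come from the Figure 2 captions; 1.2, 0.7 and 0.05 are placeholders.
sample = [99.1, 3.46, 1.2, 0.7, 0.05]
below, bulk, above = split_signals(sample, r)
```

Here the market mode and the 8th mode fall above the upper edge of about 2.56, while the placeholder 0.05 falls below the lower edge of about 0.16.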
However, this is not at all surprising. We know that in reality many stocks are related and the market structure is entangled and complex. Our aim is to construct a better model for the underlying correlation matrix C, and then use the techniques in RMT to derive a new prediction for the limiting empirical eigenvalue distribution when P, T → ∞, which improves the detection of any new signals.

² As long as its first and second moments are bounded, which is a reasonable assumption.

4 Mode Analysis

The eigenvalue and eigenvector pairs of the empirical correlation matrix E are referred to as the modes of the market. They give insight into the interactions between individual stocks as well as market sectors.
One feature of these modes is the localisation of the eigenvector components, conveniently measured by the inverse participation ratio
$$\mathrm{IPR}(v) = \sum_{i=1}^{P} |\tilde{v}_i|^4$$
where $\tilde{v}$ is the vector v demeaned and normalised. In Figure 2 we have shown the components of the market mode corresponding to the largest eigenvalue, the 8th mode and the lowest mode corresponding to the least eigenvalue. We see that the market mode has a low IPR, which means in this mode all stocks move in a similar fashion, responding to the overall trend of the market (and hence its name). The 8th mode is more localised, and we can differentiate the edges of the market sectors. The lowest mode is highly localised, and the interaction between two companies in the energy and industrial sectors is clearly visible.
These different types of modes suggest there are correlation interactions at the stock, sector and market levels, so a layered model may be appropriate to capture such interactions.
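A minimal sketch of the IPR computation follows. It assumes that "normalised" means scaled to unit Euclidean norm after demeaning, which is one reading of the definition above; the function name and test vectors are illustrative.

```python
import math

def ipr(v):
    """Inverse participation ratio of v after demeaning and scaling to unit norm.

    Assumption: 'normalised' is taken to mean unit Euclidean norm.
    """
    n = len(v)
    mean = sum(v) / n
    centred = [x - mean for x in v]
    norm = math.sqrt(sum(c * c for c in centred))
    if norm == 0.0:
        raise ValueError("IPR is undefined for a constant vector after demeaning")
    return sum((c / norm) ** 4 for c in centred)

spread = ipr([1.0, -1.0, 1.0, -1.0])      # weight shared equally by 4 components
localised = ipr([1.0, 0.0, 0.0, 0.0])     # weight concentrated on 1 component
```

Under this convention a vector whose weight is spread evenly gives a small IPR, while a vector concentrated on a few components gives a larger one, which is the sense in which the lowest mode is "highly localised".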

4.1 Digression: uniformity of the market mode

It is observed that the components of the market mode eigenvector are relatively uniform and, more crucially, have the same sign. The proposition below may explain this.
Proposition 1. If A is a positive square matrix, then
1) it has a positive real eigenvalue $\lambda_1$ with multiplicity 1 that has the largest magnitude of all its eigenvalues;
2) it has a unique positive unit eigenvector and it corresponds to $\lambda_1$. All other eigenvectors must have at least one negative or non-real component.

Proof. We give the argument for symmetric A, which is the case relevant to correlation matrices. Let $e_1$ be the unit eigenvector for the largest eigenvalue $\lambda_1$ of A. Then
$$e_1 = \arg\max_{\|x\|=1} x^T A x.$$
Suppose $e_1$ has components of both signs. By reordering the basis we can without loss of generality (w.l.o.g.) assume that the first k components of $e_1$ are negative, and the rest are non-negative, where $1 \le k < n$ and n is the dimension of matrix A. Hence
$$e_1^T A e_1 = \sum_{i\le k}\sum_{j\le k} (e_1)_i A_{ij} (e_1)_j + \sum_{i>k}\sum_{j>k} (e_1)_i A_{ij} (e_1)_j + 2\sum_{i\le k}\sum_{j>k} (e_1)_i A_{ij} (e_1)_j,$$
where every cross term in the last sum is non-positive. However, switching the signs of $(e_1)_{i\le k}$ preserves the norm and the first two sums while turning the cross terms non-negative, which increases the total whenever some cross term is non-zero; so by reductio ad absurdum, the non-zero components of $e_1$ must be of the same sign. In fact, $e_1$ cannot have a zero component by the consistency condition $Ae_1 = \lambda_1 e_1$.
By orthogonality of eigenvectors belonging to different eigen-subspaces, the other eigenvectors must have components of mixed signs.
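The proposition can be checked numerically on a small example. The sketch below runs power iteration on a hypothetical 3 × 3 positive symmetric matrix (the entries are illustrative, not taken from the data) and inspects the sign of the leading eigenvector.

```python
import math

def power_iteration(a, steps=500):
    """Leading eigenpair of a positive symmetric matrix by power iteration."""
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n              # positive starting vector
    for _ in range(steps):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]
    # Rayleigh quotient x^T A x for the unit vector x
    lam = sum(x[i] * sum(a[i][j] * x[j] for j in range(n)) for i in range(n))
    return lam, x

# Hypothetical positive symmetric matrix with unit diagonal.
A = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
lam1, e1 = power_iteration(A)
residual = max(abs(sum(A[i][j] * e1[j] for j in range(3)) - lam1 * e1[i])
               for i in range(3))
```

All components of `e1` come out strictly positive, in line with part 2) of the proposition.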

5 Hierarchical Clustering Analysis

[Figure 2, panel captions: (a) The market mode with eigenvalue 99.1 and IPR $3.98 \times 10^{-5}$. (b) The 8th mode with eigenvalue 3.46 and IPR $6.51 \times 10^{-3}$. Each panel plots eigenvector components against stock index, with stocks grouped by sector: Financials, Materials, Energy, IT, Telecom, Consumer Staples, Industrials, Utilities, Consumer Discretionary and Healthcare.]

Figure 2: Plots of the eigenvector components of three different modes with their IPRs calculated.

[Figure 3, panels: (a) Dendrogram, with leaves labelled by stock ticker. (b) Minimum spanning tree.]

Figure 3: Two visualisations of the stock market structure and relations without the market mode. The minimum spanning tree stock nodes are coloured by market sector.

If we remove the market mode from the correlation matrix E, i.e. compute the 'modified' correlation matrix
$$E' = E - \lambda_1 v_1 v_1^T \tag{3}$$
where $\lambda_1, v_1$ are the largest eigenvalue and eigenvector, then hierarchical clustering analysis may reveal hidden market structure and relations under the overall market movement.
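Equation (3) is a rank-one deflation, sketched below on a hypothetical 2 × 2 correlation matrix whose leading eigenpair is known in closed form. The function name is illustrative.

```python
import math

def remove_market_mode(e, lam1, v1):
    """Return E' = E - lam1 * v1 v1^T for a unit eigenvector v1."""
    n = len(e)
    return [[e[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)]

# Hypothetical example: the leading eigenpair of [[1, 0.5], [0.5, 1]]
# is lam1 = 1.5 with v1 = (1, 1) / sqrt(2).
E = [[1.0, 0.5],
     [0.5, 1.0]]
lam1 = 1.5
v1 = [1.0 / math.sqrt(2.0)] * 2
E_mod = remove_market_mode(E, lam1, v1)
```

After deflation the removed direction has eigenvalue zero, and the diagonal of the modified matrix is no longer equal to one.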
To perform clustering we must specify a distance measure; a natural choice is the dissimilarity distance, defined by
$$d_{ij} = 1 - \operatorname{corr}(i, j) \tag{4}$$
where corr(i, j) is the correlation between stocks i and j. Since correlations are symmetric and always between −1 and 1, $d_{ij}$ is symmetric, non-negative and vanishes for i = j, so it can serve as a dissimilarity measure for clustering³. It is also a convenient choice as $d_{ij}$ is linear in the correlations.
We will here adopt the average linkage method, which means the distance between two clusters I, J is the average distance between all pairs of stocks drawn from the two clusters,
$$D_{IJ} = \frac{1}{|I||J|} \sum_{i\in I,\, j\in J} d_{ij} \tag{5}$$
where |·| is the number of stocks in a cluster.
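Equations (4) and (5) translate directly into code. The sketch below uses a hypothetical 3 × 3 correlation matrix and illustrative function names; it computes a single inter-cluster distance rather than a full hierarchical clustering.

```python
def dissimilarity(corr):
    """d_ij = 1 - corr(i, j), equation (4)."""
    return [[1.0 - c for c in row] for row in corr]

def average_linkage(d, cluster_i, cluster_j):
    """D_IJ: mean of d_ij over all pairs with i in I and j in J, equation (5)."""
    total = sum(d[i][j] for i in cluster_i for j in cluster_j)
    return total / (len(cluster_i) * len(cluster_j))

# Hypothetical correlations between three stocks.
corr = [[1.0, 0.8, 0.1],
        [0.8, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
d = dissimilarity(corr)
D = average_linkage(d, [0, 1], [2])   # distance between clusters {0, 1} and {2}
```

Stocks 0 and 1 are close to each other (distance 0.2), while the cluster they form sits at average distance 0.85 from stock 2.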

Performing the clustering analysis in MATLAB generates the dendrogram shown in Figure 3, alongside a minimum spanning tree. These two visualisations of the stock market structure complement each other. The MATLAB output provides information about clusters which we will use to construct our multi-layer structured correlation model in the following section.

³ Symmetry and non-negativity follow directly. The triangle inequality is not guaranteed for $1 - \operatorname{corr}(i, j)$ in general (correlations of $1/\sqrt{2}$, $1/\sqrt{2}$ and 0 among three stocks violate it), whereas the related distance $\sqrt{2(1 - \operatorname{corr}(i, j))}$ does satisfy it.

6 Re-classification of Market Sectors

Our clustering analysis has given us a new way of defining different market sectors based on the average dissimilarity distances. We have found that if we do not remove the market mode, the average linkage method produces a large number of singleton clusters. To avoid this, we have removed the market mode before performing clustering. We then redo the mode analysis in Section 4 to find whether this new classification of market sectors is satisfactory. The results are presented in Figure 4.
We see that the new classification of the market sectors unfortunately does not have clear boundaries
as we have seen in Figure 2. This is due to the removal of the market mode: despite being reasonably
uniform, the market mode still contains a substantial amount of structural information about the
market, as we could already differentiate the sectors in the original market mode plot in Figure 2.

7 A Multi-layer Structured Correlation Model and Its Predictions

There is a major caveat here: although we have removed the market mode in our clustering analysis,
to construct the multi-layer structured correlation model we will restore it. This is because, as could
be seen in Figure 4a, the market mode is not perfectly uniform and the boundaries of one particular
sector could already be distinguished.
This means that in this particular case the market mode contains structural information about the
stock market, and we need to keep it if we are to build an accurate correlation model. In fact, if
we did not, as computations have shown, the constructed correlation matrix might not be positive
semi-definite and arbitrary control of the negative entries would have to be implemented.

7.1 The construction of the correlation model

In the multi-layered model, the diagonal blocks of the correlation matrix model C represent the correlations inside the sub-clusters in the lowest layer from the top of the hierarchy. These diagonal blocks make up large diagonal blocks at a higher layer, and at each layer the off-diagonal blocks represent the correlation interactions between intermediate clusters in that layer. All background entries will be filled in with average correlations in that part of the layer, and the diagonal entries will be set to one.
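For example, for a two-sector market toy model,
$$C = \left(\begin{array}{cccc|cccc}
1 & \alpha_1 & \cdots & \alpha_1 & & & & \\
\alpha_1 & 1 & \ddots & \vdots & & & & \\
\vdots & \ddots & \ddots & \alpha_1 & & \beta & & \\
\alpha_1 & \cdots & \alpha_1 & 1 & & & & \\ \hline
 & & & & 1 & \alpha_2 & \cdots & \alpha_2 \\
 & \beta & & & \alpha_2 & 1 & \ddots & \vdots \\
 & & & & \vdots & \ddots & \ddots & \alpha_2 \\
 & & & & \alpha_2 & \cdots & \alpha_2 & 1
\end{array}\right) \tag{6}$$
where the diagonal blocks represent two clusters with respective average internal correlations $\alpha_{1,2}$, and the constant off-diagonal block entry β represents the average interaction correlation between the two clusters.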
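The toy model (6) can be built explicitly as follows. This is a minimal sketch; the function name, block sizes and correlation values are hypothetical.

```python
def two_block_correlation(m1, m2, alpha1, alpha2, beta):
    """Two-sector toy model (6): unit diagonal, alpha1 / alpha2 inside the
    two diagonal blocks and beta in the off-diagonal blocks."""
    n = m1 + m2
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                c[i][j] = 1.0
            elif i < m1 and j < m1:
                c[i][j] = alpha1
            elif i >= m1 and j >= m1:
                c[i][j] = alpha2
            else:
                c[i][j] = beta
    return c

# Hypothetical sizes and correlations.
C = two_block_correlation(m1=2, m2=3, alpha1=0.5, alpha2=0.4, beta=0.1)
```

A deeper hierarchy is obtained by applying the same construction recursively inside each diagonal block.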

[Figure 4, panel captions: (a) The market mode with eigenvalue 99.1 and IPR $3.98 \times 10^{-5}$. (b) The 8th mode with eigenvalue 3.46 and IPR $6.51 \times 10^{-3}$. Each panel plots eigenvector components against stock index.]

Figure 4: Plots of the eigenvector components of the same three modes. The vertical coloured lines define the boundaries of newly classified market sectors.

[Figure 5, panels: (a) Heat-map. (b) Simulated analytic prediction, plotting normalised eigenvalue density against eigenvalue for the histogram of observed eigenvalues, the Marčenko-Pastur distribution and the simulated analytic prediction.]

Figure 5: The heat-map of a 50-layer correlation matrix model and its simulated analytic prediction for the empirical correlation matrix spectrum.

7.2 New predictions based on the model

Now that we have a model for the underlying correlation matrix C with eigenvalues denoted $\lambda_1, \ldots, \lambda_P$, we will use techniques in RMT to derive a predicted limiting eigenvalue distribution for the empirical correlation matrix E.
Let the limiting empirical eigenvalue density function (e.d.f.) of E be $f(\lambda)$; then Marčenko and Pastur have shown [4] that its Stieltjes transform, related by the transform pair
$$G(z) = \int_{-\infty}^{\infty} \frac{f(\lambda)}{\lambda - z}\, d\lambda, \qquad f(\lambda) = \frac{1}{\pi} \lim_{\epsilon \to 0^+} \operatorname{Im} G(\lambda + i\epsilon), \tag{7}$$
must satisfy the integral equation
$$-\frac{1}{G(z)} = z - r \int_{-\infty}^{\infty} \frac{\lambda\, \nu(\lambda)}{1 + \lambda G(z)}\, d\lambda, \tag{8}$$
where the eigenvalue density function of the underlying correlation matrix is $\nu(\lambda) = \sum_{j=1}^{Q} p_j \delta(\lambda - \lambda_j)$ with $p_j \equiv n_j/P$, where $n_j$ is the multiplicity of the j-th of the Q distinct eigenvalues $\lambda_j$.
This results in a polynomial equation of degree Q + 1 in G(z):
$$[1 + zG(z)] \prod_{i=1}^{Q} [1 + \lambda_i G(z)] = rG(z) \sum_{i=1}^{Q} p_i \lambda_i \prod_{j \ne i} [1 + \lambda_j G(z)]. \tag{9}$$

Solving this polynomial equation would give the new predicted distribution for the empirical correlation matrix spectrum, but in practice it is computationally costly and may suffer from numerical instability. Instead, it suffices to simulate the empirical correlation matrix from randomly generated Gaussian data subject to the correlation matrix C.
In Figure 5, we have shown the heat-map of a 30-layer correlation model constructed with information from clustering analysis along with the simulated analytic prediction for the empirical correlation matrix spectrum.
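The simulation step can be sketched as below: Gaussian rows with correlation matrix C are generated through a Cholesky factor, standardised column by column as in Section 2, and combined into an empirical correlation matrix. This is an illustrative stand-in for the MATLAB computation, not a reproduction of it; the 3 × 3 matrix C, the sample size and the seed are hypothetical, and the eigenvalues of the result would still need to be computed with a symmetric eigensolver to obtain the spectrum.

```python
import math
import random

def cholesky(c):
    """Lower-triangular L with L L^T = c, for a positive definite matrix c."""
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(c[i][i] - s)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l

def simulate_empirical_correlation(c, t, seed=0):
    """Draw t Gaussian rows with correlation c and return E = X^T X / t."""
    rng = random.Random(seed)
    p = len(c)
    l = cholesky(c)
    rows = []
    for _ in range(t):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        rows.append([sum(l[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    cols = []
    for j in range(p):                      # standardise each column
        col = [row[j] for row in rows]
        mean = sum(col) / t
        centred = [x - mean for x in col]
        std = math.sqrt(sum(x * x for x in centred) / t)
        cols.append([x / std for x in centred])
    return [[sum(a * b for a, b in zip(ci, cj)) / t for cj in cols] for ci in cols]

# Hypothetical model correlation matrix (positive definite).
C = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.2],
     [0.2, 0.2, 1.0]]
E_sim = simulate_empirical_correlation(C, t=2000)
```

Repeating the draw for several seeds and pooling the eigenvalues of each simulated matrix gives a smoother estimate of the predicted spectrum.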

8 Dependence on Layer Depth of the Proposed Model

The key parameter of the proposed model that we could control is the layer depth, i.e. the number
of layers constructed. To increase the layer depth, we essentially divide the lowest layer: in this
process the original diagonal sub-blocks are split into two smaller diagonal sub-blocks, and new
off-diagonal blocks are created.

8.1 A fundamental structure of the proposed model

To understand this process in detail as well as its effect on the eigenvalues of the correlation model, we consider the following matrices:
$$M := M_m(1, \alpha), \qquad M_{1,2} := M_{m_{1,2}}(1, \alpha_{1,2}) \qquad \text{and} \qquad M' := \begin{pmatrix} M_1 & B \\ B^T & M_2 \end{pmatrix} \tag{10}$$
where $m = m_1 + m_2$, and
$$M_n(x, y) \equiv \underbrace{\begin{pmatrix} x & y & \cdots & y \\ y & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & y \\ y & \cdots & y & x \end{pmatrix}}_{n}, \qquad B = \beta\, \underbrace{(1, \ldots, 1)^T}_{m_1}\, \underbrace{(1, 1, \ldots, 1)}_{m_2}.$$

The interpretation of these matrices is that $M, M_{1,2}$ are all diagonal sub-blocks in the correlation model C and they have the same general form $M_n(x, y)$, an n × n matrix with diagonal entries x = 1 and off-diagonal entries $y = \alpha, \alpha_{1,2}$. When layer division happens, the sub-block M becomes $M'$, and we can view this change as a perturbation of the entries α to $\alpha_{1,2}$ or β depending on their location, whether in $M_{1,2}$ or in $B^{(T)}$.

8.2 Determining the characteristic equations of matrices (10)

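The matrix $M_n(x, y)$ can be reduced to a lower-triangular form by elementary operations:
$$\begin{aligned}
\det[M_n(x, y) - \lambda I] &= \begin{vmatrix}
x-\lambda-y & 0 & \cdots & 0 & y-x+\lambda \\
0 & x-\lambda-y & \ddots & \vdots & \vdots \\
\vdots & \ddots & \ddots & 0 & y-x+\lambda \\
0 & \cdots & 0 & x-\lambda-y & y-x+\lambda \\
y & \cdots & y & y & x-\lambda
\end{vmatrix} \\
&= \begin{vmatrix}
x-\lambda-y & 0 & \cdots & 0 & 0 \\
0 & x-\lambda-y & \ddots & \vdots & \vdots \\
\vdots & \ddots & \ddots & 0 & 0 \\
0 & \cdots & 0 & x-\lambda-y & 0 \\
y & \cdots & y & y & x-\lambda+(n-1)y
\end{vmatrix} \\
&= (x-\lambda-y)^{n-1}\,[x-\lambda+(n-1)y].
\end{aligned}$$

Hence the original diagonal sub-block M has an eigenvalue 1 − α of multiplicity m − 1 and a non-degenerate eigenvalue 1 + (m − 1)α.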
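The two eigenvalues can be verified directly by applying $M_n(x, y)$ to an all-ones vector and to a vector with entries 1 and −1 in two positions. The values n = 5 and y = 0.3 in this sketch are hypothetical.

```python
def equicorrelation(n, x, y):
    """M_n(x, y): x on the diagonal and y everywhere else."""
    return [[x if i == j else y for j in range(n)] for i in range(n)]

def matvec(a, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

n, x, y = 5, 1.0, 0.3                        # hypothetical size and correlation
M = equicorrelation(n, x, y)

ones = [1.0] * n                             # eigenvector for x + (n - 1) y
contrast = [1.0, -1.0] + [0.0] * (n - 2)     # one eigenvector for x - y

top = x + (n - 1) * y                        # non-degenerate eigenvalue
low = x - y                                  # eigenvalue of multiplicity n - 1
top_image = matvec(M, ones)
low_image = matvec(M, contrast)
```

Any vector whose components sum to zero is an eigenvector for x − y, which accounts for the multiplicity n − 1.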

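Using the identity for an invertible matrix block V
$$\begin{pmatrix} S & T \\ U & V \end{pmatrix} \begin{pmatrix} I & 0 \\ -V^{-1}U & I \end{pmatrix} = \begin{pmatrix} S - TV^{-1}U & T \\ 0 & V \end{pmatrix},$$
we have
$$\det(M' - \lambda I) = \det\left[(M_1 - \lambda I) - B(M_2 - \lambda I)^{-1}B^T\right] \det(M_2 - \lambda I)$$
for $\lambda \ne 1 - \alpha_2,\ 1 + (m_2 - 1)\alpha_2$, i.e. λ not an eigenvalue of $M_2$. Therefore
$$\begin{aligned}
\det(M' - \lambda I) &= \det\left[(M_1 - \lambda I) - \beta^2 \sum_i \sum_j \left\{(M_2 - \lambda I)^{-1}\right\}_{ij} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}\right] \det(M_2 - \lambda I) \\
&= \det\left[M_{m_1}(1 - \gamma, \alpha_1 - \gamma) - \lambda I\right] \det(M_2 - \lambda I) \\
&= (1 - \lambda - \alpha_1)^{m_1 - 1}\,[1 - \lambda - \alpha_1 + m_1(\alpha_1 - \gamma)]\, \det(M_2 - \lambda I)
\end{aligned} \tag{11}$$
where $\gamma(\lambda) = \beta^2 \sum_{i,j} \left\{(M_2 - \lambda I)^{-1}\right\}_{ij}$.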

Hence we see that for $\alpha_1 \ne \alpha_2$, by symmetry 1 ↔ 2, the characteristic equation of $M'$ must be of the form
$$0 = \det(M' - \lambda I) = (1 - \lambda - \alpha_1)^{m_1 - 1} (1 - \lambda - \alpha_2)^{m_2 - 1}\, p(\lambda) \tag{12}$$
where p(λ) = 0 is a quadratic equation related to
$$1 - \lambda - \alpha_1 + m_1(\alpha_1 - \gamma) = 0. \tag{13}$$
The eigenvalues $1 - \alpha_{1,2}$ of $M_{1,2}$ are still eigenvalues of $M'$ with respective multiplicities $m_{1,2} - 1$, and the remaining two eigenvalues of $M'$ are roots of p(λ) = 0. To solve this we need to find γ, so we must be able to invert $M_2 - \lambda I$ to find γ(λ).
To this end, we turn to the Sherman-Morrison formula for help:
Theorem 2 (Sherman–Morrison). For an invertible matrix A and column vectors u, v of compatible dimensions such that $1 + v^T A^{-1} u \ne 0$, the following formula holds:
$$(A + uv^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}.$$

Proof. By direct verification.

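By setting $A = (1 - \lambda - \alpha_2)I$ and $u = v = \sqrt{\alpha_2}\, \underbrace{(1, \ldots, 1)^T}_{m_2}$, we have
$$(M_2 - \lambda I)^{-1} = \frac{1}{1 - \lambda - \alpha_2} \left[ I - \frac{\alpha_2}{1 - \lambda - \alpha_2 + m_2\alpha_2} \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix} \right].$$
Therefore
$$\gamma(\lambda) = \frac{\beta^2}{1 - \lambda - \alpha_2} \left[ m_2 - \frac{m_2^2 \alpha_2}{1 - \lambda - \alpha_2 + m_2\alpha_2} \right] = \frac{\beta^2 m_2}{1 - \lambda - \alpha_2 + m_2\alpha_2}$$
and equation (13) can be reduced to a more symmetric form in the variable μ ≡ 1 − λ after rearranging,
$$q(\mu) = \mu^2 + (m_1\alpha_1 + m_2\alpha_2 - \alpha_1 - \alpha_2)\mu + m_1 m_2 \alpha_1\alpha_2 - (m_1 + m_2 - 1)\alpha_1\alpha_2 - \beta^2 m_1 m_2 = 0. \tag{14}$$

Although in this derivation we have assumed $\lambda \ne 1 - \alpha_2,\ 1 + (m_2 - 1)\alpha_2$, the characteristic equation (12) with p(λ) ≡ q(μ) is valid for all λ because any divergence is offset by the $\det(M_2 - \lambda I)$ factor in equation (11).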
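Equation (14) factorises as $(\mu + (m_1 - 1)\alpha_1)(\mu + (m_2 - 1)\alpha_2) = \beta^2 m_1 m_2$, which makes it easy to check numerically. The sketch below solves the quadratic, converts the roots to λ = 1 − μ, and verifies one of them against the full matrix $M'$ using a block-constant trial eigenvector. The parameter values and function names are hypothetical.

```python
import math

def nontrivial_eigenvalues(m1, m2, a1, a2, beta):
    """Roots of q(mu) = 0 in (14), returned as lambda = 1 - mu (larger first)."""
    b = (m1 - 1) * a1 + (m2 - 1) * a2
    c = (m1 - 1) * (m2 - 1) * a1 * a2 - beta ** 2 * m1 * m2
    disc = math.sqrt(b * b - 4.0 * c)
    return 1.0 - (-b - disc) / 2.0, 1.0 - (-b + disc) / 2.0

def split_block(m1, m2, a1, a2, beta):
    """The matrix M' of (10) as a list of rows."""
    n = m1 + m2
    def entry(i, j):
        if i == j:
            return 1.0
        if i < m1 and j < m1:
            return a1
        if i >= m1 and j >= m1:
            return a2
        return beta
    return [[entry(i, j) for j in range(n)] for i in range(n)]

m1, m2, a1, a2, beta = 3, 4, 0.5, 0.4, 0.2      # hypothetical parameters
lam_hi, lam_lo = nontrivial_eigenvalues(m1, m2, a1, a2, beta)

# Trial eigenvector: constant on each block, fixed by the first block row.
M_prime = split_block(m1, m2, a1, a2, beta)
v = [beta * m2] * m1 + [lam_hi - 1.0 - (m1 - 1) * a1] * m2
image = [sum(mij * vj for mij, vj in zip(row, v)) for row in M_prime]
residual = max(abs(x - lam_hi * y) for x, y in zip(image, v))
```

The residual should vanish to rounding error, and the two roots should sum to $2 + (m_1 - 1)\alpha_1 + (m_2 - 1)\alpha_2$.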

8.3 Interpretation of the solutions of the polynomial equation (14)

In the case $\alpha_1 = \alpha_2 = \beta \equiv \alpha$, equation (14) has two roots $\lambda_1 = 1 + (m_1 + m_2 - 1)\alpha$ and $\lambda_2 = 1 - \alpha$, just as expected for eigenvalues of M since now $M' = M$. Here we note that $\lambda_2$ coincides with the other eigenvalues arising from the factor $(1 - \lambda - \alpha_1)^{m_1 - 1}(1 - \lambda - \alpha_2)^{m_2 - 1}$ in the characteristic equation (12).
When layer division takes place to increase the layer number, we may have $\alpha_1 = \alpha_2 \equiv \alpha \ne \beta$ so that equation (14) is perturbed to
$$\mu^2 + (m - 2)\alpha\mu + (m_1 - 1)(m_2 - 1)\alpha^2 - \beta^2 m_1 m_2 = 0 \tag{15}$$
with roots denoted by $\lambda'_{1,2}$, where we recall $m = m_1 + m_2$. By trace consideration of $M'$ or the properties of quadratic equations, we see that now $\lambda'_1 + \lambda'_2 = \lambda_1 + \lambda_2 = 2 + (m - 2)\alpha$.
What this means is that in increasing the layer number by perturbing β = α in M to a different value of β in $M'$ (as observed in the computed model of C), we have decreased the product of the roots $\lambda_1\lambda_2$ to $\lambda'_1\lambda'_2$ while their sum must be kept the same. Intuitively, this tells us that more layers in our model result in a greater abundance of very large eigenvalues like $\lambda'_1$ and of positive eigenvalues like $\lambda'_2$ very close to zero.
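A small numerical example makes the fixed sum and the changing product concrete. The sketch solves equation (15) for hypothetical values $m_1 = m_2 = 5$ and α = 0.3, with β equal to, above and below α. In this example the product of the two roots falls, and the roots spread apart, when β is larger than α; it rises when β is smaller.

```python
import math

def roots_after_split(m1, m2, alpha, beta):
    """Roots of (15), returned as lambda = 1 - mu (larger first)."""
    b = (m1 + m2 - 2) * alpha
    c = (m1 - 1) * (m2 - 1) * alpha ** 2 - beta ** 2 * m1 * m2
    disc = math.sqrt(b * b - 4.0 * c)
    return 1.0 - (-b - disc) / 2.0, 1.0 - (-b + disc) / 2.0

m1, m2, alpha = 5, 5, 0.3                  # hypothetical block sizes and correlation
unsplit = roots_after_split(m1, m2, alpha, beta=0.3)   # beta = alpha: (3.7, 0.7)
larger = roots_after_split(m1, m2, alpha, beta=0.4)    # beta > alpha: (4.2, 0.2)
smaller = roots_after_split(m1, m2, alpha, beta=0.2)   # beta < alpha: (3.2, 1.2)

sums = [sum(pair) for pair in (unsplit, larger, smaller)]
products = [pair[0] * pair[1] for pair in (unsplit, larger, smaller)]
```

All three sums equal 2 + (m − 2)α = 4.4, while the products are 2.59, 0.84 and 3.84 respectively.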
Indeed, in Figure 6 where we compare the predicted empirical spectral density functions of a 10-layer
correlation model and a 148-layer correlation model, we see that the latter is a closer match for the
small positive eigenvalues. In fact, the latter is an excellent match with the observed eigenvalues,
the best we have achieved!

[Figure 6, panels: (a) 10 layers. (b) 148 layers. Each panel plots normalised eigenvalue density against eigenvalue for the histogram of observed eigenvalues, the Marčenko-Pastur distribution and the simulated analytic prediction.]

Figure 6: Comparison of the predicted empirical spectral density functions of a 10-layer correlation model and a 148-layer one with the market mode.

9 Alternative Model with Prior Sectoring

We saw in Section 6 that the new classification of sectors was not robust, whereas in Section 4 we saw that at the market mode level the distinction between pre-assigned sectors was already obvious. We wonder if this prior information on sectoring could be incorporated in our model. Here we propose the following construction based on the observations.
[Figure 7: normalised eigenvalue density plotted against eigenvalue, showing the histogram of observed eigenvalues, the Marčenko-Pastur distribution, the original proposed model and the alternative proposed model.]

Figure 7: Predicted empirical spectral density functions of the alternative model with prior sectoring, superposed with the M-P law and the previous analytic prediction.

We will assume the pre-assigned sectors have zero (or little) correlation with one another, whereas inside each sector the stocks have equal mutual correlations. The underlying correlation matrix then takes the block-diagonal form
$$\begin{pmatrix} M_1 & & & \\ & M_2 & & \\ & & \ddots & \\ & & & M_s \end{pmatrix}$$
where $M_i \equiv M_{m_i}$ as defined in equation (10), and s = 10 is the number of pre-assigned sectors (as in Figure 2). The data are processed without the removal of the market mode.
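For a block-diagonal model of this kind the spectrum of the underlying matrix follows in closed form from Section 8.2: each block $M_{m_i}(1, \alpha_i)$ contributes the eigenvalue $1 + (m_i - 1)\alpha_i$ once and $1 - \alpha_i$ with multiplicity $m_i - 1$. The sketch below lists these eigenvalues together with the weights $p_j = n_j / P$ that enter ν(λ); the sector sizes and correlations are hypothetical, and the function name is illustrative.

```python
def block_diagonal_spectrum(sizes, alphas):
    """(eigenvalue, multiplicity) pairs for a block-diagonal matrix whose
    i-th block is M_{m_i}(1, alpha_i)."""
    spectrum = []
    for m, a in zip(sizes, alphas):
        spectrum.append((1.0 + (m - 1) * a, 1))     # non-degenerate eigenvalue
        if m > 1:
            spectrum.append((1.0 - a, m - 1))       # degenerate eigenvalue
    return spectrum

# Hypothetical sectors: two blocks of sizes 3 and 4.
sizes = [3, 4]
alphas = [0.5, 0.25]
spec = block_diagonal_spectrum(sizes, alphas)

P = sum(sizes)
weights = [(lam, n / P) for lam, n in spec]          # p_j = n_j / P
trace = sum(lam * n for lam, n in spec)
```

The multiplicities add up to P and the trace equals P, as required for a correlation matrix with unit diagonal.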
We repeat the procedure as in Section 7.2 to plot the predicted empirical spectrum of the correlation matrix in Figure 7.
We see that this model, even without detailed layer division, is a reasonable fit to the observed eigenvalue density, but its shape resembles the M-P distribution more closely than it does the observed distribution or the previous 148-layer model.

10 Summary and Further Developments

Through mode and clustering analyses, we have been able to construct a multi-layer structured correlation model for the S&P 500 stock market. By analysing the dependence of the spectrum of our model on the layer depth, we have shown analytically that increasing the number of layers improves the match between the (simulated) prediction of the empirical spectral density function and the observed eigenvalues of the empirical correlation matrix.
This results in a reliable estimate of the underlying correlation structure of the market analysed, which may then have a positive impact on investment portfolios.
However, our study of the stock market correlations can be further developed by considering:
1) edge asymptotics;
Our correlation matrix is finite-dimensional, which means eigenvalues are expected to leak out of the edges of the predicted distribution of the empirical correlation matrix spectrum, which is derived in the asymptotic limit. This leakage effect could be studied using the Tracy-Widom law.
2) time evolution;
The underlying stock market data have been assumed to have a stationary distribution in time, and this is unlikely to be a good assumption. We need to build the time parameter/variable t into our model.
3) fine-tuning.
We have performed the hierarchical clustering analysis with average linkage to build a binary tree structure. The correlation model can be improved with a tailored clustering method suited to the stock market data.

Acknowledgements

Many thanks go to my project supervisor, Dr Lucy Colwell, and her PhD student Chongli Qin at the Department of Chemistry, University of Cambridge, whose guidance and help have been crucial to this research project. I am also grateful for the generous support of the Bridgwater Scheme.

References
[1] S. Pafka, M. Potters, and I. Kondor. Exponential Weighting and Random-Matrix-Theory-Based Filtering of Financial Covariance Matrices for Portfolio Optimization. arXiv:cond-mat/0402573, February 2004.

[2] S. Wang. SUROP Project. http://sw664.user.srcf.net/SUROP%20Project%202016/SUROP.html, 2016. [On-line, updated and accessed 25/09/16].

[3] N. C. Snaith, P. J. Forrester, and J. J. M. Verbaarschot. Developments in Random Matrix Theory. arXiv:cond-mat/0303207, March 2003.

[4] V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues in certain sets of random matrices. Mathematics of the USSR-Sbornik, 1(4), 1967.

90 pages
002 LS 2 Spectral Theory
No ratings yet
002 LS 2 Spectral Theory
17 pages
Random Matrix Theory Insights
No ratings yet
Random Matrix Theory Insights
33 pages
SSRN Id2802753
No ratings yet
SSRN Id2802753
44 pages
Hese de Doctorat De: Hai Dang NGO
No ratings yet
Hese de Doctorat De: Hai Dang NGO
206 pages
Fo MM3 Sli 17
No ratings yet
Fo MM3 Sli 17
90 pages
Algorithmic Trading Strategies
No ratings yet
Algorithmic Trading Strategies
21 pages
Alvaro's Slides - Lecture 1
No ratings yet
Alvaro's Slides - Lecture 1
41 pages
Interest Rate Swaps for Finance Students
No ratings yet
Interest Rate Swaps for Finance Students
23 pages
U.S. Treasury Market & Monetary Policy
No ratings yet
U.S. Treasury Market & Monetary Policy
27 pages
05.4 PP 43 57 Wishart Ensemble and MarcenkoPastur Distribution
No ratings yet
05.4 PP 43 57 Wishart Ensemble and MarcenkoPastur Distribution
15 pages
Slides Curve
No ratings yet
Slides Curve
22 pages
Rudin
No ratings yet
Rudin
26 pages
Kings College Jan2020
No ratings yet
Kings College Jan2020
75 pages
Further Maths
No ratings yet
Further Maths
2 pages
Grade 11 Holiday Revision 2024 Term 1
No ratings yet
Grade 11 Holiday Revision 2024 Term 1
20 pages
Individual Competition Sept. 20, 2014 English Version: Problem I-1
No ratings yet
Individual Competition Sept. 20, 2014 English Version: Problem I-1
3 pages
CBSE Class 10 Quadratic Equation
No ratings yet
CBSE Class 10 Quadratic Equation
5 pages
M001 Midterm
No ratings yet
M001 Midterm
9 pages
Advanced Differential Equations
No ratings yet
Advanced Differential Equations
20 pages
JEE 2023 Mathematics Class Test
No ratings yet
JEE 2023 Mathematics Class Test
9 pages
Trigonometry Handbook
No ratings yet
Trigonometry Handbook
114 pages
Spectral Theory and Mathematical Physics Pablo Miranda, Nicolas Popoff, Georgi Raikov
100% (1)
Spectral Theory and Mathematical Physics Pablo Miranda, Nicolas Popoff, Georgi Raikov
277 pages
QP - PS - CBSE - X - Math - 3.pair of Linear Equations in Two Variables
No ratings yet
QP - PS - CBSE - X - Math - 3.pair of Linear Equations in Two Variables
6 pages
Analysis of Variance and Design of Experiment 1 by DR Shalabh
No ratings yet
Analysis of Variance and Design of Experiment 1 by DR Shalabh
382 pages
1 Introduction To LP: F (X) S.T. X S
No ratings yet
1 Introduction To LP: F (X) S.T. X S
7 pages
A Performance Task in
No ratings yet
A Performance Task in
8 pages
MAT 171 Final Reviewer
No ratings yet
MAT 171 Final Reviewer
54 pages
Fourier Series Two Marks Questions
100% (1)
Fourier Series Two Marks Questions
3 pages
TIMO Practice (Easy)
No ratings yet
TIMO Practice (Easy)
9 pages
Advanced Number Theory Exam
No ratings yet
Advanced Number Theory Exam
3 pages
Lesson 17: Vectors in The Coordinate Plane: Student Outcomes
No ratings yet
Lesson 17: Vectors in The Coordinate Plane: Student Outcomes
18 pages
Solomon Press C2F
No ratings yet
Solomon Press C2F
18 pages
Simultaneous, Quadrilaterals and Inequalities
No ratings yet
Simultaneous, Quadrilaterals and Inequalities
1 page
A First Course in Topology Continuity and Dimension Student Mathematical Library
100% (3)
A First Course in Topology Continuity and Dimension Student Mathematical Library
141 pages
Engineering Graphics: Plane Curves
No ratings yet
Engineering Graphics: Plane Curves
1 page
Third Periodical Test Mathematics
0% (1)
Third Periodical Test Mathematics
6 pages
Engineering Math III Syllabus
No ratings yet
Engineering Math III Syllabus
2 pages
1st Summative Test in Math 2nd QTR
No ratings yet
1st Summative Test in Math 2nd QTR
1 page
Ch.3 Methods in Calculus QP
No ratings yet
Ch.3 Methods in Calculus QP
7 pages
Math Syllabus for Undergrads
No ratings yet
Math Syllabus for Undergrads
6 pages
FRTN10 Multivariable Control
No ratings yet
FRTN10 Multivariable Control
38 pages
Ieee Floating-Point Decimal Number
No ratings yet
Ieee Floating-Point Decimal Number
12 pages
Differential Equations Guide
No ratings yet
Differential Equations Guide
5 pages


3 Random Matrix Theory: The Marčenko-Pastur Law

3.1 Statement of the Marčenko-Pastur law

[Figure axis label: normalised eigenvalue density]

3.2 Interpretation of the Marčenko-Pastur law

4.1 Digression: uniformity of the market mode

5 Hierarchical Clustering Analysis

[Figure panels: (a) Dendrogram; (b) Minimum spanning tree]

where |·| is the order of a cluster set.

6 Re-classification of Market Sectors

7 A Multi-layer Structured Correlation Model and Its Predic-

7.1 The construction of the correlation model

[Figure axis label: normalised eigenvalue density]

[Figure panels: (a) Heat-map; (b) Simulated analytic prediction]

7.2 New predictions based on the model

must satisfy the integral equation

8.1 A fundamental structure of the proposed model

8.2 Determining the characteristic equations of matrices (10)

The matrix $M_n(x, y)$ can be reduced to a lower-triangular form by elementary operations:

Hence the original diagonal sub-block M has an eigenvalue 1 − α of multiplicity m − 1 and a

for $\lambda \neq 1 - \alpha_2,\ 1 + (m_2 - 1)\alpha_2$ not an eigenvalue of $M_2$. Therefore

Hence we see that for $\alpha_1 \neq \alpha_2$, by symmetry $1 \leftrightarrow 2$, the characteristic equation of $M'$ must be of

Proof. By direct verification.

$q(\mu) = \mu^2 + (m_1\alpha_1 + m_2\alpha_2 - \alpha_1 - \alpha_2)\mu + m_1 m_2 \alpha_1\alpha_2 - (m_1 + m_2 - 1)\alpha_1\alpha_2 - \beta^2 m_1 m_2 = 0. \quad (14)$
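Equation (14) can be checked numerically. The sketch below is not taken from the report; it rests on assumptions that this excerpt does not state explicitly: that $M'$ is a two-block matrix with unit diagonal, within-block correlations $\alpha_1$ and $\alpha_2$, and cross-block correlation $\beta$, and that $\mu = 1 - \lambda$. The function names (`block_matrix`, `det`, `quadratic_roots`, `residuals`) are hypothetical. Under those assumptions, each root $\mu$ of (14) should make $M' - (1 - \mu)I$ singular.

```python
import math


def block_matrix(m1, m2, a1, a2, b):
    """Assumed form of M': unit diagonal, a1 / a2 within blocks, b across blocks."""
    n = m1 + m2
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i][j] = 1.0
            elif i < m1 and j < m1:
                M[i][j] = a1
            elif i >= m1 and j >= m1:
                M[i][j] = a2
            else:
                M[i][j] = b
    return M


def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n = len(A)
    d = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[p][k] == 0.0:
            return 0.0
        if p != k:
            A[k], A[p] = A[p], A[k]
            d = -d
        d *= A[k][k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return d


def quadratic_roots(m1, m2, a1, a2, b):
    """Roots of q(mu) = mu^2 + B*mu + C from equation (14)."""
    B = m1 * a1 + m2 * a2 - a1 - a2
    C = m1 * m2 * a1 * a2 - (m1 + m2 - 1) * a1 * a2 - b * b * m1 * m2
    s = math.sqrt(B * B - 4.0 * C)
    return ((-B + s) / 2.0, (-B - s) / 2.0)


def residuals(m1, m2, a1, a2, b):
    """det(M' - lambda*I) at lambda = 1 - mu for each root mu (assumed convention)."""
    M = block_matrix(m1, m2, a1, a2, b)
    n = m1 + m2
    out = []
    for mu in quadratic_roots(m1, m2, a1, a2, b):
        lam = 1.0 - mu
        shifted = [[M[i][j] - (lam if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        out.append(det(shifted))
    return out


if __name__ == "__main__":
    # Illustrative parameter values only; they are not from the report.
    print(residuals(3, 4, 0.5, 0.3, 0.2))
```

The determinant is computed with the standard library alone so that the check has no dependencies; with NumPy available, comparing `1 - mu` against the output of an eigenvalue routine would serve the same purpose.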

$\mu^2 + (m - 2)\alpha\mu + (m_1 - 1)(m_2 - 1)\alpha^2 - \beta^2 m_1 m_2 = 0$

with roots denoted by $\lambda'_{1,2}$, where we recall $m = m_1 + m_2$. By trace consideration of $M'$ or the

[Figure panels: (a) 10 layers; (b) 148 layers]

9 Alternative Model with Prior Sectoring

[Figure: normalised eigenvalue density; histogram of observed eigenvalues]

10 Summary and Further Developments

