Multivariate Time Series Models
Asad Dossani
Fin 625: Quantitative Methods in Finance
Multivariate Time Series Models
Multivariate time series models generalize univariate time series models to allow for interaction between multiple variables. The primary model we use is the vector autoregression (VAR). In a VAR, each variable depends on its own lags and on the lags of the other variables. VARs are used to determine how one variable affects and forecasts another over time.
Stationarity and Autocorrelation Matrices
Suppose yt is a vector of d time series. yt is stationary if all first
and second moments are time invariant. The matrix valued function
Γ(.) is the cross covariance function.
yt = (yt1, . . . , ytd)′
µ ≡ E(yt)
Γ(k) ≡ E[(yt+k − µ)(yt − µ)′]
Cross Covariance Function
γij(k) is the cross covariance between the ith and jth components at lag k for i ≠ j. It is the autocovariance of the jth component when i = j. Γ(k) is not symmetric unless k = 0.
γij (k) = Cov(yt+k,i , ytj )
γjj (k) = γjj (−k)
γij(k) ≠ γij(−k) for i ≠ j and k = 1, 2, . . .
γij (k) = γji (−k)
Cross Correlation Matrix
Let D^{-1/2} be a diagonal matrix with γjj(0)^{-1/2}, the reciprocal of the standard deviation of the jth series, as its jth main diagonal element. R(k) is the cross correlation matrix, and ρij(k) is the cross correlation coefficient.
R(k) = D^{-1/2} Γ(k) D^{-1/2}
ρij (k) = Corr(yt+k,i , ytj )
Cross Correlation Matrix
Autocorrelation coefficients are symmetric, but the cross correlation
coefficient is not necessarily symmetric.
ρjj (k) = ρjj (−k)
ρij(k) ≠ ρij(−k) for i ≠ j and k = 1, 2, . . .
ρij (k) = ρji (−k)
Sample Cross Covariance/Correlation Matrices
With available observations y1 , . . . , yT , a natural estimator of the
cross covariance matrix is the sample cross covariance matrix.
Γ̂(k) = (1/T) Σ_{t=1}^{T−k} (yt+k − µ̂)(yt − µ̂)′
µ̂ = (1/T) Σ_{t=1}^{T} yt
Sample Cross Covariance/Correlation Matrices
Similarly, a natural estimator of the cross correlation matrix is the
sample cross correlation matrix.
R̂(k) = D̂^{-1/2} Γ̂(k) D̂^{-1/2}
D̂ = diag(γ̂11 (0), . . . , γ̂dd (0))
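These estimators are straightforward to compute. A minimal numpy sketch (not part of the original slides; the simulated series and its covariance matrix are illustrative):

```python
import numpy as np

def sample_cross_cov(y, k):
    """Sample cross covariance matrix Gamma_hat(k) for a (T x d) array y."""
    if k < 0:
        return sample_cross_cov(y, -k).T      # uses gamma_ij(k) = gamma_ji(-k)
    T = y.shape[0]
    yc = y - y.mean(axis=0)                   # demeaned observations
    # (1/T) * sum_{t=1}^{T-k} (y_{t+k} - mu_hat)(y_t - mu_hat)'
    return yc[k:].T @ yc[:T - k] / T

def sample_cross_corr(y, k):
    """Sample cross correlation matrix R_hat(k)."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(sample_cross_cov(y, 0))))
    return d_inv_sqrt @ sample_cross_cov(y, k) @ d_inv_sqrt

# Quick check on simulated, contemporaneously correlated noise (illustrative values)
rng = np.random.default_rng(0)
y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], size=500)
print(sample_cross_corr(y, 0))                # off-diagonal close to 0.5 / sqrt(2)
print(sample_cross_corr(y, 1))                # close to zero: the noise has no serial correlation
```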
Vector White Noise
A vector white noise process has no serial correlation in any of the components of ut. However, different components of ut may be contemporaneously correlated, as Σu is not necessarily a diagonal matrix.
ut ∼ WN(a, Σu)
E(ut) = a
Var(ut) = Σu
Cov(ut, us) = 0 for all t ≠ s
Vector Autoregressive Models: VAR(1)
We define a d-dimensional vector autoregressive model of order 1. Each component of yt is a linear combination of its own lagged values and the lagged values of the other components. The contemporaneous linear relationship among the different components of yt is reflected by the nonzero off-diagonal elements of Σu. yt, c, ut, and 0 are (d × 1) vectors and B is a (d × d) matrix.
yt ∼ VAR(1)
yt = c + Byt−1 + ut
ut ∼ WN(0, Σu )
Bivariate VAR(1)
yt = c + Byt−1 + ut
(y1,t, y2,t)′ = (c1, c2)′ + [B11 B12; B21 B22] (y1,t−1, y2,t−1)′ + (u1,t, u2,t)′
y1,t = c1 + B11 y1,t−1 + B12 y2,t−1 + u1,t
y2,t = c2 + B21 y1,t−1 + B22 y2,t−1 + u2,t
If B12 = 0 and B21 = 0, then both y1,t and y2,t are AR(1) processes.
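As an illustration, a short simulation of a bivariate VAR(1) in numpy (a sketch with assumed, stationary parameter values, not values from the slides):

```python
import numpy as np

# Simulate a bivariate VAR(1), y_t = c + B y_{t-1} + u_t, with contemporaneously
# correlated white noise. Parameter values are illustrative assumptions, chosen
# so that B has eigenvalues inside the unit circle (stationarity).
rng = np.random.default_rng(1)
T = 1000
c = np.array([0.1, 0.2])
B = np.array([[0.6, 0.2],
              [0.1, 0.4]])
Sigma_u = np.array([[1.0, 0.3],
                    [0.3, 1.0]])              # nonzero off-diagonal element

u = rng.multivariate_normal(np.zeros(2), Sigma_u, size=T)
y = np.zeros((T, 2))
y[0] = np.linalg.solve(np.eye(2) - B, c)      # start at the unconditional mean
for t in range(1, T):
    y[t] = c + B @ y[t - 1] + u[t]

print(y.mean(axis=0))                         # close to (I - B)^{-1} c
```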
Expected Value of yt
If yt is stationary, the expected value of yt is given by:
E(yt ) = E(c + Byt−1 + ut )
E(yt ) = c + B E(yt−1 )
E(yt ) − B E(yt ) = c
(I − B )E(yt ) = c
E(yt) = (I − B)^{-1} c
Bivariate VAR(1)
Suppose yt follows a bivariate VAR(1) process, where yt = (y1,t, y2,t)′:
yt = c + Ayt−1 + ut
ut ∼ WN(0, Σu )
c = (0.1, 0.2)′
A = [0.5 0.25; 0 0.5]
Compute the expected value of the series, i.e. E(yt ).
Bivariate VAR(1)
E(yt) = (I − A)^{-1} c
  = ([1 0; 0 1] − [0.5 0.25; 0 0.5])^{-1} (0.1, 0.2)′
  = [0.5 −0.25; 0 0.5]^{-1} (0.1, 0.2)′
  = (1 / ((0.5)(0.5) − (−0.25)(0))) [0.5 0.25; 0 0.5] (0.1, 0.2)′
  = [2 1; 0 2] (0.1, 0.2)′
  = (0.4, 0.4)′
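A quick numerical check of this calculation (a numpy sketch):

```python
import numpy as np

# Numerical check of the calculation above: E(y_t) = (I - A)^{-1} c.
c = np.array([0.1, 0.2])
A = np.array([[0.5, 0.25],
              [0.0, 0.5]])
print(np.linalg.solve(np.eye(2) - A, c))      # [0.4 0.4]
```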
Vector Autoregressive Models: VAR(p)
We similarly define a d-dimensional vector autoregressive model of order p. yt, c, ut, and 0 are (d × 1) vectors. B1, . . . , Bp, and Σu are (d × d) matrices.
yt ∼ VAR(p)
yt = c + B1 yt−1 + · · · + Bp yt−p + ut
ut ∼ WN(0, Σu )
E(yt) = (I − B1 − · · · − Bp)^{-1} c
Forecasting using a VAR(1)
yt+1 = c + Byt + ut+1
Et (yt+1 ) = Et (c + Byt + ut+1 )
Et (yt+1 ) = c + Byt
yt+2 = c + Byt+1 + ut+2
Et (yt+2 ) = Et (c + Byt+1 + ut+2 )
Et (yt+2 ) = c + B Et (yt+1 )
Et (yt+2 ) = c + B (c + Byt )
Et(yt+2) = c + Bc + B^2 yt
Forecasting using a VAR(1)
yt+k = c + Byt+k−1 + ut+k
Et(yt+k) = c + Σ_{j=1}^{k−1} B^j c + B^k yt
For a stationary VAR, B^k yt → 0 as k → ∞, so
Et(yt+k) → c + Σ_{j=1}^{∞} B^j c = (I − B)^{-1} c as k → ∞
Forecasting using a VAR(1)
Suppose yt follows a bivariate VAR(1) process, where yt = (y1,t, y2,t)′:
yt = c + Ayt−1 + ut
ut ∼ WN(0, Σu )
c = (0.1, 0.2)′
A = [0.5 0.25; 0 0.5]
Suppose yt = (1, 0)′. Compute the one period ahead and two period ahead forecasts, i.e. Et(yt+1) and Et(yt+2).
Forecasting using a VAR(1)
Et(yt+1) = c + Ayt = (0.1, 0.2)′ + [0.5 0.25; 0 0.5] (1, 0)′ = (0.6, 0.2)′
Et(yt+2) = c + A Et(yt+1) = (0.1, 0.2)′ + [0.5 0.25; 0 0.5] (0.6, 0.2)′ = (0.45, 0.3)′
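The same recursion is easy to iterate numerically. A numpy sketch using the values above, which also shows convergence to the unconditional mean:

```python
import numpy as np

# Forecast recursion E_t(y_{t+k}) = c + A E_t(y_{t+k-1}), using the example values.
c = np.array([0.1, 0.2])
A = np.array([[0.5, 0.25],
              [0.0, 0.5]])
forecast = np.array([1.0, 0.0])               # starting point y_t = (1, 0)'
for k in range(1, 21):
    forecast = c + A @ forecast
    if k <= 2:
        print(k, forecast)                    # k=1: [0.6 0.2], k=2: [0.45 0.3]
print(forecast)                               # approaches (I - A)^{-1} c = (0.4, 0.4)'
```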
Information Criteria
We can use multivariate information criteria to select the order p of the VAR. We choose p such that one of the information criteria is minimized. Let Σ̂u denote the variance covariance matrix of the residuals, T the sample size, and k′ = pd^2 + d the total number of regressors across all d equations (each equation has pd + 1 regressors, including the intercept).
MAIC = log(|Σ̂u|) + 2k′/T
MBIC = log(|Σ̂u|) + (k′/T) ln T
MHQIC = log(|Σ̂u|) + (2k′/T) ln(ln T)
Granger Causality
Time series y1,t is said to Granger cause time series y2,t if lags of
y1,t are useful for forecasting y2,t , after controlling for lags of y2,t .
E(y2,t | y1,t−1, y2,t−1, y1,t−2, y2,t−2, . . . ) ≠ E(y2,t | y2,t−1, y2,t−2, . . . )
Granger Causality
VARs can be used to test for Granger causality. Suppose we are interested in testing whether y1,t Granger causes y2,t, and we estimate a bivariate VAR(1):
yt = c + Byt−1 + ut
y1,t = c1 + B11 y1,t−1 + B12 y2,t−1 + u1,t
y2,t = c2 + B21 y1,t−1 + B22 y2,t−1 + u2,t
We can use an F-test to test the restriction that B21 = 0. If we reject
the null hypothesis, we conclude that y1,t Granger causes y2,t .
Granger Causality
Suppose we are interested in testing whether y1,t Granger causes
y2,t , and we now estimate a bivariate VAR(p):
yt = c + B1 yt−1 + · · · + Bp yt−p + ut
Testing for Granger causality is a test of the following restriction:
B1,21 = B2,21 = · · · = Bp,21 = 0
Granger Causality
To test for Granger causality, we first estimate the unrestricted VAR
and compute the residual sum of squares, RSS. Next, we estimate
a restricted VAR, and compute the residual sum of squares, RSSr .
Under the null hypothesis:
F = ((RSSr − RSS)/p) / (RSS/(2T − 4p − 2)) ∼ F(p, 2T − 4p − 2)
If the test statistic is greater than the critical value, we reject the
null hypothesis of no Granger causality.
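A sketch of the test using statsmodels' built-in causality test (an assumption; its F-test degrees of freedom may differ slightly from the formula above), on data simulated so that y1 does Granger cause y2:

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Simulate a bivariate system in which column 0 (y1) helps forecast column 1 (y2).
rng = np.random.default_rng(3)
T = 500
u = rng.multivariate_normal([0, 0], [[1.0, 0.2], [0.2, 1.0]], size=T)
y = np.zeros((T, 2))
for t in range(1, T):
    y[t, 0] = 0.5 * y[t - 1, 0] + u[t, 0]
    y[t, 1] = 0.4 * y[t - 1, 0] + 0.3 * y[t - 1, 1] + u[t, 1]

results = VAR(y).fit(1)
# H0: lags of variable 0 do not Granger cause variable 1
test = results.test_causality(caused=1, causing=0, kind='f')
print(test.summary())
```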
Impulse Response Functions
Impulse response functions measure the resulting changes in the other components at different time lags due to a unit change in one component series. For a VAR(p), we write the MA(∞) representation. The (i, j) element of Ak is the impulse response of yt+k,i, the ith component k units of time ahead, to a one-unit shock in the jth component of yt.
yt = µ + ut + Σ_{k=1}^{∞} Ak ut−k, where µ = E(yt)
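For a VAR(1), the MA(∞) coefficients are simply Ak = B^k. A small numpy sketch with an illustrative stationary B:

```python
import numpy as np

# For a VAR(1), the MA(infinity) coefficient matrices are A_k = B^k, so the (i, j)
# element of B^k traces the response of the i-th variable k periods ahead to a
# unit shock in the j-th component. B is an illustrative stationary matrix.
B = np.array([[0.5, 0.25],
              [0.0, 0.5]])
for k in range(1, 4):
    print(f"A_{k} =\n{np.linalg.matrix_power(B, k)}")
```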
Impulse Response Functions
In practice, the components of ut are not independent, and a change
in one component of yt is typically associated with some changes in
other components. Thus, it is not possible to define the responses
with respect to a single component of yt .
Impulse Response Functions
We can apply the following transformation to remove the correlation among the components of ut. Now, Ψ0, Ψ1, . . . are the impulse response functions.
ut = Ψ0 εt
Ψ0 Ψ0′ = Σu
Var(εt) = I
yt = µ + Ψ0 εt + Σ_{k=1}^{∞} Ψk εt−k, where Ψk = Ak Ψ0
Impulse Response Functions
The definition of εt is not unique. We generally take Ψ0 to be lower triangular, obtained from the Cholesky decomposition of Σu. This assumes that the first variable is not contemporaneously affected by shocks to the other variables, the second variable is contemporaneously affected only by the first, and so on.
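A sketch of orthogonalized (Cholesky) impulse responses using statsmodels, assuming the library is available; the data-generating parameters are illustrative:

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Simulate an illustrative stationary bivariate VAR(1) with correlated innovations.
rng = np.random.default_rng(4)
T = 500
B = np.array([[0.5, 0.2],
              [0.1, 0.3]])
u = rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=T)
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = B @ y[t - 1] + u[t]

results = VAR(y).fit(1)
irf = results.irf(10)                 # responses up to 10 periods ahead
print(irf.orth_irfs[1])               # orthogonalized responses one period after the shock
# The column ordering of y fixes the Cholesky ordering: shocks to the first
# variable can move the second contemporaneously, but not the reverse.
```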