Block 2
3 TS & forecasting
Static models
yt = β1 + β2 xt + ut , t = 1, 2, . . . , n
Dynamic models
Finite distributed lag (FDL) model:
yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + ut
Infinite distributed lag (IDL) model:
yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + · · · + ut
Dynamic models: lag order
For convenience, β subscripts follow lag order,
Impact / lagged / long-run multiplier,
Effect of a temporary (one-off) vs. a permanent increase in x,
Lag distribution (function).
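As a small numerical sketch (with made-up coefficients, not estimates from any dataset), the impact and long-run multipliers of an FDL model can be read directly off the lag coefficients:

```python
# FDL(2): y_t = a0 + b0*x_t + b1*x_{t-1} + b2*x_{t-2} + u_t
# Illustrative (made-up) coefficients, indexed by lag order as in the text.
betas = [0.5, 0.3, 0.1]                # b0, b1, b2

impact_multiplier = betas[0]           # immediate effect of a unit increase in x
long_run_multiplier = sum(betas)       # total effect of a permanent unit increase

# Effect path of a temporary (one-period) unit increase in x:
# y responds by b0 at impact, b1 one period later, b2 two periods later, then 0.
temporary_path = betas + [0.0]
```

The list `betas` is the lag distribution; plotting it against the lag index gives the lag distribution (function) mentioned above.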
G-M assumptions for TSRMs
TS.1 Linearity
The stochastic process {(xt1 , xt2 , . . . , xtk , yt ); t = 1, 2, . . . , n}
follows a linear model yt = β1 + β2 xt2 + · · · + βK xtK + ut .
TS.5 No serial correlation
corr(ut , us |X) = 0, t ≠ s
TS.6 Normality
ut are independent of X and i.i.d.: ut ∼ N(0, σ²)
Regression: y on t
If there is a linear trend in y
Regression: log y on t
Exponential trend, constant rate of growth of y
Spurious regression:
We may find a relationship between two or more trending variables
even when none exists in reality (non-stationarity and
cointegration topics are discussed next).
Detrending and deseasonalizing
Coefficients βˆ2 , βˆ3 from this regression are the same as in the
original regression
Coefficient of determination when y is trending
R² = 1 − σ̂u² / σ̂y²
yˆt = βˆ1 + βˆ2 xt2 + βˆ3 xt3 + γˆ1 dummy1 + γˆ2 dummy2 + γˆ3 dummy3
Coefficients βˆ2 , βˆ3 from this regression are the same as in the original
regression
Stationary and weakly dependent time series
yt = et + α1 et−1 ,
TS.1’ Linearity
The stochastic process {(xt1 , xt2 , . . . , xtk , yt ); t = 1, 2, . . . , n}
follows the linear model yt = β0 + β1 xt1 + · · · + βk xtk + ut
We assume both dependent and independent variables are
stationary and weakly dependent.
corr(ut , us |xt , xs ) = 0, t ≠ s
Asymptotic properties of OLS estimators
Under assumptions TS.1’, TS.2’ and TS.3’,
Causes:
DGP, dynamic incompleteness of models
ρ estimation:
yt = β0 + β1 xt1 + · · · + βk xtk + ut
ut = ρut−1 + et
we estimate ρ from:
ût = ρût−1 + error
H0 : ρ = 0
Durbin-Watson test:
d = ∑_{t=2}^{n} (ût − ût−1)² / ∑_{t=1}^{n} ût²
small sample validity (conditions apply)
d-statistic: symmetric distribution on ⟨0, 4⟩, E(d) = 2
d ≈ 2(1 − ρ̂) i.e. ρ̂ ≈ 1 − d/2. Test for: H0 : ρ = 0.
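A quick simulation sketch of the d-statistic and of the ρ̂ ≈ 1 − d/2 approximation (simulated AR(1) errors stand in for regression residuals here; ρ = 0.6 is an arbitrary illustration value):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.6, 500
u = np.zeros(n)
for t in range(1, n):                  # AR(1) "residuals": u_t = rho*u_{t-1} + e_t
    u[t] = rho * u[t - 1] + rng.normal()

d = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)   # Durbin-Watson statistic
rho_hat = 1 - d / 2                            # implied by d ≈ 2(1 − ρ̂)
```

With positive serial correlation d falls below 2; with ρ near 0 it stays close to 2.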
Serial correlation
Testing AR(1):
H0 : ρ = 0
We can use a heteroscedasticity-robust version of the t-test.
H0 : ρ1 = · · · = ρq = 0
yt = x0t β + ut , ut = ρut−1 + εt ,
we can substitute the autoregressive process into main LRM:
yt = x0t β + ut , ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x0t − ρx0t−1 )β + εt .
Hildreth-Lu:
yt = x0t β + ut , ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x0t − ρx0t−1 )β + εt .
Cochrane-Orcutt:
yt = x0t β + ut , ut = ρut−1 + εt ,
(yt − ρyt−1 ) = (x0t − ρx0t−1 )β + εt .
Prais-Winsten:
yt = β1 + β2 xt + ut , ut = ρut−1 + εt ,
where y = (y1 , y2 , y3 , . . . , yn )′,
y∗CO = (y2 − ρy1 , y3 − ρy2 , . . . , yn − ρyn−1 )′,
y∗PW = (√(1 − ρ²) y1 , y2 − ρy1 , y3 − ρy2 , . . . , yn − ρyn−1 )′
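The two quasi-differencing schemes are easy to sketch in code; the only difference is the treatment of the first observation (Prais-Winsten rescales it by √(1 − ρ²), Cochrane-Orcutt drops it):

```python
import numpy as np

def prais_winsten_transform(y, rho):
    """Quasi-difference a series, keeping the first observation
    rescaled by sqrt(1 - rho^2) (Prais-Winsten)."""
    y = np.asarray(y, dtype=float)
    y_star = np.empty_like(y)
    y_star[0] = np.sqrt(1 - rho**2) * y[0]
    y_star[1:] = y[1:] - rho * y[:-1]
    return y_star

def cochrane_orcutt_transform(y, rho):
    """Quasi-difference a series, dropping the first observation
    (Cochrane-Orcutt)."""
    y = np.asarray(y, dtype=float)
    return y[1:] - rho * y[:-1]

pw = prais_winsten_transform([1.0, 2.0, 3.0], rho=0.5)
co = cochrane_orcutt_transform([1.0, 2.0, 3.0], rho=0.5)
```

The same transform is applied to every regressor (including the constant), after which OLS on the transformed data gives the FGLS estimates.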
Stationarity
Cointegration
Weakly and strongly dependent TS
E(yt ) = E(y0 )
var(yt ) = σe2 t
cor(yt , yt+h ) = √(t/(t + h))
[Figure: simulated random-walk path y over t = 0, …, 50]
Strongly dependent TS
yt = α0 + yt−1 + et ⇒ yt = α0 t + et + et−1 + · · · + e1 + y0
E(yt ) = α0 t + E(y0 )
var(yt ) = σe2 t
cor(yt , yt+h ) = √(t/(t + h))
[Figure: simulated random walk with drift (yd) over t = 0, …, 50]
yt = 1 · yt−1 + ut = yt−1 + ut
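The growing variance of a unit-root process can be checked by simulation: across many independent random-walk paths, the cross-sectional variance at time t grows roughly linearly, var(yt) ≈ σe²·t (a sketch, with σe = 1):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 2000, 50
e = rng.normal(size=(n_paths, n_steps))
rw = np.cumsum(e, axis=1)          # y_t = y_{t-1} + e_t with y_0 = 0

var_t = rw.var(axis=0)             # cross-sectional variance at each t
# var_t[0] is near 1, var_t[-1] near 50: variance grows with t,
# unlike a stationary (weakly dependent) series
```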
Informal procedures
Analyze autocorrelation of the first order
ρ̂ = corr̂(yt , yt−1 )
TS with unit root can manifest various levels of complexity. Hence, DF test
for a given yt TS is usually performed using the following specifications:
∆yt = θyt−1 + et
random walk
∆yt = α + θyt−1 + et random walk with a drift
∆yt = α + θyt−1 + δt + et random walk with a drift and trend
The DF test is the same (H0 : θ = 0) for all specifications, but the critical values differ.
Augmented Dickey-Fuller (ADF) test is a common generalization of DF test
(example: Augmentation of the DF test for the 2nd specification)
1 type "none"
∆yt = θyt−1 + et
tau1: we test for H0 : θ = 0 (unit root)
2 type "drift"
∆yt = α + θyt−1 + et
tau2: H0 : θ = 0 (unit root)
phi1: H0 : θ = α = 0 (unit root and no drift)
3 type "trend"
∆yt = α + θyt−1 + δt + et
tau3: H0 : θ = 0 (unit root)
phi2: H0 : θ = α = δ = 0 (unit root, no drift, no trend)
phi3: H0 : θ = δ = 0 (unit root and no trend)
Multiple other unit root tests exist:
(KPSS, tests for seasonal data, break in the DGP, etc.).
Unit root tests
The critical values of the ADF distribution with a time trend are
even more negative than those for the random walk and the random
walk with drift specifications.
Terminology:
Stochastic trend: θ = 0
Also called difference-stationary process: yt can be
turned into I(0) series by differencing. Terminology
emphasizes stationarity after differencing yt instead of weak
dependence in differenced TS.
Deterministic trend: δ ≠ 0, θ < 0
Also called trend-stationary process: has a linear trend,
not a unit root. yt is weakly dependent - I(0) - around its
trend. We can use such series in LRMs, if trend is also used
as regressor.
Lxt = xt−1
L(Lxt ) = L2 xt = xt−2
···
Lᵖ xt = xt−p
(1 − φ1 L − φ2 L2 − · · · − φp Lp )xt = α + ut
Stationarity in AR(p) processes
(1 − φ1 L − φ2 L2 − · · · − φp Lp )xt = α + ut (1)
Stochastic process (1) will only be stationary if the roots of
corresponding equation (2) are all greater than unity in absolute value
1 − φ1 L − φ2 L2 − · · · − φp Lp = 0 (2)
For the AR(1) case, the characteristic equation is
1 − φL = 0
L = 1/φ
For the process to be stationary, |L| > 1 ↔ −1 < φ < 1
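The root condition extends directly to AR(p): compute the roots of 1 − φ1L − · · · − φpLᵖ = 0 and check they all lie outside the unit circle. A minimal sketch using numpy (coefficient values are illustrative):

```python
import numpy as np

def ar_is_stationary(phis):
    """Check stationarity of an AR(p) process x_t = a + phi_1 x_{t-1} + ...
    via the roots of 1 - phi_1*L - ... - phi_p*L^p = 0; all roots must
    satisfy |L| > 1."""
    # np.roots expects coefficients ordered from the highest power of L down
    coeffs = [-p for p in reversed(phis)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

# AR(1) with phi = 0.5: stationary; phi = 1 (random walk): not
# AR(2) with phi1 + phi2 >= 1 (e.g. 0.5, 0.6): not stationary
```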
Cointegrated TS – motivation
Some properties of integrated processes
yt = α + βxt + et , et ∼ I(0)
et := yt − α − βxt ,
y° = β0 + β1 y° + (γ0 + γ1 )x°
y° = β0 /(1 − β1 ) + λx°,  where λ ≡ (γ0 + γ1 )/(1 − β1 ),  |β1 | < 1
yt = β0 + β1 yt−1 + γ0 xt + γ1 xt−1 + ut ,
Superconsistency: yt = β0 + β1 xt + ut
1 Provided xt and yt are cointegrated, the OLS estimators β̂0 and
β̂1 will be consistent.
2 β̂j converge in probability to their true values βj more quickly in
the cointegrated non-stationary case (at rate n) than in the stationary
case (rate √n); hence "superconsistency".
Consequences:
For a simple static regression between two cointegrated variables,
yt , xt ∼ CI(1, 1), super-consistency applies (with deterministic
regressors such as intercept and trend added where relevant).
Dynamic misspecifications do not necessarily have serious
consequences. This is a large sample property - in small samples, OLS
estimators are biased.
(Specific statistical inference applies to cointegrating vectors.)
ECM: non-stationary & cointegrated series
¹ Engle and Granger (1987)
ECM: non-stationary & cointegrated series
2nd stage: Use residuals ût−1 in (9) instead of ut−1 and estimate by
OLS
Chow tests
cumgpa = β1 + γ1 female
+ β2 sat + γ2 (female×sat)
+ β3 hsperc + γ3 (female×hsperc)
+ β4 tothrs + γ4 (female×tothrs) + u
Chow test - CS-based example (contd.)
Null hypothesis H0 : γ1 = γ2 = γ3 = γ4 = 0
F -statistic:
F = [(SSRr − SSRur )/K] / [SSRur /(n − 2K)] = [(85.515 − 78.355)/4] / [78.355/(366 − 8)] ≈ 8.18
where
SSRur = SSR T1 + SSR T2
SSRr = SSR T
K is the number of parameters (including intercept) in LRM
F = [(SSRr − SSRur )/SSRur ] · [(T1 − K)/T2 ] ∼ F (T2 , T1 − K) under H0
where
SSRur = SSR T1 (from LRM estimated for “base” period)
SSRr = SSR T (from LRM estimated for the whole period)
K is the number of parameters (including intercept) in LRM
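The Chow F-statistic is easy to compute from the three SSRs; a small helper reproducing the numbers from the cumgpa example above:

```python
def chow_f(ssr_r, ssr_ur, n, k):
    """Chow test F-statistic.
    ssr_r  : SSR from the pooled (restricted) regression
    ssr_ur : SSR_1 + SSR_2 from the two subsample regressions
    k      : number of parameters per regime (incl. intercept)
    Under H0 (equal coefficients), F ~ F(k, n - 2k)."""
    return ((ssr_r - ssr_ur) / k) / (ssr_ur / (n - 2 * k))

# Values from the cumgpa example in the text
f = chow_f(85.515, 78.355, n=366, k=4)
```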
One-step-ahead forecast: ft
ft is the forecast of yt+1 made at time t
Forecast error et+1 = yt+1 − ft
Information set: It
Loss function: e²t+1 or |et+1 |
In forecasting, we minimize E(e²t+1 |It ) = E[(yt+1 − ft )²|It ]
Solution: E(yt+1 |It )
3 Regression models
Static model: yt = β0 + β1 xt + ut
E(yt+1 |It ) = β0 + β1 xt+1 → Conditional forecasting
It contains xt+1 , yt , xt , . . . , y1 , x1
Here, knowledge of xt+1 is assumed (forecast condition).
E(yt+1 |It ) = β0 + β1 E(xt+1 |It ) → Unconditional forecasting
It contains yt , xt , . . . , y1 , x1
Here, xt+1 needs to be estimated before yt+1
unem̂t = 1.572 + .732 unemt−1
        (.577)  (.097)
n = 48, R² = .544

unem̂t = 1.304 + .647 unemt−1 + .184 inft−1
        (.490)  (.084)          (.041)
n = 48, R² = .677
Note that these regressions are not meant as causal equations. The
hope is that the linear regressions approximate well the conditional
expectation.
TS & forecasting
Additional comments
Multiple-step-ahead forecasts are possible, but necessarily
less precise.
Forecasts may make use of deterministic trends, but the
error made by extrapolating time trends too far into the
future may be large.
Similarly, seasonal patterns may be incorporated into
forecasts.
It is possible to calculate confidence intervals for the point
multiple-step-ahead forecasts.
Forecasting I(1) time series can be based on adding
predicted changes (which are I(0)) to base levels.
Forecast intervals for I(0) series converge to the
unconditional variance, whereas for integrated series, they
are unbounded.
Finite and infinite distributed lag models
Rational expectations
Finite and infinite distributed lag models
yt = α0 + β0 xt + β1 xt−1 + β2 xt−2 + ut
yt = α0 + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut
βi = k0 + k1 i + k2 i²  (12)
β0 = k0
β1 = k0 + k1 + k2
β2 = k0 + 2k1 + 4k2
···
βm = k0 + k1 m + k2 m²
[Figure: implied lag distribution βi plotted against lag i = 0, …, 8]
Polynomial distributed lag
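The polynomial (Almon) restriction shrinks the m + 1 free lag coefficients down to three k's; a minimal sketch (the k values are made up, chosen to give a hump-shaped lag distribution over lags 0–8 as in the figure):

```python
def almon_betas(k0, k1, k2, m):
    """Lag weights implied by a 2nd-degree polynomial restriction:
    beta_i = k0 + k1*i + k2*i**2 for i = 0, ..., m."""
    return [k0 + k1 * i + k2 * i**2 for i in range(m + 1)]

# Illustrative values: peak of the quadratic lies at i = -k1/(2*k2) = 4.5
betas = almon_betas(0.1, 0.18, -0.02, m=8)
```

In estimation one regresses y on the three constructed regressors ∑xt−i, ∑i·xt−i, ∑i²·xt−i, recovering k0, k1, k2 and hence all the βi.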
RDL specification:
yt = α0 + γ0 xt + γ1 xt−1 + ρyt−1 + vt
can be used to calculate δh in the IDL model:
yt = α + δ0 xt + δ1 xt−1 + δ2 xt−2 + · · · + ut
With RDLs, impact propensity γ0 ≡ δ0 can differ in sign from lagged
coefficients.
δh = ρh−1 (ργ0 + γ1 ) corresponds to the xt−h variable for h ≥ 1.
. . . δ0 may differ in sign from “lags”, even if 0 < ρ < 1.
. . . for ρ > 0, δh doesn’t change sign with growing h ≥ 1.
Long-run propensity: LRP = (γ0 + γ1 )/(1 − ρ),
where (1 − ρ) > 0 ⇒ the sign of LRP follows the sign of (γ0 + γ1 ).
Also, ut = vt + ρvt−1 + ρ²vt−2 + · · · , which is an MA(∞) process.
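The IDL coefficients implied by the RDL(1) model are easy to tabulate; a sketch with illustrative parameter values:

```python
def rdl_lag_distribution(gamma0, gamma1, rho, max_lag):
    """IDL coefficients implied by the RDL(1) model
    y_t = a + g0*x_t + g1*x_{t-1} + rho*y_{t-1} + v_t:
    delta_0 = g0,  delta_h = rho**(h-1) * (rho*g0 + g1) for h >= 1."""
    deltas = [gamma0]
    deltas += [rho ** (h - 1) * (rho * gamma0 + gamma1)
               for h in range(1, max_lag + 1)]
    return deltas

# Illustrative values: g0 = 0.5, g1 = 0.2, rho = 0.6
deltas = rdl_lag_distribution(0.5, 0.2, 0.6, max_lag=20)
lrp = (0.5 + 0.2) / (1 - 0.6)    # long-run propensity, here 1.75
```

Note that δ0 = γ0 can differ in sign from the later δh, but for ρ > 0 the δh with h ≥ 1 all share the sign of (ργ0 + γ1), and the truncated sum of the δh approaches the LRP.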
Koyck transformation and RDL model – summary
yt∗ = α + βxt + ut
yt = θyt∗ + (1 − θ)yt−1 , 0<θ<1
Parameters:
yt = α + βx∗t + ut  (26)
yt = αφ + βφ xt + (1 − φ)yt−1 + vt  (29)
yt = β0′ + β1′ xt + β2′ yt−1 + vt
Example continued:
Rational expectations
xt = x∗t + vt
Rational vs. adaptive expectations