
ECON3350: Applied Econometrics

for Macroeconomics and Finance


Tutorial 3: Forecasting Univariate Processes - II

At the end of this tutorial you should be able to:

• estimate a set of specified ARMA models using for loops;

• reduce the set of models using information criteria and residuals analysis;

• generate forecasts and predictive intervals for a specified ARMA(p, q);

• compare and evaluate forecasts obtained from different ARMA models.

Problems with Solutions


1. The file Merck.csv contains daily data of stock prices of Merck & Co., Inc.
(MRK) during 2001–2013. In what follows, we use yt to denote the adjusted
closing prices (Adj_Close in the data) at time t.

(a) Load the data into R and construct a data set with observations in the
range 1 January 2011—31 January 2012.

Solution We will need to install and load the forecast library in R.

library(forecast)

Now, load the data using the read.delim command with the sep = "," option
as it is comma delimited.

mydata <- read.delim("Merck.csv", header = TRUE, sep = ",")

Use Date (first column) in the data set to select the required range, then save
Adj_Close as the yt variable.

sel_sample <- mydata$Date >= as.Date("2011-01-01") &
mydata$Date <= as.Date("2012-01-31")
y <- as.matrix(mydata$Adj_Close[sel_sample])

(b) Construct the following variables:


• changes in prices: ∆yt = yt − yt−1 ;
• log returns: rt = log(yt /yt−1 ).

Solution Differencing is easily done with the diff command, and lagging a series can be done with indexing operations (although there are also other ways).

Dy <- diff(y)
r <- as.matrix(log(y[2:nrow(y)]) - log(y[1:(nrow(y) - 1)]))
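Equivalently, since log returns are just first differences of log prices, the same series can be computed in one step:

r_alt <- as.matrix(diff(log(y)))  # same values as r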

colnames(y) <- "Stock Prices of Merck (MRK)"
colnames(Dy) <- "Changes in Stock Prices"
colnames(r) <- "Log Returns"

(c) Draw time series plots of yt , ∆yt and rt ; comment on the stationarity
of the processes that may have generated these observations.

Solution Use the Date variable in the data set to select a subsample for 2011,
then plot each series using the plot command.

sel2011 <- mydata$Date[sel_sample] <= as.Date("2011-12-31")

dates <- as.Date(mydata$Date[sel_sample])
plot(dates[sel2011], y[sel2011], type = "l", xlab = "Time (2011)",
ylab = colnames(y))

[Figure: time series plot of "Stock Prices of Merck (MRK)" over Time (2011), Jan 2011–Jan 2012]

plot(dates[sel2011], Dy[sel2011], type = "l", xlab = "Time (2011)",
     ylab = colnames(Dy))

[Figure: time series plot of "Changes in Stock Prices" over Time (2011)]

plot(dates[sel2011], r[sel2011], type = "l", xlab = "Time (2011)",
     ylab = colnames(r))

[Figure: time series plot of "Log Returns" over Time (2011)]

The process {yt } is likely not stationary as its mean appears to vary over time.
The process {∆yt } seems to have a zero mean, but its variance may depend on
time. We will cover time-varying variance later in the course.

(d) Compute and plot the sample ACFs and PACFs of yt and ∆yt . Com-
ment on your findings.

Solution Use the acf and pacf commands as in Tutorial 1.

acf(y[sel2011], main = colnames(y))

[Figure: sample ACF of "Stock Prices of Merck (MRK)", lags 0–20]

pacf(y[sel2011], main = colnames(y))

[Figure: sample PACF of "Stock Prices of Merck (MRK)", lags 1–20]

acf(Dy[sel2011], main = colnames(Dy))

[Figure: sample ACF of "Changes in Stock Prices", lags 0–20]

For prices (yt), the SACF decays but the SPACF drops to zero after one lag. Also, the first PAC is near 1, suggesting an AR(1) model of the form yt = yt−1 + ϵt. You should be able to see a connection to the implied model for ∆yt.
For price changes (∆yt), the SPACF drops to zero after one lag but the SACF oscillates. Also, the first AC is near 1, suggesting an MA(1) model of the form ∆yt = ϵt + ϵt−1. You should notice the somewhat contradictory conclusions we have reached with this line of reasoning. Can you think of any possible explanations?

(e) Propose and estimate 25 ARMA(p, q) models for ∆yt .

Solution The Arima command from the forecast package makes it easy to
estimate a specified ARMA(p, q). We can put this command inside a nested for
loop to process a lot of ARMA models quickly. For this exercise, we estimate
models with combinations of p = 0, . . . , 4 and q = 0, . . . , 4. For each estimated
model, we extract the estimated AIC and BIC, then save these in a neat matrix,
which we call ic, to be examined ex-post.

ic <- matrix( nrow = 25, ncol = 4 )
colnames(ic) <- c("p", "q", "aic", "bic")
for (p in 0:4)
{
for (q in 0:4)
{
fit_p_q <- Arima(Dy, order = c(p, 0, q))
ic[p * 5 + q + 1,] = c(p, q, fit_p_q[["aic"]], fit_p_q[["bic"]])
}
}

(f) Use the AIC and BIC to reduce the set of ARMA(p, q) models.

Solution We start by examining our constructed ic matrix.

print(ic)

## p q aic bic
## [1,] 0 0 221.1652 228.3695
## [2,] 0 1 222.9201 233.7265
## [3,] 0 2 221.3797 235.7882
## [4,] 0 3 221.7444 239.7550
## [5,] 0 4 223.3386 244.9513
## [6,] 1 0 222.8619 233.6683
## [7,] 1 1 218.0345 232.4430
## [8,] 1 2 217.6980 235.7086
## [9,] 1 3 219.6798 241.2926
## [10,] 1 4 221.6430 246.8578
## [11,] 2 0 220.8859 235.2944
## [12,] 2 1 217.7698 235.7804
## [13,] 2 2 219.6627 241.2755
## [14,] 2 3 221.5682 246.7830
## [15,] 2 4 223.5379 252.3549
## [16,] 3 0 219.9955 238.0061
## [17,] 3 1 219.6965 241.3092
## [18,] 3 2 221.4811 246.6960
## [19,] 3 3 221.2772 250.0942
## [20,] 3 4 219.9602 252.3793

## [21,] 4 0 221.6630 243.2757
## [22,] 4 1 221.6951 246.9099
## [23,] 4 2 223.4237 252.2406
## [24,] 4 3 219.5087 251.9277
## [25,] 4 4 219.5611 255.5823

The AIC and BIC appear to wildly disagree in their rankings of ARMA mod-
els. Unfortunately, there is no systematic approach to resolving this conflicting
information!
We need to proceed in a sensible way, so we look at the top 10 specifications
preferred by the AIC as well as the top 10 preferred by the BIC. This is easy to
do by sorting the matrix ic.

ic_aic <- ic[order(ic[,3], decreasing = FALSE),][1:10,]
ic_bic <- ic[order(ic[,4], decreasing = FALSE),][1:10,]
print(ic_aic)

## p q aic bic
## [1,] 1 2 217.6980 235.7086
## [2,] 2 1 217.7698 235.7804
## [3,] 1 1 218.0345 232.4430
## [4,] 4 3 219.5087 251.9277
## [5,] 4 4 219.5611 255.5823
## [6,] 2 2 219.6627 241.2755
## [7,] 1 3 219.6798 241.2926
## [8,] 3 1 219.6965 241.3092
## [9,] 3 4 219.9602 252.3793
## [10,] 3 0 219.9955 238.0061

print(ic_bic)

## p q aic bic
## [1,] 0 0 221.1652 228.3695
## [2,] 1 1 218.0345 232.4430
## [3,] 1 0 222.8619 233.6683
## [4,] 0 1 222.9201 233.7265
## [5,] 2 0 220.8859 235.2944
## [6,] 1 2 217.6980 235.7086
## [7,] 2 1 217.7698 235.7804
## [8,] 0 2 221.3797 235.7882
## [9,] 3 0 219.9955 238.0061
## [10,] 0 3 221.7444 239.7550

For our purpose, next, it is sensible to select the models that make the top 10 in
both lists. However, this is not a rule by any means—in other contexts, it is likely
that a different approach will be more sensible.

adq_set = list(c(1, 0, 1), c(1, 0, 2), c(2, 0, 1), c(3, 0, 0))
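As a quick programmatic cross-check of this overlap (a sketch using only the ic_aic and ic_bic matrices constructed above), we can intersect the two top-10 lists by their (p, q) pairs:

keys_aic <- paste(ic_aic[, "p"], ic_aic[, "q"], sep = ",")
keys_bic <- paste(ic_bic[, "p"], ic_bic[, "q"], sep = ",")
print(intersect(keys_aic, keys_bic))  # "1,2" "2,1" "1,1" "3,0"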

(g) Draw time series plots of the estimated residuals you obtained for the
ARMA models selected in part (f). Comment on your findings. Run the
Ljung-Box test (at the α = 5% significance level) to test the white noise
hypothesis on estimated residuals obtained from each ARMA in the set
obtain in part (f) and report the test results. Use this information to
identify the adequate set of specified ARMAs.

Solution The forecast package simplifies residuals analysis with the command
checkresiduals. It will plot the estimated residuals series along with the SACF
and a histogram. It also automatically performs the Ljung-Box test using K = 10
by default and the α = 5% significance level.
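As a side note, checkresiduals reports the same statistic one would obtain by calling Box.test manually; a minimal sketch for the ARMA(1, 1) fit, where fitdf = p + q accounts for the estimated ARMA parameters:

res_1_1 <- residuals(Arima(Dy[sel2011], order = c(1, 0, 1)))
Box.test(res_1_1, lag = 10, type = "Ljung-Box", fitdf = 2)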

for (i in 1:length(adq_set))
{
checkresiduals(Arima(Dy[sel2011], order = adq_set[[i]]))
}

[Figure: residual diagnostics for ARIMA(1,0,1) with non-zero mean: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1) with non-zero mean
## Q* = 8.0188, df = 8, p-value = 0.4316
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for ARIMA(1,0,2) with non-zero mean: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2) with non-zero mean
## Q* = 6.42, df = 7, p-value = 0.4917
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for ARIMA(2,0,1) with non-zero mean: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,1) with non-zero mean
## Q* = 6.4833, df = 7, p-value = 0.4846
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for ARIMA(3,0,0) with non-zero mean: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,0) with non-zero mean
## Q* = 11.424, df = 7, p-value = 0.1212
##
## Model df: 3. Total lags used: 10

Nothing clearly stands out to suggest a blatant problem with correlated residuals, so we proceed with the four specifications in the set.

(h) Forecast changes in MRK stock prices in January, 2012. For each
ARMA model in the adequate set, compare your predicted price
changes with real price changes in the data. Compare the forecasts you
obtained as well as their “quality” across ARMA models and comment
on the robustness of the generated forecasts.

Solution We forecast each model in the adequate set using the forecast com-
mand within a nested for loop. The option level = c(68, 95) instructs the
command to construct two sets of predictive intervals: one with 95% coverage and
another with 68% coverage.

hrz <- sum(sel_sample) - sum(sel2011)

xticks <- c(sum(sel_sample) - 3 * hrz + c(1, 2 * hrz, 3 * hrz))
actual_Dy <- as.matrix(Dy[!sel2011])
fcst_Dy <- vector(mode = "list", length(adq_set))
for (i in 1:length(adq_set))
{
model_p_q <- adq_set[[i]]
fcst_Dy[[i]] <- forecast(Arima(Dy[sel2011], order = model_p_q),
h = hrz, level = c(68, 95))

title_p_q <- paste("ARMA(", as.character(model_p_q[1]), ", ",
as.character(model_p_q[3]), ")", sep = "")

plot(fcst_Dy[[i]], include = hrz * 2, ylab = colnames(Dy),
main = title_p_q, xaxt = "n")
lines(sum(sel2011) + 1:hrz, actual_Dy)
axis(1, at = xticks, labels = dates[xticks])
}

[Figures: forecasts of "Changes in Stock Prices" for Jan 2012 with 68% and 95% predictive intervals, from ARMA(1, 1), ARMA(1, 2), ARMA(2, 1) and ARMA(3, 0); actual changes overlaid, Nov 2011–Jan 2012]

When we use the plot command with output from the forecast command, we
get a nice depiction of how the data is extrapolated into the future, complete with
predictive intervals to capture uncertainty. We can also add the actual outcomes
in the forecast period to help us compare the forecast performance of each ARMA
in the adequate set.
We first note that predictive intervals for price changes (∆yt ) appear to have a
fixed width even as the forecast horizon increases (from 1 day to 20 days).
Comparing further across specifications, it is clear that all four generate very
similar forecasts for January 2012. Therefore, the specification differences between
them are not important for our purpose.
Alternatively, we may conclude that our forecast of price changes is robust to
minor differences in the specification of ARMA models in that we cannot clearly
distinguish between them with our diagnostic tools.
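This visual robustness check can be supplemented with a simple formal metric; the sketch below computes the root mean squared error (RMSE) of each model's point forecasts against the realised changes, using the fcst_Dy and actual_Dy objects constructed above:

rmse <- sapply(fcst_Dy, function(f) sqrt(mean((f$mean - actual_Dy)^2)))
names(rmse) <- c("ARMA(1,1)", "ARMA(1,2)", "ARMA(2,1)", "ARMA(3,0)")
print(rmse)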

(i) Forecast MRK prices yt (levels this time, instead of changes) using an
ARMA(2, 1) model only. Compare your predicted prices with real prices
in the data. Compare the price forecasts obtained in this part with price
forecasts obtained by transforming the forecasts in part (h). HINT: you
will need to convert predicted price changes to predicted prices.

Solution To forecast prices yt , we use the same approach but replace Dy with y
throughout.

actual_y <- as.matrix(y[!sel2011])

fcst_y_lev <- forecast(Arima(y[sel2011], order = c(2, 0, 1)),
h = hrz, level = c(68, 95))

plot(fcst_y_lev, include = hrz * 2, ylab = colnames(y),
main = "ARMA(2, 1)", xaxt = "n", ylim = c(26.1, 33.4))
lines(sum(sel2011) + 1:hrz, actual_y)
axis(1, at = xticks, labels = dates[xticks])

[Figure: forecast of "Stock Prices of Merck (MRK)" from ARMA(2, 1) with 68% and 95% predictive intervals; actual prices overlaid, Nov 2011–Jan 2012]

The forecasts we obtain here are for prices, whereas the forecasts obtained in part (h) were for price changes. In order to compare them, we need to convert predicted price changes to prices, which is achieved by cumulative summation:

yt = y0 + ∆y1 + ∆y2 + · · · + ∆yt ,

where y0 in our case is the last observation in the "pre-sample".


However, if we did that manually, we would need to re-calculate the predictive
intervals manually as well. We do not want to do this (and it does not work by
simply cumulatively summing the interval limits either)!
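For the point forecasts alone, the manual conversion is a one-liner; a minimal sketch for, say, the ARMA(2, 1) element of fcst_Dy (the third in adq_set), with the predictive intervals deliberately omitted for the reason just given:

y_last <- y[sum(sel2011)]  # last in-sample price
y_fcst_manual <- y_last + cumsum(fcst_Dy[[3]]$mean)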
Instead, we use another option in the Arima command. In particular, we pass the option order = c(p, 1, q) instead of order = c(p, 0, q). The "1" in the middle instructs R to difference yt when estimating the ARMA parameters. However, it will automatically reverse the differencing when computing forecasts and predictive intervals! This specification is called an ARIMA(p, 1, q). You can verify (by trying it yourself) that an ARIMA(p, 1, q) for yt yields the same AR and MA coefficient estimates as an ARMA(p, q) for ∆yt.
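A quick way to try this, e.g. for (p, q) = (1, 1); the AR and MA estimates should essentially coincide, apart from how the constant is handled and the alignment of the first observation:

print(coef(Arima(y[sel2011], order = c(1, 1, 1), include.constant = T)))
print(coef(Arima(Dy[sel2011], order = c(1, 0, 1))))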

y0 <- mydata$Adj_Close[sum(mydata$Date < as.Date("2011-01-01") - 1)]

y_ext <- as.matrix(c(y0, y[sel2011]))
fcst_y <- vector(mode = "list", length(adq_set))
for (i in 1:length(adq_set))
{
model_p_q <- adq_set[[i]]
model_p_q[2] <- 1
fcst_y[[i]] <- forecast(Arima(y_ext, order = model_p_q,
include.constant = T),
h = hrz, level = c(68, 95))

title_p_q <- paste("ARIMA(", as.character(model_p_q[1]), ", ",
as.character(model_p_q[3]), ")", sep = "")

plot(fcst_y[[i]], include = hrz * 2, ylab = colnames(y),
main = title_p_q, xaxt = "n")
lines(1 + sum(sel2011) + 1:hrz, actual_y)
axis(1, at = 1 + xticks, labels = dates[xticks])
}

[Figures: forecasts of "Stock Prices of Merck (MRK)" with 68% and 95% predictive intervals, from ARIMA(1, 1), ARIMA(1, 2), ARIMA(2, 1) and ARIMA(3, 0); actual prices overlaid, Nov 2011–Jan 2012]

A number of interesting observations emerge in comparing these forecast results.


The first is that all ARIMAs in the adequate set generate very similar forecasts
(including predictive intervals). The second is that the predictive intervals for
yt increase as the forecast horizon increases (they are narrower for forecasts in
the beginning of the forecasting period and wider towards the end of the forecast
period). How might this relate to the stationarity properties of {yt }?
Comparing the forecasts obtained from the ARIMA(1, 1) to those obtained from
the ARMA(2, 1), we see that both produce predictive intervals that increase as
the horizon increases. However, the ARMA(2, 1) predictive intervals indicate
that prices should fall in January 2012. This is slightly different from what the
ARIMA(1, 1) produces—although the predictive intervals from the two specifica-
tions largely overlap, the ARIMA(1, 1) forecast clearly puts more weight on higher
prices in January.
When comparing to the actual observations in January 2012, it is easy to see that
the ARIMA forecasts are better (which can be confirmed by formal metrics).

(j) OPTIONAL: Repeat parts (d)-(h) for log returns rt . Note that here
you will forecast daily returns (yt − yt−1 )/yt−1 in January, 2012. Hint:
Recall that (yt − yt−1 )/yt−1 ≈ rt .

Solution The steps are nearly the same as those for working with ∆yt .

acf(r[sel2011], main = colnames(r))

[Figure: sample ACF of "Log Returns", lags 0–20]

pacf(r[sel2011], main = colnames(r))

[Figure: sample PACF of "Log Returns", lags 1–20]

ic <- matrix(nrow = 25, ncol = 4)
colnames(ic) <- c("p", "q", "aic", "bic")
for (p in 0:4)
{
for (q in 0:4)
{
fit_p_q <- Arima(r, order = c(p, 0, q))
ic[p * 5 + q + 1,] <- c(p, q, fit_p_q[["aic"]], fit_p_q[["bic"]])
}
}

ic_aic <- ic[order(ic[,3], decreasing = FALSE),][1:10,]
ic_bic <- ic[order(ic[,4], decreasing = FALSE),][1:10,]

adq_set <- list(c(1, 0, 1), c(1, 0, 2), c(2, 0, 1), c(3, 0, 0))

for (i in 1:length(adq_set))
{
checkresiduals(Arima(r[sel2011], order = adq_set[[i]]))
}

[Figure: residual diagnostics for ARIMA(1,0,1) with non-zero mean (log returns): residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1) with non-zero mean
## Q* = 7.6925, df = 8, p-value = 0.4641
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for ARIMA(1,0,2) with non-zero mean (log returns): residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2) with non-zero mean
## Q* = 6.1518, df = 7, p-value = 0.5221
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for ARIMA(2,0,1) with non-zero mean (log returns): residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,1) with non-zero mean
## Q* = 6.2143, df = 7, p-value = 0.515
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for ARIMA(3,0,0) with non-zero mean (log returns): residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,0) with non-zero mean
## Q* = 11.181, df = 7, p-value = 0.1309
##
## Model df: 3. Total lags used: 10

hrz <- sum(sel_sample) - sum(sel2011)

xticks <- c(sum(sel_sample) - 3 * hrz + c(1, 2 * hrz, 3 * hrz))
actual_r <- as.matrix(r[!sel2011])
fcst_r <- vector(mode = "list", length(adq_set))
for (i in 1:length(adq_set))
{
model_p_q <- adq_set[[i]]
fcst_r[[i]] <- forecast(Arima(r[sel2011], order = model_p_q),
h = hrz, level = c(68, 95))

title_p_q <- paste("ARMA(", as.character(model_p_q[1]), ", ",
as.character(model_p_q[3]), ")", sep = "")

plot(fcst_r[[i]], include = hrz * 2, ylab = colnames(r),
main = title_p_q, xaxt = "n")
lines(sum(sel2011) + 1:hrz, actual_r)
axis(1, at = xticks, labels = dates[xticks])
}

[Figures: forecasts of "Log Returns" for Jan 2012 with 68% and 95% predictive intervals, from ARMA(1, 1), ARMA(1, 2), ARMA(2, 1) and ARMA(3, 0); actual returns overlaid, Nov 2011–Jan 2012]
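Per the hint, forecasted log returns convert to forecasted simple returns (yt − yt−1)/yt−1 either by the approximation rt itself or exactly via exponentiation; a minimal sketch for the first model in the adequate set:

simple_ret_fcst <- exp(fcst_r[[1]]$mean) - 1  # exact conversion of point forecasts
print(head(simple_ret_fcst))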

ECON3350: Applied Econometrics
for Macroeconomics and Finance

Tutorial 4: Dynamic Relationships

At the end of this tutorial you should be able to:

• derive the ECM representation of an ARDL(p, l, s) model;

• create a function in R;

• estimate IRFs to permanent and one-off shocks as well as LRMs;

• construct confidence intervals for IRFs and LRMs;

• Select an adequate set of ARDL models and draw inference on dynamic relationships.

Problems with Solutions


1. Derive the ECM representation of the following ARDL(1, 1, 2) model:

yt = a0 + a1 yt−1 + b0 xt + b1 xt−1 + c0 wt + c1 wt−1 + c2 wt−2 + ϵy,t .

Which parameter(s) in the resulting ECM are long-run multiplier(s) and adjustment parameter(s)?

Solution The error correction representation of an ARDL(1,1,2) is:

∆yt = γ + α(yt−1 − µ − βx xt−1 − βw wt−1 ) + b0 ∆xt + c0 ∆wt − c2 ∆wt−1 + ϵt ,

where

α = −(1 − a1), βx = (b0 + b1)/(1 − a1), βw = (c0 + c1 + c2)/(1 − a1), µ = (a0 − γ)/(1 − a1).

To convert this to the ECM representation, we add and subtract five terms: γ, yt−1, b0 xt−1, c0 wt−1 and c2 wt−1. Specifically,

yt = a0 + γ − γ + yt−1 − yt−1 + a1 yt−1 + b0 xt + b0 xt−1 − b0 xt−1 + b1 xt−1
     + c0 wt + c0 wt−1 − c0 wt−1 + c1 wt−1 + c2 wt−1 − c2 wt−1 + c2 wt−2 + ϵt ,
yt − yt−1 = γ − yt−1 + a1 yt−1 + (a0 − γ) + b0 xt−1 + b1 xt−1 + c0 wt−1 + c1 wt−1 + c2 wt−1
     + b0 xt − b0 xt−1 + c0 wt − c0 wt−1 − c2 wt−1 + c2 wt−2 + ϵt ,
yt − yt−1 = γ − (1 − a1)yt−1 + (a0 − γ) + (b0 + b1)xt−1 + (c0 + c1 + c2)wt−1
     + b0(xt − xt−1) + c0(wt − wt−1) − c2(wt−1 − wt−2) + ϵt ,
∆yt = γ + α(yt−1 − µ − βx xt−1 − βw wt−1) + b0 ∆xt + c0 ∆wt − c2 ∆wt−1 + ϵt .

In this representation, βx and βw are the long-run multipliers and α is the adjustment parameter.
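As a numeric spot-check of the long-run formulas, we can plug in arbitrary illustrative values (these numbers are made up for the check; they are not estimates):

a1 <- 0.6; b0 <- 0.3; b1 <- 0.1; c0 <- 0.2; c1 <- 0.1; c2 <- -0.05
alpha  <- -(1 - a1)                  # -0.4
beta_x <- (b0 + b1) / (1 - a1)       # 1
beta_w <- (c0 + c1 + c2) / (1 - a1)  # 0.625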

2. Create a function in R to compute coefficients θ0 , . . . , θh in

θ(L) = b(L)/a(L) = θ0 + θ1 L + · · · + θh Lh + · · · ,

where a(L) = a0 + a1 L + · · · + ap Lp and b(L) = b0 + b1 L + · · · + bq Lq .

Solution An example of such a function is as follows.

arma2ma <- function(a, b, h)
{
# we always start here
theta0 <- b[1] / a[1]

if (h == 0)
{
# if the horizon is zero, then just return theta0
# note: good practice would be to check that h is an integer
return(theta = theta0)
}

# get the orders of a(L) and b(L); in fact, the ARMA orders
# are p - 1 and q - 1 because we also have a0 and b0 to take
# into account
p <- length(a)
q <- length(b)

# augment the AR and MA coefficient vectors to match the
# number of thetas we are going to compute -- this makes
# things easier later
if (h > p)
{
a = c(a, numeric(1 + h - p))
}

if (h > q)
{
b = c(b, numeric(1 + h - q))
}

# allocate space for 1 + h thetas and set theta0 = b0 / a0
theta <- c(theta0, numeric(h))
for (j in 1:h)
{
theta[1 + j] <- (b[1 + j] - sum(a[2:(1 + j)]
* theta[j:1])) / a[1]
}

return(theta)
}
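As a quick sanity check, the recursion can be compared with stats::ARMAtoMA, which returns ψ1, ψ2, . . . for a given ARMA model (so we prepend ψ0 = 1); for example, with a(L) = 1 − 0.5L and b(L) = 1 + 0.4L:

a <- c(1, -0.5)
b <- c(1, 0.4)
print(arma2ma(a, b, 5))
print(c(1, ARMAtoMA(ar = 0.5, ma = 0.4, lag.max = 5)))  # should match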

3. Create a function in R to compute IRFs (to both one-off and permanent shocks) up to horizon h as well as the LRMs for the ARDL(p, l, s):

a(L)yt = a0 + b(L)xt + c(L)wt + ϵy,t .

Solution An example of such a function is as follows.

ardl_irfs <- function(ardl_est, h = 40, cumirf = T)
{
# extract the lag orders and coefficient estimates from the
# estimated ARDL
order <- ardl_est$order
coefficients <- ardl_est$coefficients

# extract the autoregressive coefficients and construct a(L)
j <- 1 + order[1]
a <- c(1, -coefficients[2:j])

# get the number of exogenous variables in the ARDL: we want to
# get IRFs to each one of these separately
k <- length(order) - 1

# allocate space for all the IRFs
irfs <- matrix(nrow = 1 + h, ncol = k)
colnames(irfs) <- rep("", k)

# allocate space for LRMs
lrm <- numeric(k)
names(lrm) <- rep("", k)

# now, cycle through each exogenous variable and compute
# IRFs/LRMs
for (i in 1:k)
{
# advance the index to where the estimated coefficients are
# stored in the variable "coefficients", then extract them
# and construct b(L) for the ith exogenous variable in the
# ARDL
j0 <- 1 + j
j <- j0 + order[1 + i]
b <- coefficients[j0:j]
colnames(irfs)[i] <- names(coefficients[j0])
names(lrm)[i] <- names(coefficients[j0])

if (cumirf)
{
# compute the first "h" terms of theta(L) = b(L)/a(L) if
# cumulative IRFs are requested, do a cumulative sum of
# theta coefficients
irfs[, i] <- cumsum(arma2ma(a, b, h))
}
else
{
# compute the first "h" terms of theta(L) = b(L)/a(L)
# and save them
irfs[, i] <- arma2ma(a, b, h)
}
lrm[i] <- sum(b) / sum(a)
}

return(list(irfs = irfs, lrm = lrm))
}
4. The file wealth.csv contains observations on:

• ct : the log of total real per capita expenditures on durables, nondurables and services;
• at : the log of a measure of real per capita household net worth (including all financial and household wealth); and
• yt : the log of after-tax labour income.

The sample period is 1952Q2 through 2006Q2 (see Koop, G., S. Potter
and R. W. Strachan (2008) “Re-examining the consumption-wealth relation-
ship: The role of uncertainty” Journal of Money, Credit and Banking, Vol.
40, No. 2.3, 341-367).

(a) Estimate an ARDL(1, 2, 2) specified for ct and use the functions created
in Questions 2 and 3 to obtain the estimated IRFs to permanent shocks
in at and yt as well the LRMs. Hint: to estimate the ARDL parameters,
try the ardl function that is provided by the ARDL package.

Solution We will need to install and load the ARDL library in R.

library(ARDL)

Next, load the data and use the ardl function to estimate the parameters of an ARDL(1, 2, 2).

mydata <- read.delim("wealth.csv", header = TRUE, sep = ",")

ardl_est <- ardl(CT ~ AT + YT, mydata, order = c(1, 2, 2))

Finally, use our ardl_irfs function to compute the cumulative IRFs and LRMs.

irfs_lrm <- ardl_irfs(ardl_est)

for (i in 1:ncol(irfs_lrm$irfs))
{
plot(0:40, irfs_lrm$irfs[, i], type = "l",
ylab = "Impulse Response", xlab = "Horizon",
main = paste("Cumulative IRFs to",
colnames(irfs_lrm$irfs)[i]))
}

[Figures: cumulative IRFs to AT and to YT, horizons 0–40]

print(irfs_lrm$lrm)

## AT YT
## 0.2687276 0.7873063

(b) Estimate the ECM representation of the ARDL(1, 2, 2) and report the
results. How do the LRMs in the estimated ECM compare to those
computed in part (a)? Hint: use the recm and multipliers functions
to convert the output produced by ardl.

Solution Given output from the ardl function, obtaining the ECM form entails two steps. First, use the recm function with the case = 2 option. This option corresponds to "no linear trend" and an intercept within the equilibrium relation (γ = 0 and µ unrestricted in our notation; a linear trend would amount to including t in the ARDL as a regressor, more on this later in the course).
The function recm will produce estimates of the "short-run" coefficients in the ECM, but not the LRMs (recm only produces the aggregate error correction term, which it calls ect). To obtain estimates of the LRMs (i.e., details of the ect), use the multipliers function.

ecm_sr <- recm(ardl_est, case = 2)
ecm_lrm <- multipliers(ardl_est)
print(summary(ecm_sr))

##
## Time series regression with "zooreg" data:
## Start = 3, End = 217
##
## Call:
## dynlm::dynlm(formula = full_formula, data = data, start = start,
## end = end)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0230749 -0.0039518 0.0003092 0.0034368 0.0156863
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## d(AT) 0.063151 0.017998 3.509 0.000551 ***
## d(L(AT, 1)) 0.049423 0.017916 2.759 0.006318 **
## d(YT) 0.351155 0.041389 8.484 3.90e-15 ***
## d(L(YT, 1)) 0.183718 0.043249 4.248 3.24e-05 ***
## ect -0.035626 0.007755 -4.594 7.50e-06 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.005683 on 210 degrees of freedom
## (0 observations deleted due to missingness)
## Multiple R-squared: 0.6333, Adjusted R-squared: 0.6246
## F-statistic: 72.54 on 5 and 210 DF, p-value: < 2.2e-16

print(ecm_lrm)

## Term Estimate Std. Error t value Pr(>|t|)
## 1 (Intercept) -0.4683080 0.2067787 -2.264778 2.456199e-02
## 2 AT 0.2687276 0.1213092 2.215228 2.783439e-02
## 3 YT 0.7873063 0.1407531 5.593525 6.989283e-08

Note that the AT and YT estimates here match the LRMs computed in part (a) exactly.

(c) Use the function ardl_irfs_ci that is provided in the file ardl_irfs_ci.R to construct 68% confidence intervals for the IRFs obtained in part (a).
The function ardl_irfs_ci takes the following inputs:
• ardl_est: this is the output of ardl;
• h: the maximum IRF horizon (default is 40);
• cumirf: whether to compute IRFs to a permanent shock (default
is TRUE);
• conf: the confidence level of the intervals (default is 0.95).
It returns the following outputs:
• lb: an h × k matrix of lower-bounds for confidence intervals;
• md: an h × k matrix of mid-points for confidence intervals;
• ub: an h × k matrix of upper-bounds for confidence intervals.
Note that k is the number of independent variables in the ARDL, so that column j of lb, md and ub is related to the confidence intervals for IRFs to a shock in the jth independent variable.

Solution Include the functions provided in the file ardl_irfs_ci.R.

source("ardl_irfs_ci.R")

Use the function ardl_irfs_ci to estimate confidence intervals for IRFs to per-
manent shocks in at and yt .

irfs_ci <- ardl_irfs_ci(ardl_est, conf = 0.68)

Here, we choose 68% as the confidence level. This corresponds to an interval that is approximately two standard deviations in width (in fact, it is exactly that if the sampling distribution of the estimator is normal). You should try other confidence levels to see how the width of the interval changes. In doing so, think about how "confidence level" relates to risk in decisions that would be made based on inference from these confidence intervals.
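A one-line check of the "approximately two standard deviations" claim under normality:

qnorm(0.84) - qnorm(0.16)  # approximately 1.99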
The best way to examine estimated IRFs is to plot them.

for (i in 1:ncol(irfs_ci$md))
{
plot(0:40, irfs_ci$md[, i], type = "l",
ylab = "Impulse Response", xlab = "Horizon",
main = paste("Cumulative IRFs to",
colnames(irfs_ci$md)[i]),
ylim = c(min(irfs_ci$lb[, i]), max(irfs_ci$ub[, i])))
lines(0:40, irfs_ci$ub[, i], type = "l", col = "blue")
lines(0:40, irfs_ci$lb[, i], type = "l", col = "blue")
}

[Figures: cumulative IRFs to AT and to YT with 68% confidence bands, horizons 0–40]

We first observe that the IRF confidence intervals increase in width as the horizon h increases. This is equally true for shocks in both at and yt. This is not surprising—because we are analysing "cumulative" responses, we expect that responses at more distant horizons are more difficult to estimate accurately with the same sample size.
In this particular case, we also observe that our estimate of a1 is close to 1, which means the estimated a(L) has a root close to unity, and the estimated ARDL is close to unstable. Hence, we are drawing inference on realised ARDLs that are "close" to the instability region.
Recall that an unstable ARDL does not have finite LRMs. When realised ARDLs are "close" to this, but still stable, they will result in finite LRMs. However, the sampling variance of these LRMs will be very large, thereby resulting in very wide confidence intervals. Our cumulative IRFs tend towards such LRMs as the horizon increases, and so, it is not surprising that the confidence intervals become wider at more distant horizons.
The bottom line is that when an ARDL is estimated close to instability, cumulative
IRFs at longer horizons are not reliable. What this means in practical terms is
that certain dynamic effects are very difficult to infer in the long run. This is
the case here: the effect of permanent increases in either income or wealth on
expenditures are difficult to infer for horizons beyond 20 quarters or so (i.e., five
years).
Any attempt at long-run inference would be accompanied by a high degree of
uncertainty—i.e., be rather inaccurate. Is it still useful? That of course depends
on the objective. For example, we would conclude that with 68% confidence, a $1 increase in wealth leads to anywhere between $0 and $0.50 increase in expenditures after 10 years.
Whether or not this is useful depends on the question. If the question is: does
more wealth lead to a permanent increase in expenditures? Then, our long-run
inference is not very useful because our confidence intervals include negative effects
for all horizons beyond 10 years. On the other hand, if the question is: do people
end up spending all wealth increases in the long run? In this case, our data tells
us that with 68% confidence they do not—less than half of the wealth increase is expected to be spent after 10 years.
Finally, even if long-run effects are difficult to infer, short-term dynamics are typ-
ically still accurately estimated and are often themselves of interest. For example,
in this case we can infer that with 68% confidence, a $1 increase in wealth leads
to an immediate increase in expenditures between $0.05 and $0.15. On the other
hand, a $1 increase in income leads to an immediate increase in expenditures
between $0.45 and $0.65. So one conclusion we can confidently make is that, with 68% confidence, an increase in income leads to more expenditures relative to an equivalent increase in wealth, in the short-run.
There are many other interesting inferential claims we can make from these plots!
Try and see if you can draw more conclusions.

(d) Compare the values in md to the IRFs estimates obtained in part (a).

Solution The midpoint of the confidence interval is exactly the estimate, so the values in md should match exactly to the estimated IRFs. However, this may not be the case in practice due to "rounding" errors. The best way to compare two vectors or matrices is to look at the norm of their difference. Even when rounding errors result in a norm that is not zero (when in theory it should be), the norm should yield an obviously small number.

print(norm(irfs_lrm$irfs - irfs_ci$md))

## [1] 0

(e) Use the LRM estimates and standard deviations obtained in part (b)
to construct 68% confidence intervals for the LRMs, assuming the sam-
pling distributions of the LRM estimators are approximately normal.

How do they compare to the IRF confidence intervals obtained in part (c)?

Solution To construct the confidence intervals, we first need to obtain the ap-
propriate percentile z from the normal distribution. To obtain a 68% confidence
interval, this requires setting z to be the 84th percentile. We know that for the
normal distribution Pr(z ≤ 1) ≈ 0.84, so z ≈ 1, but we can also use the R function
qnorm to compute it.

z <- qnorm(.84)

Now, construct the confidence intervals and compare.

lrm_hat <- t(as.matrix(ecm_lrm$Estimate[2:3]))
lrm_se <- t(as.matrix(ecm_lrm$'Std. Error'[2:3]))
ones <- matrix(1, nrow = 3, ncol = 1)
zz <- as.matrix(c(-z, 0, z))
lrm_ci <- ones %*% lrm_hat + zz %*% lrm_se
rownames(lrm_ci) <- c("lb", "md", "ub")
colnames(lrm_ci) <- c("AT", "YT")
print(lrm_ci)

## AT YT
## lb 0.1480907 0.6473332
## md 0.2687276 0.7873063
## ub 0.3893645 0.9272794

irfs_41_ci <- rbind(irfs_ci$lb[41,], irfs_ci$md[41,],
irfs_ci$ub[41,])
rownames(irfs_41_ci) <- c("lb", "md", "ub")
print(irfs_41_ci)

## AT YT
## lb -0.03213304 0.04864858
## md 0.23256522 0.72974583
## ub 0.49726349 1.41084308

The key observation in these results is that the confidence intervals for cumulative
impulse response at h = 40 are in fact wider than the confidence intervals for the
LRMs. You should find this quite unsettling!

In fact, both confidence intervals are asymptotically correct (with few exceptions),
meaning that in an infinite sample we should observe confidence intervals for LRMs
that are wider than those for the 40th horizon IR. But we do not have an infinite
sample, and this is the main reason for the discrepancy.
We have already discussed in part (c) how inference on cumulative impulse responses in a finite sample is expected to be less accurate at more distant horizons.
Indeed, the same is true for estimates of the standard errors that we use to form
the confidence intervals!
At more distant horizons, and culminating with the LRMs, standard errors es-
timates are subject to (often substantial) finite sample bias. The exercise here
underscores two points: (i) it is not uncommon to encounter discrepancies, and
(ii) inference on dynamic effects at longer horizons requires extra care. The under-
lying lesson is simply that one should never take output from statistical software
for granted.

(f) Construct an adequate set of ARDL(p, l, s) models for ct .

Solution We follow a procedure very similar to the one used in Tutorial 3 to construct an adequate set of ARMA models. In particular, we estimate all specifications obtained from p = 1, . . . , 5, l = 0, . . . , 4 and s = 0, . . . , 4 and record the AICs and BICs in an easy-to-analyse table. Unlike in Tutorial 3, we will now also store the estimation results for each specification so as to avoid re-estimating the models later.

ardl_est <- list()
ic <- matrix(nrow = 125, ncol = 5)
colnames(ic) <- c("p", "l", "s", "aic", "bic")

i <- 0
for (p in 1:5)
{
for (l in 0:4)
{
for (s in 0:4)
{
i <- i + 1
ardl_est[[i]] <- ardl(CT ~ AT + YT, mydata,
order = c(p, l, s))
ic[i,] <- c(p, l, s, AIC(ardl_est[[i]]),
BIC(ardl_est[[i]]))
}
}
}

ic_aic <- ic[order(ic[,4], decreasing = FALSE),][1:10,]
ic_bic <- ic[order(ic[,5], decreasing = FALSE),][1:10,]

The first six models preferred by the BIC also have sufficient support in terms of
the AIC, so we take this to be the adequate set.

adq_set <- ic_bic[1:6,]

We also need to match the orders in the adequate set to the set of all models that
we have estimated.

adq_idx <- match(data.frame(t(adq_set[, 1:3])),
data.frame(t(ic[, 1:3])))

We finally do a quick scan of the residuals for obvious anomalies.

for (i in 1:length(adq_idx))
{
order <- adq_set[i,1:3]
acf(ardl_est[[adq_idx[i]]]$residuals,
xlim = c(1, 20), xaxp = c(1, 20, 1),
ylim = c(-0.15, 0.15), yaxp = c(-0.15, 0.15, 2),
main = paste("Residuals ACF for ARDL(", order[1], ", ",
order[2], ", ",
order[3], ")",
sep = ""))
}

[Figures: residual ACFs, lags 1–20, for ARDL(1, 2, 2), ARDL(1, 1, 2), ARDL(1, 3, 2), ARDL(1, 0, 2), ARDL(2, 2, 2) and ARDL(1, 2, 3)]

We do not observe anything drastic so we proceed with the set of six models.

(g) Draw inference about the dynamic relationship between expenditures and wealth.

Solution Generate the plots of IRFs with confidence intervals for all ARDL
specifications in the adequate set.

j <- 1 # select responses to AT
shock_name <- "AT"
y_min <- Inf
y_max <- -Inf
irfs_ci <- list()
for (i in 1:length(adq_idx))
{
irfs_ci_i <- ardl_irfs_ci(ardl_est[[adq_idx[i]]], conf = 0.68)
irfs_ci[[i]] <- cbind(irfs_ci_i$lb[, j], irfs_ci_i$md[, j],
irfs_ci_i$ub[, j])
y_min <- min(y_min, irfs_ci_i$lb[, j])
y_max <- max(y_max, irfs_ci_i$ub[, j])
}

for (i in 1:length(adq_idx))
{
order <- adq_set[i,1:3]
plot(0:40, irfs_ci[[i]][, 2], type = "l", ylim = c(y_min, y_max),
ylab = "Impulse Response", xlab = "Horizon",
main = paste("ARDL(", order[1], ", ", order[2], ", ", order[3],
"): Cumulative IRFs to ", shock_name, sep = ""))
lines(0:40, irfs_ci[[i]][, 1], type = "l", col = "blue")
lines(0:40, irfs_ci[[i]][, 3], type = "l", col = "blue")
lines(0:40, numeric(41), type = "l", col = "red")
}

[Figures: cumulative IRFs to AT with 68% confidence bands, horizons 0–40, for each model in the adequate set: ARDL(1, 2, 2), ARDL(1, 1, 2), ARDL(1, 3, 2), ARDL(1, 0, 2), ARDL(2, 2, 2) and ARDL(1, 2, 3)]

We divide the inference into short-run and long-run effects. By long-run we will mean around 10 years (equivalently, h ≈ 40) following a shock to wealth. By short-run we will mean within the first year (equivalently, h ≤ 4).
Starting with the long-run effects, we conclude with reasonable confidence that, ceteris paribus, an increase in wealth will be at most partially spent after 10 years.

In particular, it is very unlikely that more than half of a wealth increase is spent,
but there is a small possibility that overall expenditures contract, in the ten years
following a wealth increase.
In the short-run, we conclude with reasonable confidence that an increase in wealth
leads to an increase in expenditures. Two scenarios are possible. In the first, ex-
penditures jump quickly (within one quarter) and then level off. In the second
scenario, expenditures rise gradually. Under both scenarios, expenditures are
unlikely to exceed 20% of the wealth increase within the first year. However, the first
scenario results in a greater short-term expenditure increase relative to the second
scenario.
What inference can you obtain about the dynamic relationship between expendi-
tures and income?

ECON3350: Applied Econometrics
for Macroeconomics and Finance

Tutorial 6: Cointegration - I

At the end of this tutorial you should be able to:

• Automate the task of unit root testing in multiple time-series samples in R;

• Implement the Engle-Granger cointegration test in R;

• Interpret the outcome of an Engle-Granger test;

• Use the outcome of the Engle-Granger test to infer possible cointegrating relations.

Problems
In this tutorial you will test for cointegration using the Engle-Granger method.
The data consists of four Australian interest rates: the 5 year (i5y) and 3 year (i3y) Treasury Bond (i.e., Capital Market) rates, along with the 180 day (i180d) and 90 day (i90d) Bank Accepted Bill (i.e., Money Market) rates. The data are annualized monthly rates for the period June 1992—August 2010 (T = 219), and are saved in term_structure.csv.

1. Analyse the integration properties of each individual process: {i3yt }, {i5yt }, {i90dt } and {i180dt }. Based on the data, what inference can we draw about each of these processes resembling a unit root process?

Solution For this tutorial, we load the following useful packages.

library(forecast)
library(dplyr)
library(zoo)
library(aTSA)

It is also useful to create some functions to help automate the task of constructing adequate sets for ADF specifications. The following two functions estimate the coefficients and record AIC/BIC values for a range of ADF regressions specified by lags combined with the inclusion and/or exclusion of a constant and/or trend. One function performs the estimation in levels, while the other does the same in differences.
# create a function to estimate a range of ADF regression
# specifications in levels along with the AICs and BICs
ADF_estimate_lev <- function(y, p_max = 9)
{
TT <- length(y)
ADF_est <- list()
ic <- matrix(nrow = 3 * (1 + p_max), ncol = 5)
colnames(ic) <- c("const", "trend", "p", "aic", "bic")
i <- 0
for (const in 0:1)
{
for (p in 0:p_max)
{
i <- i + 1
try(silent = T, expr =
{
ADF_est[[i]] <- Arima(diff(y), xreg = y[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = F)
ic[i,] <- c(const, 0, p, ADF_est[[i]]$aic,
ADF_est[[i]]$bic)
})
}

if (const)
{
# only add a specification with trend if there is a
# constant (i.e., exclude no constant with trend)
for (p in 0:p_max)
{
i <- i + 1
try(silent = T, expr =
{
ADF_est[[i]] <- Arima(diff(y), xreg = y[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = T)
ic[i,] <- c(const, 1, p, ADF_est[[i]]$aic,
ADF_est[[i]]$bic)
})
}
}
}

ic_aic <- ic[order(ic[,4]),][1:10,]
ic_bic <- ic[order(ic[,5]),][1:10,]

return(list(ADF_est = ADF_est, ic = ic,
ic_aic = ic_aic, ic_bic = ic_bic))
}

# create a function to estimate a range of ADF regression
# specifications in differences along with the AICs and BICs
ADF_estimate_diff <- function(y, p_max = 9)
{
TT <- length(diff(y))
ADF_est_diff <- list()
ic_diff <- matrix(nrow = 3 * (1 + p_max), ncol = 5)
colnames(ic_diff) <- c("const", "trend", "p", "aic", "bic")
i <- 0
for (const in 0:1)
{
for (p in 0:p_max)
{
i <- i + 1
try(silent = T, expr =
{
ADF_est_diff[[i]] <- Arima(diff(diff(y)),
xreg = diff(y)[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = F)
ic_diff[i,] <- c(const, 0, p, ADF_est_diff[[i]]$aic,
ADF_est_diff[[i]]$bic)
})
}

if (const)
{
# only add a specification with trend if there is a
# constant (i.e., exclude no constant with trend)
for (p in 0:p_max)
{

i <- i + 1
try(silent = T, expr =
{
ADF_est_diff[[i]] <- Arima(diff(diff(y)),
xreg = diff(y)[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = T)
ic_diff[i,] <- c(const, 1, p, ADF_est_diff[[i]]$aic,
ADF_est_diff[[i]]$bic)
})
}
}
}

ic_aic_diff <- ic_diff[order(ic_diff[,4]),][1:10,]
ic_bic_diff <- ic_diff[order(ic_diff[,5]),][1:10,]

return(list(ADF_est_diff = ADF_est_diff,
ic_diff = ic_diff,
ic_aic_diff = ic_aic_diff,
ic_bic_diff = ic_bic_diff))
}

Next, load the data and extract the four variables.

mydata <- read.delim("term_structure.csv", header = TRUE,
sep = ",")

dates <- as.yearmon(mydata$obs, format = "%YM%m")

i3y <- mydata$I3Y
i5y <- mydata$I5Y
i90d <- mydata$I90D
i180d <- mydata$I180D

Now, consider the proximity of {i3yt } to a unit root process. We begin by con-
structing an adequate set of ADF regressions in the level of {i3yt }.

i3y_ADF_lev <- ADF_estimate_lev(i3y, p_max = 15)
print(i3y_ADF_lev$ic_aic)

## const trend p aic bic
## [1,] 1 1 10 100.3036 147.6865
## [2,] 1 1 12 101.0781 155.2301

## [3,] 1 1 11 101.7485 152.5159
## [4,] 1 0 12 102.8333 153.6008
## [5,] 1 1 13 102.9466 160.4830
## [6,] 1 0 10 103.1882 147.1866
## [7,] 0 0 12 103.5612 150.9442
## [8,] 1 1 14 103.5876 164.5085
## [9,] 1 1 7 103.6625 140.8919
## [10,] 1 0 11 104.2917 151.6746

print(i3y_ADF_lev$ic_bic)

## const trend p aic bic
## [1,] 0 0 1 118.3726 128.5261
## [2,] 1 0 1 116.0791 129.6171
## [3,] 1 0 2 113.7241 130.6466
## [4,] 0 0 2 117.3288 130.8668
## [5,] 1 0 3 110.8348 131.1417
## [6,] 1 1 3 107.5159 131.2074
## [7,] 1 1 2 111.4567 131.7637
## [8,] 1 1 1 115.0552 131.9776
## [9,] 0 0 0 125.2605 132.0295
## [10,] 0 0 3 116.1076 133.0301

The AIC and BIC rankings do not have any specifications in common, so we select from both top 10 rankings in a way that reflects some agreement. This is obviously very subjective! The justification we use is as follows. From the AIC list, take the most preferred specification along with a few others that have the lowest BIC values. Then, do the same using the BIC list.
As a result, we obtain the following set of specifications on which we run our residuals analysis.

i3y_adq_set <- as.matrix(arrange(as.data.frame(
rbind(i3y_ADF_lev$ic_aic[c(1, 6, 9),],
i3y_ADF_lev$ic_bic[c(1, 3, 5:7),])),
const, trend, p))
i3y_adq_idx <- match(data.frame(t(i3y_adq_set[, 1:3])),
data.frame(t(i3y_ADF_lev$ic[, 1:3])))

for (i in 1:length(i3y_adq_idx))
{
checkresiduals(i3y_ADF_lev$ADF_est[[i3y_adq_idx[i]]])
}

[Figure: residual diagnostics for regression with ARIMA(1,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 20.038, df = 9, p-value = 0.01768
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(2,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 18.425, df = 8, p-value = 0.01826
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(3,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 16.161, df = 7, p-value = 0.02369
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(10,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(10,0,0) errors
## Q* = 5.2834, df = 3, p-value = 0.1522
##
## Model df: 10. Total lags used: 13

[Figure: residual diagnostics for regression with ARIMA(2,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 19.404, df = 8, p-value = 0.01284
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(3,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 15.443, df = 7, p-value = 0.03072
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(7,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(7,0,0) errors
## Q* = 6.1322, df = 3, p-value = 0.1054
##
## Model df: 7. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(10,0,0) errors: residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(10,0,0) errors
## Q* = 4.848, df = 3, p-value = 0.1833
##
## Model df: 10. Total lags used: 13

We reject white noise residuals at the 5% significance level for all models with p < 7. Hence, we remove all models except the three with p = 7 or p = 10, all containing a constant and two also containing a trend.
Given our adequate set of ADF regressions, we should run the ADF test with
nlag = 11, but we will use nlag = 15 just to check how sensitive the results are
to including more lags (which the AIC prefers, but the BIC rejects).

adf.test(i3y, nlag = 15)

## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.896 0.358

## [2,] 1 -0.697 0.429
## [3,] 2 -1.220 0.242
## [4,] 3 -1.161 0.264
## [5,] 4 -1.103 0.284
## [6,] 5 -1.022 0.313
## [7,] 6 -0.915 0.352
## [8,] 7 -0.950 0.339
## [9,] 8 -0.761 0.406
## [10,] 9 -0.773 0.402
## [11,] 10 -0.738 0.415
## [12,] 11 -0.807 0.390
## [13,] 12 -0.709 0.425
## [14,] 13 -0.603 0.463
## [15,] 14 -0.645 0.448
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.74 0.4298
## [2,] 1 -2.05 0.3087
## [3,] 2 -2.80 0.0634
## [4,] 3 -3.10 0.0295
## [5,] 4 -2.85 0.0553
## [6,] 5 -2.38 0.1779
## [7,] 6 -2.22 0.2424
## [8,] 7 -2.65 0.0893
## [9,] 8 -2.32 0.2036
## [10,] 9 -2.54 0.1154
## [11,] 10 -2.09 0.2924
## [12,] 11 -1.97 0.3400
## [13,] 12 -1.60 0.4828
## [14,] 13 -1.55 0.5018
## [15,] 14 -1.37 0.5683
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -2.27 0.4618
## [2,] 1 -2.90 0.1999
## [3,] 2 -3.29 0.0728
## [4,] 3 -3.79 0.0203
## [5,] 4 -3.51 0.0422
## [6,] 5 -2.97 0.1689
## [7,] 6 -2.90 0.1985
## [8,] 7 -3.45 0.0475
## [9,] 8 -3.28 0.0758
## [10,] 9 -3.63 0.0310
## [11,] 10 -3.05 0.1333
## [12,] 11 -2.83 0.2293

## [13,] 12 -2.52 0.3550
## [14,] 13 -2.65 0.3046
## [15,] 14 -2.39 0.4093
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

For specifications with a constant and no trend, the null cannot be rejected at the 5% significance level for all lag lengths except p = 3. For specifications with a constant and a trend, the same conclusion holds for all lag lengths except p = 3, 4, 7, 9. Our concern is the one with p = 7 since it is in our adequate set.
However, we might note that the p-value for p = 7 is 0.0475, indicating that if we choose 4.75% as the significance level, then we should conclude that the null cannot be rejected for any specification in the adequate set. Is there a great reason to commit to 5% versus 4.75%? That is a question we would need to consider more profoundly in this particular case.
Overall we might lean towards concluding {i3yt } is not empirically distinguishable from a unit root process, with some ambiguity arising from the specification uncertainty: the constant with trend and p = 7 specification rejects a unit root at the 5% significance level (but not at the 4.75% level).
Accordingly, we repeat the exercise for the differenced process {∆i3yt }.

i3y_ADF_diff <- ADF_estimate_diff(i3y, p_max = 15)
print(i3y_ADF_diff$ic_aic_diff)

## const trend p aic bic
## [1,] 0 0 12 77.49903 124.81760
## [2,] 0 0 4 78.52513 98.80452
## [3,] 0 0 10 78.65040 119.20917
## [4,] 0 0 11 78.96325 122.90192
## [5,] 0 0 5 79.34904 103.00832
## [6,] 1 0 12 79.35876 130.05722
## [7,] 0 0 13 79.47963 130.17809
## [8,] 1 0 4 80.46394 104.12322
## [9,] 1 0 10 80.55557 124.49423
## [10,] 1 0 11 80.84817 128.16673

print(i3y_ADF_diff$ic_bic_diff)

## const trend p aic bic
## [1,] 0 0 4 78.52513 98.80452
## [2,] 0 0 1 89.38183 99.52153
## [3,] 0 0 5 79.34904 103.00832
## [4,] 0 0 2 90.39326 103.91285

## [5,] 1 0 4 80.46394 104.12322
## [6,] 1 0 1 91.37490 104.89449
## [7,] 0 0 3 90.16004 107.05953
## [8,] 1 0 5 81.28020 108.31938
## [9,] 1 0 2 92.37827 109.27776
## [10,] 1 1 4 82.30717 109.34635

i3y_adq_set_diff <- as.matrix(arrange(as.data.frame(
i3y_ADF_diff$ic_bic_diff[c(1, 3, 5),]),
const, trend, p))
i3y_adq_idx_diff <- match(data.frame(
t(i3y_adq_set_diff[, 1:3])),
data.frame(
t(i3y_ADF_diff$ic_diff[, 1:3])))

for (i in 1:length(i3y_adq_idx_diff))
{
checkresiduals(
i3y_ADF_diff$ADF_est_diff[[i3y_adq_idx_diff[i]]])
}

[Figure: residual diagnostics for regression with ARIMA(4,0,0) errors (differenced series): residual series, ACF, histogram]

##
## Ljung-Box test
##

## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 7.1851, df = 6, p-value = 0.3041
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(5,0,0) errors (differenced series): residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(5,0,0) errors
## Q* = 6.5018, df = 5, p-value = 0.2604
##
## Model df: 5. Total lags used: 10

[Figure: residual diagnostics for regression with ARIMA(4,0,0) errors (differenced series): residual series, ACF, histogram]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 7.1557, df = 6, p-value = 0.3067
##
## Model df: 4. Total lags used: 10

adf.test(diff(i3y))

## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -12.23 0.01
## [2,] 1 -8.31 0.01
## [3,] 2 -6.28 0.01
## [4,] 3 -6.30 0.01
## [5,] 4 -6.85 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -12.21 0.01
## [2,] 1 -8.32 0.01

## [3,] 2 -6.29 0.01
## [4,] 3 -6.31 0.01
## [5,] 4 -6.86 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -12.19 0.01
## [2,] 1 -8.30 0.01
## [3,] 2 -6.27 0.01
## [4,] 3 -6.30 0.01
## [5,] 4 -6.84 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

The null is rejected for all specifications. We conclude {∆i3yt } is empirically distinguishable from an I(1) process. We can infer that the order of integration is not greater than one with a high degree of confidence. However, we do not have conclusive inference on how distinguishable {i3yt } is from an I(1).
Repeating for {i5yt } and {∆i5yt }, we obtain the following.

i5y_ADF_lev <- ADF_estimate_lev(i5y, p_max = 15)
print(i5y_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 1 1 10 58.66839 106.0513
## [2,] 1 1 3 78.72176 102.4132
## [3,] 1 1 5 79.63041 110.0909
## [4,] 1 1 4 80.33103 107.4070
## [5,] 1 1 7 80.44508 117.6745
## [6,] 1 1 12 80.70523 134.8571
## [7,] 1 1 9 81.16063 125.1591
## [8,] 1 1 11 81.18581 131.9532
## [9,] 1 1 2 81.54914 101.8561
## [10,] 1 1 6 81.61182 115.4568

print(i5y_ADF_lev$ic_bic)

## const trend p aic bic


## [1,] 0 0 1 86.51082 96.66430
## [2,] 1 0 1 84.62997 98.16795
## [3,] 0 0 2 86.58971 100.12769
## [4,] 1 1 1 83.38674 100.30922
## [5,] 1 0 2 83.91952 100.84199
## [6,] 0 0 0 94.42651 101.19550
## [7,] 1 1 2 81.54914 101.85611

19
## [8,] 1 1 3 78.72176 102.41322
## [9,] 1 0 3 82.46661 102.77358
## [10,] 0 0 3 86.18707 103.10955

i5y_adq_set <- as.matrix(arrange(as.data.frame(
  rbind(i5y_ADF_lev$ic_aic[1,],
        i5y_ADF_lev$ic_bic[c(1, 7:8),])),
  const, trend, p))
i5y_adq_idx <- match(data.frame(t(i5y_adq_set[, 1:3])),
data.frame(t(i5y_ADF_lev$ic[, 1:3])))

for (i in 1:length(i5y_adq_idx))
{
checkresiduals(i5y_ADF_lev$ADF_est[[i5y_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 14.115, df = 9, p-value = 0.1183
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 14.329, df = 8, p-value = 0.07357
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 11.163, df = 7, p-value = 0.1317
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(10,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(10,0,0) errors
## Q* = 3.6386, df = 3, p-value = 0.3032
##
## Model df: 10. Total lags used: 13

adf.test(i5y, nlag = 15)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -1.058 0.300
## [2,] 1 -0.796 0.394
## [3,] 2 -1.268 0.225
## [4,] 3 -1.133 0.274
## [5,] 4 -1.125 0.276
## [6,] 5 -1.130 0.275
## [7,] 6 -1.017 0.315
## [8,] 7 -1.008 0.318
## [9,] 8 -0.816 0.387

## [10,] 9 -0.784 0.398
## [11,] 10 -0.785 0.398
## [12,] 11 -0.913 0.352
## [13,] 12 -0.823 0.384
## [14,] 13 -0.729 0.418
## [15,] 14 -0.725 0.419
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.69 0.4488
## [2,] 1 -1.88 0.3735
## [3,] 2 -2.63 0.0922
## [4,] 3 -2.75 0.0730
## [5,] 4 -2.60 0.0972
## [6,] 5 -2.30 0.2089
## [7,] 6 -2.13 0.2773
## [8,] 7 -2.34 0.1955
## [9,] 8 -2.01 0.3246
## [10,] 9 -2.22 0.2403
## [11,] 10 -1.99 0.3310
## [12,] 11 -1.91 0.3620
## [13,] 12 -1.56 0.5015
## [14,] 13 -1.34 0.5782
## [15,] 14 -1.16 0.6408
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -2.21 0.4869
## [2,] 1 -2.78 0.2467
## [3,] 2 -3.22 0.0851
## [4,] 3 -3.59 0.0347
## [5,] 4 -3.39 0.0565
## [6,] 5 -2.96 0.1745
## [7,] 6 -2.88 0.2073
## [8,] 7 -3.21 0.0868
## [9,] 8 -3.04 0.1379
## [10,] 9 -3.48 0.0455
## [11,] 10 -3.15 0.0973
## [12,] 11 -2.86 0.2133
## [13,] 12 -2.51 0.3591
## [14,] 13 -2.41 0.4049
## [15,] 14 -2.24 0.4752
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

i5y_ADF_diff <- ADF_estimate_diff(i5y, p_max = 15)


print(i5y_ADF_diff$ic_aic_diff)

## const trend p aic bic
## [1,] 0 0 4 53.98123 74.26061
## [2,] 0 0 12 54.93903 102.25759
## [3,] 0 0 5 55.07313 78.73242
## [4,] 1 0 4 55.81398 79.47326
## [5,] 0 0 13 56.01841 106.71687
## [6,] 0 0 6 56.29025 83.32943
## [7,] 1 0 12 56.61626 107.31472
## [8,] 1 0 5 56.89064 83.92982
## [9,] 0 0 11 57.33228 101.27094
## [10,] 1 0 13 57.65940 111.73776

print(i5y_ADF_diff$ic_bic_diff)

## const trend p aic bic


## [1,] 0 0 1 62.26382 72.40352
## [2,] 0 0 4 53.98123 74.26061
## [3,] 0 0 2 62.62992 76.14951
## [4,] 1 0 1 64.20689 77.72648
## [5,] 0 0 5 55.07313 78.73242
## [6,] 0 0 3 61.92017 78.81966
## [7,] 1 0 4 55.81398 79.47326
## [8,] 1 0 2 64.54523 81.44472
## [9,] 1 1 1 66.04324 82.94273
## [10,] 0 0 6 56.29025 83.32943

i5y_adq_set_diff <- as.matrix(arrange(as.data.frame(
  i5y_ADF_diff$ic_bic_diff[c(2, 5, 7, 10),]),
  const, trend, p))
i5y_adq_idx_diff <- match(data.frame(
t(i5y_adq_set_diff[, 1:3])),
data.frame(
t(i5y_ADF_diff$ic_diff[, 1:3])))

for (i in 1:length(i5y_adq_idx_diff))
{
checkresiduals(
i5y_ADF_diff$ADF_est_diff[[i5y_adq_idx_diff[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(4,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 3.8498, df = 6, p-value = 0.697
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(5,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(5,0,0) errors
## Q* = 3.42, df = 5, p-value = 0.6355
##
## Model df: 5. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(6,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(6,0,0) errors
## Q* = 3.2818, df = 4, p-value = 0.5118
##
## Model df: 6. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(4,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 3.8035, df = 6, p-value = 0.7032
##
## Model df: 4. Total lags used: 10

adf.test(diff(i5y))

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -12.09 0.01
## [2,] 1 -8.35 0.01
## [3,] 2 -6.39 0.01
## [4,] 3 -6.31 0.01
## [5,] 4 -6.71 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -12.08 0.01
## [2,] 1 -8.37 0.01

## [3,] 2 -6.41 0.01
## [4,] 3 -6.33 0.01
## [5,] 4 -6.73 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -12.05 0.01
## [2,] 1 -8.35 0.01
## [3,] 2 -6.39 0.01
## [4,] 3 -6.32 0.01
## [5,] 4 -6.72 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

For {i5yt }, we fail to reject H0 at the 5% significance level for all specifications
except with a constant, trend and p = 3, 9. Again, p = 3 is one of four specifica-
tions in the adequate set. As with {i3yt }, we might lean towards inferring that
{i5yt } is not empirically distinguishable from a unit root process, but there would
be a nontrivial degree of uncertainty in this conclusion.
On the other hand, the null is rejected in all specifications for {∆i5yt }, leading
us to conclude that it is empirically distinguishable from I(1). Thus, we can infer
that {i5yt } is empirically distinguishable from any unit root process with an order
of integration greater than one, with a high degree of confidence. However, we do
not have conclusive inference on how distinguishable it is from an I(1).
Repeating for {i90dt } and {∆i90dt }, we obtain the following.

i90d_ADF_lev <- ADF_estimate_lev(i90d, p_max = 15)


print(i90d_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 1 0 4 -124.6213 -100.96201
## [2,] 1 1 4 -123.5817 -96.54254
## [3,] 1 0 5 -123.1565 -96.11733
## [4,] 1 1 5 -122.0429 -91.62381
## [5,] 1 0 3 -121.5855 -101.30607
## [6,] 1 0 6 -121.2449 -90.82580
## [7,] 1 0 8 -120.9319 -83.75307
## [8,] 1 1 3 -120.3162 -96.65695
## [9,] 1 1 6 -120.1657 -86.36673
## [10,] 1 1 8 -120.1257 -79.56690

print(i90d_ADF_lev$ic_bic)

## const trend p aic bic

## [1,] 1 0 1 -116.7680 -103.24839
## [2,] 1 0 2 -119.4153 -102.51579
## [3,] 0 0 1 -111.5606 -101.42094
## [4,] 1 0 3 -121.5855 -101.30607
## [5,] 1 0 4 -124.6213 -100.96201
## [6,] 1 1 1 -115.5881 -98.68865
## [7,] 0 0 2 -112.1658 -98.64624
## [8,] 1 1 2 -118.3177 -98.03834
## [9,] 0 0 4 -117.4682 -97.18884
## [10,] 1 1 3 -120.3162 -96.65695

i90d_adq_set <- as.matrix(arrange(as.data.frame(
  rbind(i90d_ADF_lev$ic_aic[c(1, 5, 8),],
        i90d_ADF_lev$ic_bic[1,])),
  const, trend, p))
i90d_adq_idx <- match(data.frame(t(i90d_adq_set[, 1:3])),
data.frame(t(i90d_ADF_lev$ic[, 1:3])))

for (i in 1:length(i90d_adq_idx))
{
checkresiduals(i90d_ADF_lev$ADF_est[[i90d_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##

## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 19.272, df = 9, p-value = 0.02298
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 8.1896, df = 7, p-value = 0.3162
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(4,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 5.2493, df = 6, p-value = 0.5123
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 8.3316, df = 7, p-value = 0.3043
##
## Model df: 3. Total lags used: 10

adf.test(i90d)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.534 0.488
## [2,] 1 -0.726 0.419
## [3,] 2 -0.839 0.379
## [4,] 3 -0.832 0.381
## [5,] 4 -0.764 0.406
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.38 0.5646
## [2,] 1 -2.78 0.0665

## [3,] 2 -3.19 0.0230
## [4,] 3 -3.71 0.0100
## [5,] 4 -3.16 0.0243
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -1.50 0.7837
## [2,] 1 -2.89 0.2003
## [3,] 2 -3.28 0.0749
## [4,] 3 -3.84 0.0178
## [5,] 4 -3.29 0.0736
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

For all specifications in the adequate set, we reject H0 at the 5% significance level
and conclude that {i90dt } is empirically distinguishable from a unit root process.
Since we infer that i90d is I(0), there is no need to run the test on {∆i90dt }.
Repeating for {i180dt } and {∆i180dt }, we obtain the following.

i180d_ADF_lev <- ADF_estimate_lev(i180d, p_max = 15)


print(i180d_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 1 0 4 -39.46503 -15.8057520
## [2,] 1 1 4 -38.43833 -11.3991539
## [3,] 1 0 3 -38.40986 -18.1304805
## [4,] 1 0 5 -37.46849 -10.4293151
## [5,] 1 1 3 -37.21378 -13.5544981
## [6,] 1 1 5 -36.43834 -6.0192666
## [7,] 1 0 2 -35.76414 -18.8646555
## [8,] 1 0 6 -35.62162 -5.2025467
## [9,] 1 1 2 -34.75054 -14.4711549
## [10,] 1 1 6 -34.63545 -0.8364764

print(i180d_ADF_lev$ic_bic)

## const trend p aic bic


## [1,] 1 0 1 -34.49656 -20.97697
## [2,] 1 0 2 -35.76414 -18.86466
## [3,] 0 0 1 -28.62834 -18.48865
## [4,] 1 0 3 -38.40986 -18.13048
## [5,] 1 1 1 -33.38467 -16.48518
## [6,] 1 0 4 -39.46503 -15.80575
## [7,] 0 0 2 -28.05423 -14.53464
## [8,] 1 1 2 -34.75054 -14.47115

## [9,] 1 1 3 -37.21378 -13.55450
## [10,] 0 0 3 -29.18731 -12.28782

i180d_adq_set <- as.matrix(arrange(as.data.frame(
  rbind(i180d_ADF_lev$ic_aic[c(1:3, 5, 7, 9),],
        i180d_ADF_lev$ic_bic[c(1, 5),])),
  const, trend, p))
i180d_adq_idx <- match(data.frame(t(i180d_adq_set[, 1:3])),
data.frame(t(i180d_ADF_lev$ic[, 1:3])))

for (i in 1:length(i180d_adq_idx))
{
checkresiduals(i180d_ADF_lev$ADF_est[[i180d_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 11.854, df = 9, p-value = 0.2217
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 9.7071, df = 8, p-value = 0.2862
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 4.2242, df = 7, p-value = 0.7536
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(4,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 2.7058, df = 6, p-value = 0.8448
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 11.99, df = 9, p-value = 0.2138
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 9.6403, df = 8, p-value = 0.2912
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 4.3218, df = 7, p-value = 0.7421
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(4,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 2.7166, df = 6, p-value = 0.8435
##
## Model df: 4. Total lags used: 10

adf.test(i180d)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.474 0.507
## [2,] 1 -0.724 0.420
## [3,] 2 -0.843 0.377
## [4,] 3 -0.833 0.381
## [5,] 4 -0.768 0.404
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.57 0.4943
## [2,] 1 -2.88 0.0504

## [3,] 2 -3.19 0.0228
## [4,] 3 -3.72 0.0100
## [5,] 4 -3.28 0.0184
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -1.71 0.6989
## [2,] 1 -2.99 0.1602
## [3,] 2 -3.28 0.0756
## [4,] 3 -3.84 0.0178
## [5,] 4 -3.40 0.0538
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

i180d_ADF_diff <- ADF_estimate_diff(i180d, p_max = 15)


print(i180d_ADF_diff$ic_aic_diff)

## const trend p aic bic


## [1,] 0 0 3 -31.88034 -15.003950
## [2,] 0 0 2 -30.21348 -16.712366
## [3,] 0 0 4 -30.03514 -9.783466
## [4,] 1 0 3 -29.90334 -9.651665
## [5,] 0 0 0 -29.72709 -22.976533
## [6,] 0 0 1 -29.25261 -19.126772
## [7,] 1 0 2 -28.24426 -11.367863
## [8,] 0 0 5 -28.05966 -4.432713
## [9,] 1 0 4 -28.05856 -4.431613
## [10,] 1 1 3 -27.93277 -4.305817

print(i180d_ADF_diff$ic_bic_diff)

## const trend p aic bic


## [1,] 0 0 0 -29.72709 -22.976533
## [2,] 0 0 1 -29.25261 -19.126772
## [3,] 1 0 0 -27.75244 -17.626607
## [4,] 0 0 2 -30.21348 -16.712366
## [5,] 0 0 3 -31.88034 -15.003950
## [6,] 1 0 1 -27.28159 -13.780481
## [7,] 1 1 0 -25.79273 -12.291616
## [8,] 1 0 2 -28.24426 -11.367863
## [9,] 0 0 4 -30.03514 -9.783466
## [10,] 1 0 3 -29.90334 -9.651665

i180d_adq_set_diff <- as.matrix(arrange(as.data.frame(
intersect(as.data.frame(i180d_ADF_diff$ic_aic_diff),
as.data.frame(i180d_ADF_diff$ic_bic_diff))),
const, trend, p))

i180d_adq_idx_diff <- match(
  data.frame(t(i180d_adq_set_diff[, 1:3])),
  data.frame(t(i180d_ADF_diff$ic_diff[, 1:3])))

for (i in 1:length(i180d_adq_idx_diff))
{
checkresiduals(i180d_ADF_diff$ADF_est_diff[[i180d_adq_idx_diff[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(0,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 10.653, df = 10, p-value = 0.3852
##
## Model df: 0. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 8.8595, df = 9, p-value = 0.4503
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.4383, df = 8, p-value = 0.7099
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 2.9199, df = 7, p-value = 0.8923
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(4,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 3.0154, df = 6, p-value = 0.8069
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.4376, df = 8, p-value = 0.7099
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 2.9201, df = 7, p-value = 0.8923
##
## Model df: 3. Total lags used: 10

adf.test(diff(i180d), nlag = 10)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -8.33 0.01
## [2,] 1 -6.71 0.01
## [3,] 2 -5.33 0.01
## [4,] 3 -5.74 0.01
## [5,] 4 -5.44 0.01
## [6,] 5 -5.36 0.01
## [7,] 6 -4.98 0.01
## [8,] 7 -5.13 0.01
## [9,] 8 -4.67 0.01

## [10,] 9 -4.76 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -8.31 0.01
## [2,] 1 -6.70 0.01
## [3,] 2 -5.32 0.01
## [4,] 3 -5.73 0.01
## [5,] 4 -5.43 0.01
## [6,] 5 -5.35 0.01
## [7,] 6 -4.97 0.01
## [8,] 7 -5.11 0.01
## [9,] 8 -4.66 0.01
## [10,] 9 -4.74 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -8.29 0.01
## [2,] 1 -6.68 0.01
## [3,] 2 -5.31 0.01
## [4,] 3 -5.72 0.01
## [5,] 4 -5.42 0.01
## [6,] 5 -5.34 0.01
## [7,] 6 -4.96 0.01
## [8,] 7 -5.12 0.01
## [9,] 8 -4.67 0.01
## [10,] 9 -4.76 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

For {i180dt }, we reject H0 at the 5% significance level in some specifications in the adequate set but not others. The ADF results are not sufficiently accurate to ascertain the proximity of the DGP to a unit root process.
For {∆i180dt }, specifications with lag lengths up to 9 lead to H0 being universally
rejected at a very small significance level. The process {∆i180dt } is clearly distin-
guishable from a unit root process. However, we cannot say how close {i180dt } is
to an I(1) using the ADF testing approach.

2. Use the Engle-Granger method to test for a cointegrating relation involving all four processes. Assume the 5 year TB rate is the dependent variable in the initial regression. Hint: Use the coint.test function provided by the aTSA package.

Solution We need to construct an adequate set of ADF specifications for the
estimated residuals from the regression of i5yt on a constant, i3yt , i90dt , and
i180dt . A regression in R is implemented using the lm function.

eg_reg <- lm( i5y ~ i3y + i90d + i180d, mydata)


eg_res <- eg_reg$residuals

Now, use the same approach as in Question 1 but with eg_res instead of an
observed sample.

egr_ADF_lev <- ADF_estimate_lev(eg_res, p_max = 15)


print(egr_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 0 0 2 -420.1848 -406.6652
## [2,] 0 0 1 -420.1357 -409.9961
## [3,] 0 0 0 -419.9250 -413.1652
## [4,] 0 0 3 -418.5450 -401.6455
## [5,] 1 0 2 -418.2587 -401.3592
## [6,] 0 0 15 -418.2065 -360.7482
## [7,] 1 0 1 -418.1856 -404.6660
## [8,] 1 0 0 -417.9520 -407.8123
## [9,] 1 0 3 -416.6074 -396.3280
## [10,] 1 1 0 -416.5675 -403.0479

print(egr_ADF_lev$ic_bic)

## const trend p aic bic


## [1,] 0 0 0 -419.9250 -413.1652
## [2,] 0 0 1 -420.1357 -409.9961
## [3,] 1 0 0 -417.9520 -407.8123
## [4,] 0 0 2 -420.1848 -406.6652
## [5,] 1 0 1 -418.1856 -404.6660
## [6,] 1 1 0 -416.5675 -403.0479
## [7,] 0 0 3 -418.5450 -401.6455
## [8,] 1 0 2 -418.2587 -401.3592
## [9,] 1 1 1 -416.5125 -399.6130
## [10,] 1 0 3 -416.6074 -396.3280

We will only consider specifications without a constant or trend, since the series being tested is a vector of regression residuals, which have zero sample mean by construction.

egr_adq_set <- as.matrix(arrange(as.data.frame(
egr_ADF_lev$ic_bic[c(1, 2, 4, 7), ]),
const, trend, p))
egr_adq_idx <- match(data.frame(t(egr_adq_set[, 1:3])),
data.frame(t(egr_ADF_lev$ic[, 1:3])))

for (i in 1:length(egr_adq_idx))
{
checkresiduals(egr_ADF_lev$ADF_est[[egr_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(0,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 9.1103, df = 10, p-value = 0.5217
##
## Model df: 0. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 7.2714, df = 9, p-value = 0.6089
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.1884, df = 8, p-value = 0.7373
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 5.7923, df = 7, p-value = 0.5642
##
## Model df: 3. Total lags used: 10

All residuals look OK. Hence, we can continue and use the function coint.test
from the aTSA package to implement the Engle-Granger test for each of the specifi-
cations in the adequate set. The inputs to coint.test are the dependent variable
in the regression, a matrix containing independent variables and the number of
lags to use in the unit root test on the residuals series.
We use a for loop to implement the test on each of the specifications in the
adequate set. Instead of generating output at each iteration, we use the option
output = F to suppress it and store the p-values in an easy-to-read table.

eg_test <- matrix(nrow = 1, ncol = 4)


colnames(eg_test) <- rep("", 4)
rownames(eg_test) <- c("No const, no trend")
for (l in 1:4)
{
eg_l <- coint.test(i5y, cbind(i3y, i90d, i180d),
nlag = l, output = F)

eg_test[, l] <- eg_l[1, 3]
colnames(eg_test)[l] <- paste("Lag", l)
}
print(eg_test)

## Lag 1 Lag 2 Lag 3 Lag 4


## No const, no trend 0.01 0.01 0.01 0.01

For specifications with no constant and no trend, the unit root in the residuals is rejected at low significance levels. The best inference we can draw is that if the residual in the regression of i5yt on a constant, i3yt , i90dt , and i180dt is mean-independent, then it also does not have a unit root. The reasoning behind this is that if a residual series is mean-independent of the regressors, then its unconditional mean is zero. This scenario matches the ADF specifications above that restrict the constant to be zero.
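A minimal sanity check of the zero-mean property invoked here (a sketch using the eg_res object computed above): OLS residuals from a regression that includes an intercept have zero sample mean by construction.

mean(eg_res)  # numerically zero, since the lm regression includes an intercept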

3. Interpret the inference obtained in Questions 1 and 2 in terms of empirical evidence of cointegration in the four interest rates.

Solution In Question 1, we concluded that i3yt is not empirically distinguishable from I(1), but for the remaining three processes our inference on their proximity to I(1) processes is rather ambiguous.
When we regress i5yt on i3yt , i90dt and i180dt , we find that the residuals process
does not have a unit root if we enforce the restriction that residuals are mean-
independent. Assuming this restriction is valid, we have the following possibilities:

1. i3yt , i5yt , i90dt and i180dt are all I(0);

2. any three processes are I(1) and cointegrated while the fourth is I(0); for example, we could have that i3yt , i5yt and i90dt are cointegrated and i180dt is I(0). The same could hold for any other combination.

3. Any two processes are I(1) and cointegrated while the other two are I(0); for
example, we could have that i3yt and i5yt are cointegrated while i90dt and
i180dt are both I(0).

4. any two processes are I(1) and cointegrated, and the other two processes are
also I(1) and cointegrated, but the four processes are not all cointegrated
with each other in a single cointegrating relation;

5. all four processes are I(1) and cointegrated in a single cointegrating relation.

Which of these five scenarios prevails? It depends on what we assume about the
integration properties of the processes involved. Our unit root tests in Question 1
did not clearly reject a unit root in any of the processes, except {i90dt }. If {i90dt }
is I(0), then we can rule out scenarios 4 and 5.
In terms of scenarios 1-3, we can in principle make the unit root assumption about any combination of i3yt , i5yt and i180dt , which will determine the appropriate interpretation. The important thing to remember is that it is always an assumption that a unit root exists! Whether or not it is a useful one depends on the application.

4. Repeat Question 2 three more times but each time change the dependent
variable. Is the inference regarding cointegration affected?

Solution The steps are exactly the same if we replace the dependent variable.
For example, letting i3yt be the dependent variable, we obtain the following.

eg_reg <- lm( i3y ~ i5y + i90d + i180d, mydata)


eg_res <- eg_reg$residuals

egr_ADF_lev <- ADF_estimate_lev(eg_res, p_max = 15)


print(egr_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 0 0 0 -490.4155 -483.6557
## [2,] 0 0 1 -490.1551 -480.0154
## [3,] 0 0 2 -489.8021 -476.2825
## [4,] 0 0 3 -488.4683 -471.5688
## [5,] 1 0 0 -488.4332 -478.2935
## [6,] 1 0 1 -488.1889 -474.6693
## [7,] 1 0 2 -487.8534 -470.9539
## [8,] 0 0 15 -486.5836 -429.1253
## [9,] 1 0 3 -486.5075 -466.2281
## [10,] 0 0 4 -486.4786 -466.1992

print(egr_ADF_lev$ic_bic)

## const trend p aic bic
## [1,] 0 0 0 -490.4155 -483.6557
## [2,] 0 0 1 -490.1551 -480.0154
## [3,] 1 0 0 -488.4332 -478.2935
## [4,] 0 0 2 -489.8021 -476.2825
## [5,] 1 0 1 -488.1889 -474.6693
## [6,] 1 1 0 -486.4333 -472.9137
## [7,] 0 0 3 -488.4683 -471.5688
## [8,] 1 0 2 -487.8534 -470.9539
## [9,] 1 1 1 -486.2086 -469.3091
## [10,] 1 0 3 -486.5075 -466.2281

egr_adq_set <- as.matrix(arrange(as.data.frame(
  egr_ADF_lev$ic_bic[c(1, 2, 4, 7), ]),
  const, trend, p))
egr_adq_idx <- match(data.frame(t(egr_adq_set[, 1:3])),
data.frame(t(egr_ADF_lev$ic[, 1:3])))

for (i in 1:length(egr_adq_idx))
{
checkresiduals(egr_ADF_lev$ADF_est[[egr_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(0,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##

## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 7.9494, df = 10, p-value = 0.6338
##
## Model df: 0. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 6.6997, df = 9, p-value = 0.6684
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.096, df = 8, p-value = 0.7473
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 5.5928, df = 7, p-value = 0.588
##
## Model df: 3. Total lags used: 10

eg_test <- matrix(nrow = 1, ncol = 4)


colnames(eg_test) <- rep("", 4)
rownames(eg_test) <- c("No const, no trend")
for (l in 1:4)
{
eg_l <- coint.test(i3y, cbind(i5y, i90d, i180d),
nlag = l, output = F)
eg_test[, l] <- eg_l[1, 3]
colnames(eg_test)[l] <- paste("Lag", l)
}
print(eg_test)

## Lag 1 Lag 2 Lag 3 Lag 4


## No const, no trend 0.01 0.01 0.01 0.01

We obtain the same results as with i5yt being the dependent variable. In fact, noth-
ing of substance changes by setting i90dt or i180dt to be the dependent variables

either. Changing the dependent variable does not materially affect our inference
about cointegrating relations involving these four processes.

eg_reg <- lm( i90d ~ i3y + i5y + i180d, mydata)


eg_res <- eg_reg$residuals

egr_ADF_lev <- ADF_estimate_lev(eg_res, p_max = 15)


print(egr_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 0 0 0 -452.4299 -445.6701
## [2,] 0 0 3 -450.8043 -433.9048
## [3,] 0 0 1 -450.7693 -440.6296
## [4,] 1 0 0 -450.4913 -440.3516
## [5,] 1 1 5 -450.1861 -419.7670
## [6,] 1 1 0 -450.0606 -436.5410
## [7,] 1 0 3 -448.8389 -428.5595
## [8,] 1 0 1 -448.8348 -435.3152
## [9,] 0 0 2 -448.8249 -435.3053
## [10,] 0 0 4 -448.8044 -428.5250

print(egr_ADF_lev$ic_bic)

## const trend p aic bic


## [1,] 0 0 0 -452.4299 -445.6701
## [2,] 0 0 1 -450.7693 -440.6296
## [3,] 1 0 0 -450.4913 -440.3516
## [4,] 1 1 0 -450.0606 -436.5410
## [5,] 1 0 1 -448.8348 -435.3152
## [6,] 0 0 2 -448.8249 -435.3053
## [7,] 0 0 3 -450.8043 -433.9048
## [8,] 1 1 1 -448.3139 -431.4144
## [9,] 1 0 2 -446.8931 -429.9936
## [10,] 1 0 3 -448.8389 -428.5595

egr_adq_set <- as.matrix(arrange(as.data.frame(
  egr_ADF_lev$ic_bic[c(1, 2, 6, 7), ]),
  const, trend, p))
egr_adq_idx <- match(data.frame(t(egr_adq_set[, 1:3])),
data.frame(t(egr_ADF_lev$ic[, 1:3])))

for (i in 1:length(egr_adq_idx))
{
checkresiduals(egr_ADF_lev$ADF_est[[egr_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(0,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 6.4587, df = 10, p-value = 0.7754
##
## Model df: 0. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 6.1121, df = 9, p-value = 0.7287
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.9864, df = 8, p-value = 0.6488
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 3.844, df = 7, p-value = 0.7976
##
## Model df: 3. Total lags used: 10

eg_test <- matrix(nrow = 1, ncol = 4)


colnames(eg_test) <- rep("", 4)
rownames(eg_test) <- c("No const, no trend")
for (l in 1:4)
{
eg_l <- coint.test(i90d, cbind(i3y, i5y, i180d),
nlag = l, output = F)
eg_test[, l] <- eg_l[1, 3]
colnames(eg_test)[l] <- paste("Lag", l)
}
print(eg_test)

## Lag 1 Lag 2 Lag 3 Lag 4


## No const, no trend 0.01 0.01375873 0.0269668 0.01

eg_reg <- lm( i180d ~ i3y + i5y + i90d, mydata)
eg_res <- eg_reg$residuals

egr_ADF_lev <- ADF_estimate_lev(eg_res, p_max = 15)


print(egr_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 0 0 5 -422.6708 -399.0115
## [2,] 1 1 5 -421.8393 -391.4202
## [3,] 1 0 5 -420.7122 -393.6730
## [4,] 0 0 0 -418.8574 -412.0976
## [5,] 0 0 3 -418.8418 -401.9423
## [6,] 0 0 1 -418.7116 -408.5719
## [7,] 1 1 0 -418.3307 -404.8111
## [8,] 1 1 3 -418.0234 -394.3642
## [9,] 1 1 1 -417.8057 -400.9062
## [10,] 0 0 2 -417.7150 -404.1954

print(egr_ADF_lev$ic_bic)

## const trend p aic bic


## [1,] 0 0 0 -418.8574 -412.0976
## [2,] 0 0 1 -418.7116 -408.5719
## [3,] 1 0 0 -416.8772 -406.7375
## [4,] 1 1 0 -418.3307 -404.8111
## [5,] 0 0 2 -417.7150 -404.1954
## [6,] 1 0 1 -416.7327 -403.2131
## [7,] 0 0 3 -418.8418 -401.9423
## [8,] 1 1 1 -417.8057 -400.9062
## [9,] 0 0 5 -422.6708 -399.0115
## [10,] 1 0 2 -415.7406 -398.8411

egr_adq_set <- as.matrix(arrange(as.data.frame(
  egr_ADF_lev$ic_bic[c(1, 2, 5, 7, 9), ]),
  const, trend, p))
egr_adq_idx <- match(data.frame(t(egr_adq_set[, 1:3])),
data.frame(t(egr_ADF_lev$ic[, 1:3])))

for (i in 1:length(egr_adq_idx))
{
checkresiduals(egr_ADF_lev$ADF_est[[egr_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(0,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 8.1843, df = 10, p-value = 0.6108
##
## Model df: 0. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 6.6843, df = 9, p-value = 0.6699
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.8484, df = 8, p-value = 0.6642
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 4.0301, df = 7, p-value = 0.7763
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(5,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(5,0,0) errors
## Q* = 2.776, df = 5, p-value = 0.7345
##
## Model df: 5. Total lags used: 10

eg_test <- matrix(nrow = 1, ncol = 6)


colnames(eg_test) <- rep("", 6)
rownames(eg_test) <- c("No const, no trend")
for (l in 1:6)
{
eg_l <- coint.test(i180d, cbind(i3y, i5y, i90d),
nlag = l, output = F)
eg_test[, l] <- eg_l[1, 3]
colnames(eg_test)[l] <- paste("Lag", l)
}
print(eg_test)

## Lag 1 Lag 2 Lag 3 Lag 4 Lag 5 Lag 6


## No const, no trend 0.01 0.01 0.01 0.01 0.01 0.01

5. Next, use the data to test the expectations theory of the term structure of
interest rates (ETT). Specifically, investigate whether the spreads in the
Capital Market (i5y − i3y) and Money Market (i180d − i90d) are stable (and
therefore stationary assuming constant variances and auto-covariances).

Solution This is straightforward using the ADF testing approach explained in Question 1. We apply it once to the spread i5y − i3y representing the Capital Market and again to the spread i180d − i90d representing the Money Market. In both cases the spreads are observed samples, so no special consideration of the distribution of the ADF test statistic is needed.
Also note that we do not require any assumptions about unit roots in the DGPs of individual interest rates to draw conclusions about the stationarity of the spreads (such assumptions are only needed if we want to conclude "a stationary spread implies cointegrated interest rates").
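For readability, one might first construct the spread series explicitly before testing; the names cm_spread and mm_spread below are illustrative and do not appear in the original code, which passes the differences directly.

cm_spread <- i5y - i3y      # Capital Market spread
mm_spread <- i180d - i90d   # Money Market spread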

cm_ADF_lev <- ADF_estimate_lev(i5y - i3y, p_max = 15)


print(cm_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 1 0 2 -557.2660 -540.3435
## [2,] 1 1 2 -556.7428 -536.4358
## [3,] 1 0 4 -556.7363 -533.0449
## [4,] 1 0 3 -556.6236 -536.3166
## [5,] 1 1 4 -556.4973 -529.4214
## [6,] 1 1 6 -556.1063 -522.2614
## [7,] 1 1 15 -556.0468 -491.7414
## [8,] 1 0 15 -555.8224 -494.9015
## [9,] 1 1 1 -555.7064 -538.7839
## [10,] 1 1 3 -555.6926 -532.0012

print(cm_ADF_lev$ic_bic)

## const trend p aic bic


## [1,] 0 0 0 -550.2068 -543.4378
## [2,] 1 0 0 -552.6518 -542.4983
## [3,] 1 0 1 -555.3603 -541.8224
## [4,] 0 0 1 -551.9226 -541.7691
## [5,] 0 0 2 -555.0171 -541.4791
## [6,] 1 0 2 -557.2660 -540.3435
## [7,] 1 1 1 -555.7064 -538.7839
## [8,] 1 1 0 -552.0080 -538.4700
## [9,] 0 0 3 -555.0132 -538.0907
## [10,] 1 1 2 -556.7428 -536.4358

cm_adq_set <- as.matrix(arrange(as.data.frame(
cm_ADF_lev$ic_bic[c(2:3, 6:8, 10),]),
const, trend, p))
cm_adq_idx <- match(data.frame(t(cm_adq_set[, 1:3])),
data.frame(t(cm_ADF_lev$ic[, 1:3])))

for (i in 1:length(cm_adq_idx))
{
checkresiduals(cm_ADF_lev$ADF_est[[cm_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(0,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 19.261, df = 10, p-value = 0.03707
##
## Model df: 0. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 12.915, df = 9, p-value = 0.1665
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 10.092, df = 8, p-value = 0.2587
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(0,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 19.346, df = 10, p-value = 0.03608
##
## Model df: 0. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 12.354, df = 9, p-value = 0.1941
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 10.031, df = 8, p-value = 0.2629
##
## Model df: 2. Total lags used: 10

adf.test(i5y - i3y, nlag = 3)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -2.72 0.0100
## [2,] 1 -2.99 0.0100
## [3,] 2 -2.13 0.0345
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -3.40 0.0128
## [2,] 1 -3.88 0.0100
## [3,] 2 -2.82 0.0611
## Type 3: with drift and trend

## lag ADF p.value
## [1,] 0 -3.48 0.045
## [2,] 1 -4.09 0.010
## [3,] 2 -3.04 0.141
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

For i5y − i3y, the test rejects a unit root for all models except the one with a constant, no trend and p = 2, for which it fails to reject at the 5% significance level (although it would reject at the 6.2% level). We are unable to conclusively confirm the ETT in the Capital Market.

mm_ADF_lev <- ADF_estimate_lev(i180d - i90d, p_max = 15)


print(mm_ADF_lev$ic_aic)

## const trend p aic bic


## [1,] 1 0 1 -540.8512 -527.3316
## [2,] 1 0 2 -540.5695 -523.6700
## [3,] 1 0 3 -539.8240 -519.5446
## [4,] 0 0 1 -539.2831 -529.1434
## [5,] 1 1 1 -539.0038 -522.1043
## [6,] 1 1 2 -538.6916 -518.4123
## [7,] 0 0 2 -538.5208 -525.0012
## [8,] 0 0 3 -538.1661 -521.2667
## [9,] 1 0 4 -538.0003 -514.3410
## [10,] 1 1 3 -537.9208 -514.2615

print(mm_ADF_lev$ic_bic)

## const trend p aic bic


## [1,] 0 0 1 -539.2831 -529.1434
## [2,] 1 0 1 -540.8512 -527.3316
## [3,] 0 0 2 -538.5208 -525.0012
## [4,] 1 0 2 -540.5695 -523.6700
## [5,] 1 1 1 -539.0038 -522.1043
## [6,] 0 0 3 -538.1661 -521.2667
## [7,] 1 0 3 -539.8240 -519.5446
## [8,] 1 1 2 -538.6916 -518.4123
## [9,] 0 0 0 -524.7458 -517.9860
## [10,] 0 0 4 -536.1951 -515.9157

mm_adq_set <- as.matrix(arrange(as.data.frame(
  mm_ADF_lev$ic_bic[c(1:8),]),
  const, trend, p))
mm_adq_idx <- match(data.frame(t(mm_adq_set[, 1:3])),
                    data.frame(t(mm_ADF_lev$ic[, 1:3])))

for (i in 1:length(mm_adq_idx))
{
checkresiduals(mm_ADF_lev$ADF_est[[mm_adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 7.5858, df = 9, p-value = 0.5764
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 7.7336, df = 8, p-value = 0.4599
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 7.1189, df = 7, p-value = 0.4166
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 7.8281, df = 9, p-value = 0.5516
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 7.6639, df = 8, p-value = 0.467
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 6.6809, df = 7, p-value = 0.4628
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 7.8597, df = 9, p-value = 0.5483
##
## Model df: 1. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, histogram of residuals]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 7.6345, df = 8, p-value = 0.47
##
## Model df: 2. Total lags used: 10

adf.test(i180d - i90d)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -3.02 0.01
## [2,] 1 -3.84 0.01
## [3,] 2 -3.92 0.01
## [4,] 3 -4.14 0.01
## [5,] 4 -4.00 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -3.39 0.0135
## [2,] 1 -4.26 0.0100

## [3,] 2 -4.34 0.0100
## [4,] 3 -4.64 0.0100
## [5,] 4 -4.52 0.0100
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -3.40 0.0551
## [2,] 1 -4.26 0.0100
## [3,] 2 -4.33 0.0100
## [4,] 3 -4.64 0.0100
## [5,] 4 -4.52 0.0100
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01

The test rejects a unit root for all models in the adequate set. Note that some
specifications lead to a failure to reject, but they are not in our adequate set, so
we can ignore them! We can confidently conclude that the money market spread
is I(0) and the ETT holds for the Money Market.

ECON3350: Applied Econometrics
for Macroeconomics and Finance
Tutorial 9: Modelling Volatility - II

At the end of this tutorial you should be able to:

• construct an adequate set of models with possible TGARCH errors;


• construct an adequate set of models with possible GARCH-in-mean compo-
nents;
• use R to draw inference on the presence of leverage effects from a class of
TGARCH models;
• use R to draw inference on the presence of time-varying risk premia from a
class of GARCH-in-mean models;
• use R to estimate and forecast volatilities based on models with TGARCH
errors or GARCH-in-mean components.

Problems with Solutions


Consider the daily share prices of Merck & Co., Inc. (MRK) for the period 2
January 2001 to 23 December 2013 in the data file Merck.csv. Let {yt } denote the
process of share prices. Recall that we learned how to fit ARMA-ARCH/GARCH
models to data last week. We now consider extensions of these models to capture
possible leverage effects and time-varying risk premia.

1. Consider the class of ARMA(pm , qm )-TGARCH(ph , qh ) models for the returns rt = ln yt − ln yt−1 .
(a) Construct an adequate set of models for estimating and forecasting
volatilities.

Solution For this tutorial, we load the following useful packages.

library(forecast)
library(dplyr)
library(rugarch)

Next, load the data, extract the variables and plot them to get an idea of what we are working with.

mydata <- read.delim("Merck.csv", header = TRUE, sep = ",")

date <- as.Date(mydata$Date, format = "%d/%m/%Y")


y <- mydata$y
r <- diff(log(y))

plot(date[-1], r, type = "l", xlab = "", ylab = "returns")


[Figure: daily log returns of MRK, 2002–2014]

To construct an adequate set of models, we use a similar approach to the one developed in Week 8 for the class of basic GARCH models. The additional dimension considered here is whether the model includes the "threshold term".
Note that the rugarch package has a different naming convention for threshold GARCH models. Specifically, what we refer to as TGARCH is called gjrGARCH in the package, whereas TGARCH there denotes a slightly different model. This is not uncommon, so it is always important to check the documentation of any particular software package carefully!
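As a minimal standalone illustration of this naming convention (a sketch; the estimation loop below builds the same specification programmatically), what these notes call an ARMA(1,1)-TGARCH(1,1) is specified in rugarch as:

# "gjrGARCH" in rugarch corresponds to the TGARCH model used in these notes.
tgarch_spec <- ugarchspec(
  mean.model = list(armaOrder = c(1, 1)),
  variance.model = list(model = "gjrGARCH", garchOrder = c(1, 1)))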

submods <- c("sGARCH", "gjrGARCH")
ARMA_TGARCH_est <- list()
ic_arma_tgarch <- matrix(nrow = 4 ^ 2 * 2 ^ 3, ncol = 7)
colnames(ic_arma_tgarch) <- c("pm", "qm", "ph", "qh",
"thresh", "aic", "bic")
i <- 0; t0 <- proc.time()
for (pm in 0:3)
{
for (qm in 0:3)
{
for (ph in 0:1)
{
for (qh in 0:1)
{
for (thresh in 0:1)
{
i <- i + 1
ic_arma_tgarch[i, 1:5] <- c(pm, qm, ph, qh, thresh)

if (ph == 0 && qh == 0)
{
# no such thing as a homoscedastic threshold ARMA!
if (!thresh)
{
# for models with constant variance, the ugarchspec
# and ugarchfit functions do not work well;
# instead, the documentation advises to use
# arfimaspec and arfimafit
try(silent = T, expr =
{
ARMA_TGARCH_mod <- arfimaspec(
mean.model = list(armaOrder = c(pm, qm)))

ARMA_TGARCH_est[[i]] <- arfimafit(
  ARMA_TGARCH_mod, r)

ic_arma_tgarch[i,6:7] <- infocriteria(
  ARMA_TGARCH_est[[i]])[1:2]
})
}
}
else
{
try(silent = T, expr =
{

ARMA_TGARCH_mod <- ugarchspec(
mean.model = list(armaOrder = c(pm, qm)),
variance.model = list(model = submods[1 + thresh],
garchOrder = c(ph, qh)))

ARMA_TGARCH_est[[i]] <- ugarchfit(ARMA_TGARCH_mod, r,
  solver = 'hybrid')

ic_arma_tgarch[i,6:7] <- infocriteria(
  ARMA_TGARCH_est[[i]])[1:2]
})
}
}
}
}
}
}
cat("\n", proc.time() - t0, "\n")

##
## 166.33 2.6 211 NA NA

ic_aic_arma_tgarch <- ic_arma_tgarch[
  order(ic_arma_tgarch[,6]),][1:20,]
ic_bic_arma_tgarch <- ic_arma_tgarch[
  order(ic_arma_tgarch[,7]),][1:20,]

ic_int_arma_tgarch <- intersect(as.data.frame(ic_aic_arma_tgarch),
                                as.data.frame(ic_bic_arma_tgarch))

adq_set_arma_tgarch <- as.matrix(arrange(as.data.frame(
  ic_int_arma_tgarch), pm, qm, ph, qh, thresh))
adq_idx_arma_tgarch <- match(data.frame(
  t(adq_set_arma_tgarch[, 1:5])),
  data.frame(t(ic_arma_tgarch[, 1:5])))

Note that we have selected the entire intersecting set of models. The next step is
to have a look at residual autocorrelation. As with the basic GARCH, one must
be careful in interpreting the Ljung-Box test with conditional heteroscedasticity.
Indeed, we will avoid it altogether and instead just examine the SACF.

nmods <- length(adq_idx_arma_tgarch)


sacf_tgarch <- matrix(nrow = nmods, ncol = 15)
colnames(sacf_tgarch) <- c("pm", "qm", "ph", "qh", "thresh", 1:10)

for (i in 1:nmods)
{
sacf_tgarch[i,1:5] <- adq_set_arma_tgarch[i,1:5]
sacf_tgarch[i,6:15] <-
acf(ARMA_TGARCH_est[[adq_idx_arma_tgarch[i]]]@fit$z,
lag = 10, plot = F)$acf[2:11]
}
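To see the magnitudes behind the judgement that follows, one can print the table of sample autocorrelations; a rough white-noise benchmark is plus or minus 1.96/sqrt(T) (output omitted here).

print(round(sacf_tgarch, 3))
1.96 / sqrt(length(r))  # approximate 95% bound for a white-noise SACF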

All the residual autocorrelations look relatively small, so we do not eliminate any specifications from the existing set of models. Hence, this is our adequate set.

(b) Draw inference on historic volatilities.

Solution Plotting estimated volatilities is the same as for the basic GARCH
(note that we still do not get confidence intervals with the rugarch package).

title_tgarch <- rep(NA, times = nmods)


for (i in 1:nmods)
{
title_tgarch[i] <- paste("ARMA(",
as.character(adq_set_arma_tgarch[i, 1]), ",",
as.character(adq_set_arma_tgarch[i, 2]),
")-",
c("", "T")[1 + adq_set_arma_tgarch[i,5]],
"GARCH(",
as.character(adq_set_arma_tgarch[i, 3]), ",",
as.character(adq_set_arma_tgarch[i, 4]), ")",
sep = "")
plot(date[-1], sqrt(
ARMA_TGARCH_est[[adq_idx_arma_tgarch[i]]]@fit$var),
type = "l", xlab = "", ylab = "volatilities",
ylim = c(0, 0.08), main = title_tgarch[i])
}

[Figures: estimated volatility paths, 2002–2014, one panel per model in the adequate set: ARMA(0,0)-GARCH(1,1), ARMA(0,1)-GARCH(1,1), ARMA(0,1)-TGARCH(1,1), ARMA(0,2)-TGARCH(1,1), ARMA(0,3)-GARCH(1,1), ARMA(0,3)-TGARCH(1,1), ARMA(1,0)-GARCH(1,1), ARMA(1,0)-TGARCH(1,1), ARMA(1,1)-GARCH(1,1), ARMA(1,2)-GARCH(1,1), ARMA(1,2)-TGARCH(1,1), ARMA(1,3)-GARCH(1,1), ARMA(2,1)-TGARCH(1,1), ARMA(3,0)-TGARCH(1,1)]

All volatility estimates generally look very similar across the specifications in the adequate set. TGARCH vs GARCH specifications generally do not produce noticeable differences for fixed pm , qm , ph and qh . However, larger ARMA lags (pm and qm ) appear to be associated with larger "spikes" in volatilities when they do occur.

(c) Draw inference on the existence of leverage effects.

Solution To test for leverage effects, we focus on the threshold parameter in models where the threshold is actually included. Note that rugarch labels the threshold parameter γ, whereas we used the notation λ in the videos.

for (i in 1:nmods)
{
if (adq_set_arma_tgarch[i, 5] == 1)
{
lambda_est <- ARMA_TGARCH_est[[
adq_idx_arma_tgarch[i]]]@fit$coef["gamma1"]
lambda_tvl <- ARMA_TGARCH_est[[
adq_idx_arma_tgarch[i]]]@fit$tval["gamma1"]
cat(paste0(title_tgarch[i], ": lambda_hat = ",
round(lambda_est, 2),
", t-value = ",
round(lambda_tvl, 2)),
"\n")
}
}

## ARMA(0,1)-TGARCH(1,1): lambda_hat = 0.01, t-value = 15801.83


## ARMA(0,2)-TGARCH(1,1): lambda_hat = 0.01, t-value = 22678.58
## ARMA(0,3)-TGARCH(1,1): lambda_hat = 0.01, t-value = 50575.46
## ARMA(1,0)-TGARCH(1,1): lambda_hat = 0.01, t-value = 16991.85
## ARMA(1,2)-TGARCH(1,1): lambda_hat = 0.01, t-value = 3315.97
## ARMA(2,1)-TGARCH(1,1): lambda_hat = 0, t-value = 0.15
## ARMA(3,0)-TGARCH(1,1): lambda_hat = 0.01, t-value = 4184.12

The null H0 : λ = 0 is easily rejected in favour of H1 : λ > 0 at a very low significance level for all specifications except ARMA(2,1)-TGARCH(1,1), whose threshold estimate is essentially zero. Hence, we may infer that there is substantial evidence of “leverage effects”.
It is worth pointing out that we can justify this conclusion based on the hypothesis test in TGARCH specifications only. The fact that some GARCH specifications are also included in the adequate set is irrelevant for this purpose. The latter simply means that we cannot eliminate basic GARCH models on the basis of the fit vs parsimony trade-off along with the risk of residual autocorrelation.
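The reported t-ratios map into one-sided p-values via the standard normal approximation; a minimal sketch using two values from the output above:

pnorm(3315.97, lower.tail = FALSE)  # effectively 0: reject H0 for ARMA(1,2)-TGARCH(1,1)
pnorm(0.15, lower.tail = FALSE)     # about 0.44: cannot reject H0 for ARMA(2,1)-TGARCH(1,1)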

(d) Forecast volatility for the four trading days past the end of the sample.

Solution Volatility forecasts are generated using the ugarchboot function, which also provides partial predictive intervals, i.e., intervals that account for uncertainty related to unknown future shocks, but not estimation uncertainty. In fact, ugarchboot can also be instructed to compute predictive intervals that account for estimation uncertainty by specifying the option method = "Full", but this is computationally very time consuming, so we do not attempt it in this exercise.
Also, note that rugarch provides a special version of the plot function that will
plot the volatility forecasts along with predictive intervals. Unfortunately, the
functionality of this customised function is limited: it does not allow the user to
modify the plot title, axis limits, etc.

for (i in 1:nmods)
{
plot(ugarchboot(ARMA_TGARCH_est[[adq_idx_arma_tgarch[i]]],
method = "Partial"), which = 3)
}
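If only point forecasts of volatility are needed (without bootstrap bands), a simpler route is the ugarchforecast function together with the sigma extractor; a minimal sketch for the first model in the adequate set, forecasting the four requested days:

fc <- ugarchforecast(ARMA_TGARCH_est[[adq_idx_arma_tgarch[1]]],
                     n.ahead = 4)
sigma(fc)  # 4-step-ahead conditional standard deviation forecasts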

[Figure: sigma forecasts with bootstrap error bands (q: 5%, 25%, 75%, 95%) over horizons T+1 to T+10, one panel per model in the adequate set; panels are labelled sGARCH for the GARCH(1,1) specifications and gjrGARCH for the TGARCH(1,1) specifications.]

There are subtle differences between the volatility forecasts generated by different
GARCH and TGARCH specifications, but all agree that volatilities across the
forecast horizon will remain well below the levels estimated around 2005 and then
again in 2008/2009.

2. Consider a class of GARCH-M models for returns rt , with one ARCH lag
and one GARCH lag.

(a) Construct an adequate set of models for estimating the risk-premium.

Solution We use the same approach as with GARCH/TGARCH models; here it is somewhat simpler because we fix all GARCH lags to be 1. So a specification consists of ARMA lags p, q and an indicator of whether or not the GARCH-in-mean term is included.
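For reference, the simplest case in this class (p = q = 0 with the in-mean term included) is the GARCH-M(1,1) model; a sketch, assuming the in-mean term enters through the conditional standard deviation (rugarch's default, archpow = 1):

$$r_t = a_0 + \delta \sigma_t + \epsilon_t, \qquad \epsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2.$$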

ARMA_GARCHM_est <- list()


ic_arma_garchm <- matrix(nrow = 4^2 * 2, ncol = 5)
colnames(ic_arma_garchm) <- c("p", "q", "m", "aic", "bic")
i <- 0; t0 <- proc.time()
for (p in 0:3)

{
for (q in 0:3)
{
for (m in 0:1)
{
i <- i + 1
ic_arma_garchm[i, 1:3] <- c(p, q, m)

try(silent = T, expr =
{
ARMA_GARCHM_mod <- ugarchspec(
mean.model = list(armaOrder = c(p, q),
archm = m),
variance.model = list(garchOrder = c(1, 1)))

ARMA_GARCHM_est[[i]] <- ugarchfit(ARMA_GARCHM_mod, r,
solver = 'hybrid')

ic_arma_garchm[i,4:5] <- infocriteria(
ARMA_GARCHM_est[[i]])[1:2]
})
}
}
}
cat("\n", proc.time() - t0, "\n")

##
## 21.81 0.48 36.5 NA NA

ic_aic_arma_garchm <- ic_arma_garchm[
order(ic_arma_garchm[,4]),][1:20,]
ic_bic_arma_garchm <- ic_arma_garchm[
order(ic_arma_garchm[,5]),][1:20,]

ic_int_arma_garchm <- intersect(
as.data.frame(ic_aic_arma_garchm),
as.data.frame(ic_bic_arma_garchm))

adq_set_arma_garchm <- as.matrix(arrange(as.data.frame(
ic_int_arma_garchm), p, q, m))
adq_idx_arma_garchm <- match(
data.frame(t(adq_set_arma_garchm[, 1:3])),
data.frame(t(ic_arma_garchm[, 1:3])))

nmods <- length(adq_idx_arma_garchm)
sacf_garchm <- matrix(nrow = nmods, ncol = 13)
colnames(sacf_garchm) <- c("p", "q", "m", 1:10)
for (i in 1:nmods)
{
sacf_garchm[i,1:3] <- adq_set_arma_garchm[i,1:3]
sacf_garchm[i,4:13] <-
acf(ARMA_GARCHM_est[[adq_idx_arma_garchm[i]]]@fit$z,
lag = 10, plot = F)$acf[2:11]
}
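The sacf_garchm matrix collects the first ten autocorrelations of the standardised residuals; it can be scanned against the approximate 95% bounds ±1.96/√T to flag any model that still exhibits residual autocorrelation. A minimal sketch, with r the returns series used throughout:

bound <- 1.96 / sqrt(length(r))
apply(abs(sacf_garchm[, 4:13]) > bound, 1, any)  # TRUE flags a suspect model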

(b) Draw inference on historic volatilities.

Solution We use the same approach as with GARCH/TGARCH specifications in Question 1.

title_garchm <- rep(NA, times = nmods)


for (i in 1:nmods)
{
title_garchm[i] <- paste("ARMA(",
as.character(adq_set_arma_garchm[i, 1]), ",",
as.character(adq_set_arma_garchm[i, 2]),
c(")-GARCH(1,1)", ")-GARCHM(1,1)")
[1 + adq_set_arma_garchm[i,3]],
sep = "")
plot(date[-1], sqrt(
ARMA_GARCHM_est[[adq_idx_arma_garchm[i]]]@fit$var),
type = "l", xlab = "", ylab = "volatilities",
ylim = c(0, 0.08), main = title_garchm[i])
}

[Figure: estimated volatility paths, 2002–2014, one panel per model in the adequate set: ARMA(0,0)-GARCH(1,1), ARMA(0,0)-GARCHM(1,1), ARMA(0,1)-GARCH(1,1), ARMA(0,1)-GARCHM(1,1), ARMA(0,3)-GARCH(1,1), ARMA(1,0)-GARCH(1,1), ARMA(1,0)-GARCHM(1,1), ARMA(1,1)-GARCH(1,1), ARMA(1,1)-GARCHM(1,1), ARMA(1,2)-GARCH(1,1), ARMA(1,3)-GARCH(1,1), ARMA(2,1)-GARCH(1,1), ARMA(3,0)-GARCHM(1,1); y-axis: volatilities, 0 to 0.08.]

All volatility estimates generally look very similar. Including the GARCH-in-mean
term appears to produce only a minor difference in estimated volatilities for fixed
p, q. The periods of high volatility match up to those estimated using GARCH
and TGARCH models.

(c) Draw inference on the existence of a time-varying risk premium.

Solution To test for time-varying risk premia, we focus on the coefficient of the GARCH-in-mean term in models where the GARCH-in-mean is actually included. This is δ in Engle et al. (1987), but rugarch refers to it as “archm”.

for (i in 1:nmods)
{
if (adq_set_arma_garchm[i, 3] == 1)
{
archm_est <- ARMA_GARCHM_est[[
adq_idx_arma_garchm[i]]]@fit$coef["archm"]
archm_tvl <- ARMA_GARCHM_est[[
adq_idx_arma_garchm[i]]]@fit$tval["archm"]
cat(paste0(title_garchm[i], ": archm_hat = ",
round(archm_est, 2),
", t-value = ",
round(archm_tvl, 2)),
"\n")
}
}

## ARMA(0,0)-GARCHM(1,1): archm_hat = 0.05, t-value = 4.9


## ARMA(0,1)-GARCHM(1,1): archm_hat = 0.05, t-value = 2812.95
## ARMA(1,0)-GARCHM(1,1): archm_hat = 0.06, t-value = 2812.95
## ARMA(1,1)-GARCHM(1,1): archm_hat = 0.05, t-value = 4.85
## ARMA(3,0)-GARCHM(1,1): archm_hat = 0.05, t-value = 3528.4

The null H0 : δ = 0 is easily rejected in favour of H1 : δ > 0 at very low significance levels for all models with a GARCH-in-mean term. Hence, we conclude that there is substantial evidence of a “time-varying risk premium”.
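As a rough magnitude check (assuming the in-mean term enters through the conditional standard deviation, rugarch's default archpow = 1): with δ ≈ 0.05 and a typical daily volatility of σt ≈ 0.01, the implied premium is δσt ≈ 0.05 × 0.01 = 0.0005, i.e. about five basis points of additional expected return per day.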

(d) Forecast volatility for the four trading days past the end of the sample.

Solution To forecast volatilities, we again use the ugarchboot function.

for (i in 1:nmods)
{
plot(ugarchboot(ARMA_GARCHM_est[[adq_idx_arma_garchm[i]]],
method = "Partial"), which = 3)
}

[Figure: sigma forecasts with bootstrap error bands (q: 5%, 25%, 75%, 95%) over horizons T+1 to T+10, one panel per model in the adequate set; all panels are labelled sGARCH, the underlying variance model.]

The volatility forecasts obtained from GARCH-in-mean specifications are very similar to those obtained with the GARCH and TGARCH specifications.

ECON3350: Applied Econometrics
for Macroeconomics and Finance

Tutorial 5: Trends and Cycles

At the end of this tutorial you should be able to:


• construct an adequate set of ADF specifications for unit root testing;
• carry out ADF tests for a unit root and interpret the results;
• construct an adequate set of general ARIMA(p, d, q) models.

Problems with Solutions


The specification for a general ARIMA(p, d, q) model is
$$\Delta^d y_t = \delta_t + \sum_{j=1}^{p} a_j \Delta^d y_{t-j} + \sum_{j=1}^{q} b_j \epsilon_{t-j} + \epsilon_t,$$

where δt is a general deterministic term.


• If the process has no deterministic terms, then δt = 0.
• If the process includes a constant only, then δt = a0.
• If there is a constant and a trend, then δt = a0 + δ · t.
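For example, an ARIMA(1,1,0) with a constant only sets $d = 1$, $p = 1$, $q = 0$ and $\delta_t = a_0$, giving $\Delta y_t = a_0 + a_1 \Delta y_{t-1} + \epsilon_t$.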

The file usdata.csv contains 209 observations on:
• yt ≡ log real per capita GDP (GDP); and
• rt ≡ the overnight Federal Funds Rate for the US (FFR).
1. For yt :
(a) Plot the observed time series and comment on potential trends.

Solution For this tutorial, we load the following useful packages.


library(forecast)
library(dplyr)
library(zoo)
library(aTSA)

Next, load the data and plot log real US GDP per capita. Clearly, per capita GDP
levels are increasing in the US over the sample period, so we suspect a trend exists
in the data generating process.
mydata <- read.delim("usdata.csv", header = TRUE, sep = ",")

dates <- as.yearqtr(mydata$obs)


y <- mydata$GDP
r <- mydata$FFR

plot(dates, y, type = "l", xlab = "Time (Quarters)",
main = "Log Real US GDP per Capita")

[Figure: time series plot of log real US GDP per capita; x-axis: Time (Quarters), 1960–2000s; the series trends steadily upward.]

(b) Construct an adequate set of ADF regression models.

Solution ADF regressions are specified by lag length p along with the inclusion/exclusion of a constant and trend. Note that a trend is only included if a constant is included (i.e., we do not consider specifications with a trend but no constant). With that in mind, we implement the familiar approach.
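Concretely, each specification can be written as (a sketch in the notation above)

$$\Delta y_t = \delta_t + \rho\, y_{t-1} + \sum_{j=1}^{p} a_j \Delta y_{t-j} + \epsilon_t,$$

which is what the Arima calls below estimate: diff(y) is the dependent variable and the lagged level y[-TT] enters through xreg. The ADF test is of $H_0 : \rho = 0$ (unit root) against $H_1 : \rho < 0$.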
TT <- length(y)
ADF_est <- list()
ic <- matrix( nrow = 30, ncol = 5 )
colnames(ic) <- c("cons", "trend", "p", "aic", "bic")
i <- 0
for (const in 0:1)
{
for (p in 0:9)
{
i <- i + 1
ADF_est[[i]] <- Arima(diff(y), xreg = y[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = F)

ic[i,] <- c(const, 0, p, ADF_est[[i]]$aic,
ADF_est[[i]]$bic)
}

if (const)
{
# only add a specification with trend if there is a
# constant (i.e., exclude no constant with trend)
for (p in 0:9)
{
i <- i + 1
ADF_est[[i]] <- Arima(diff(y), xreg = y[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = T)
ic[i,] <- c(const, 1, p, ADF_est[[i]]$aic,
ADF_est[[i]]$bic)
}
}
}

ic_aic <- ic[order(ic[,4]),][1:10,]


ic_bic <- ic[order(ic[,5]),][1:10,]

print(ic_aic)

## cons trend p aic bic


## [1,] 1 1 2 -1384.766 -1364.741
## [2,] 1 1 1 -1383.057 -1366.369
## [3,] 1 1 3 -1382.777 -1359.414
## [4,] 1 1 4 -1380.779 -1354.079
## [5,] 1 1 5 -1380.217 -1350.179
## [6,] 1 0 1 -1378.683 -1365.333
## [7,] 1 0 2 -1378.681 -1361.993
## [8,] 1 1 6 -1378.575 -1345.199
## [9,] 1 0 3 -1377.156 -1357.130
## [10,] 1 1 7 -1376.638 -1339.925
print(ic_bic)

## cons trend p aic bic


## [1,] 1 1 1 -1383.057 -1366.369
## [2,] 1 0 1 -1378.683 -1365.333
## [3,] 1 1 2 -1384.766 -1364.741
## [4,] 1 0 2 -1378.681 -1361.993
## [5,] 1 1 3 -1382.777 -1359.414

## [6,] 1 0 3 -1377.156 -1357.130
## [7,] 0 0 2 -1367.494 -1354.144
## [8,] 1 1 4 -1380.779 -1354.079
## [9,] 0 0 1 -1362.530 -1352.518
## [10,] 1 0 4 -1375.519 -1352.156
The AIC prefers specifications with both a constant and trend as well as lag
lengths in the range p = 1, . . . , 5. The BIC ranking includes specifications with
both a constant and trend as well as lags p = 1, . . . , 3. However, it also includes
specifications with a constant only and lags p = 1, . . . , 2. Putting this information
together, we select the top five specifications preferred by the BIC.
adq_set <- as.matrix(arrange(as.data.frame(ic_bic[1:5,]),
cons, trend, p))
adq_idx <- match(data.frame(t(adq_set[, 1:3])),
data.frame(t(ic[, 1:3])))
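The match call works because turning the transposed matrices into data frames makes each specification a column, so whole (cons, trend, p) triples are compared at once; a toy illustration of the idiom:

A <- matrix(c(1, 0, 2,
              1, 1, 3), nrow = 2, byrow = TRUE)
B <- rbind(c(9, 9, 9), A)
match(data.frame(t(A)), data.frame(t(B)))  # 2 3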

Finally, we check the residuals for all specifications in the adequate set.
for (i in 1:length(adq_idx))
{
checkresiduals(ADF_est[[adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##

## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 8.4811, df = 7, p-value = 0.2921
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 6.822, df = 6, p-value = 0.3376
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 7.7825, df = 6, p-value = 0.2545
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.4569, df = 5, p-value = 0.3627
##
## Model df: 5. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 5.466, df = 4, p-value = 0.2427
##
## Model df: 6. Total lags used: 10
As no obvious problems jump out from the residuals analysis, we proceed with the
adequate set constructed above.

(c) Implement the ADF test for a stochastic trend and draw inference
regarding the integration properties of yt .

Solution Use the adf.test function from the aTSA package.


adf.test(y)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value

## [1,] 0 -10.33 0.01
## [2,] 1 -5.31 0.01
## [3,] 2 -4.05 0.01
## [4,] 3 -3.79 0.01
## [5,] 4 -3.54 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.347 0.575
## [2,] 1 -0.931 0.721
## [3,] 2 -0.637 0.824
## [4,] 3 -0.648 0.820
## [5,] 4 -0.659 0.816
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -1.99 0.577
## [2,] 1 -2.45 0.385
## [3,] 2 -2.54 0.347
## [4,] 3 -2.44 0.390
## [5,] 4 -2.36 0.424
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
Since only “Type 2” and “Type 3” specifications are in our adequate set, we ignore the output related to “Type 1” specifications. Consequently, for all specifications in our adequate set, the null of a unit root cannot be rejected. We conclude that the process is not empirically distinguishable from an integrated process with a drift (and possibly linear growth).

(d) Repeat parts (a)-(c) for the differenced series ∆yt .

Solution The procedure is nearly identical to that used in constructing an adequate set for y, but with ∆y in its place. In R, this is done by simply replacing y with diff(y).
TT <- length(diff(y))
ADF_est_diff <- list()
ic_diff <- matrix( nrow = 30, ncol = 5 )
colnames(ic_diff) <- c("cons", "trend", "p", "aic", "bic")
i <- 0
for (const in 0:1)
{
for (p in 0:9)
{

i <- i + 1
ADF_est_diff[[i]] <- Arima(diff(diff(y)),
xreg = diff(y)[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = F)
ic_diff[i,] <- c(const, 0, p, ADF_est_diff[[i]]$aic,
ADF_est_diff[[i]]$bic)
}

if (const)
{
# only add a specification with trend if there is a
# constant (i.e., exclude no constant with trend)
for (p in 0:9)
{
i <- i + 1
ADF_est_diff[[i]] <- Arima(diff(diff(y)),
xreg = diff(y)[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = T)
ic_diff[i,] <- c(const, 1, p, ADF_est_diff[[i]]$aic,
ADF_est_diff[[i]]$bic)
}
}
}

ic_aic_diff <- ic_diff[order(ic_diff[,4]),][1:10,]


ic_bic_diff <- ic_diff[order(ic_diff[,5]),][1:10,]

print(ic_aic_diff)

## cons trend p aic bic


## [1,] 1 0 1 -1374.441 -1361.110
## [2,] 1 0 0 -1373.402 -1363.404
## [3,] 1 1 1 -1372.788 -1356.125
## [4,] 1 0 2 -1372.466 -1355.803
## [5,] 1 1 0 -1371.969 -1358.638
## [6,] 1 1 2 -1370.838 -1350.842
## [7,] 1 0 3 -1370.571 -1350.574
## [8,] 1 0 5 -1369.103 -1342.441
## [9,] 1 1 3 -1368.922 -1345.593
## [10,] 1 0 4 -1368.730 -1345.400

print(ic_bic_diff)

## cons trend p aic bic


## [1,] 1 0 0 -1373.402 -1363.404
## [2,] 1 0 1 -1374.441 -1361.110
## [3,] 1 1 0 -1371.969 -1358.638
## [4,] 1 1 1 -1372.788 -1356.125
## [5,] 1 0 2 -1372.466 -1355.803
## [6,] 1 1 2 -1370.838 -1350.842
## [7,] 1 0 3 -1370.571 -1350.574
## [8,] 1 1 3 -1368.922 -1345.593
## [9,] 1 0 4 -1368.730 -1345.400
## [10,] 1 0 5 -1369.103 -1342.441
Note that in this case the top five specifications ranked by the AIC are the same as the top five ranked by the BIC. Since they agree, we choose these five to form the adequate set: p = 0, 1, 2 with a constant and no trend, along with p = 0, 1 with both a constant and trend.
adq_set_diff <- as.matrix(arrange(as.data.frame(
ic_bic_diff[1:5,]), cons, trend, p))
adq_idx_diff <- match(data.frame(t(adq_set_diff[, 1:3])),
data.frame(t(ic_diff[, 1:3])))

Finally, we check the residuals for all specifications in the adequate set.
for (i in 1:length(adq_idx_diff))
{
checkresiduals(ADF_est_diff[[adq_idx_diff[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 8.4811, df = 7, p-value = 0.2921
##
## Model df: 3. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 6.822, df = 6, p-value = 0.3376
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(1,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 7.7825, df = 6, p-value = 0.2545
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(2,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 5.4569, df = 5, p-value = 0.3627
##
## Model df: 5. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(3,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 5.466, df = 4, p-value = 0.2427
##
## Model df: 6. Total lags used: 10
As no obvious problems jump out from the residuals analysis, we proceed with the ADF test using specifications in the adequate set.
adf.test(diff(y))

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -7.13 0.01
## [2,] 1 -5.05 0.01
## [3,] 2 -4.30 0.01
## [4,] 3 -3.73 0.01
## [5,] 4 -3.42 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -10.58 0.01

## [2,] 1 -7.88 0.01
## [3,] 2 -7.21 0.01
## [4,] 3 -6.67 0.01
## [5,] 4 -6.66 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -10.60 0.01
## [2,] 1 -7.88 0.01
## [3,] 2 -7.21 0.01
## [4,] 3 -6.67 0.01
## [5,] 4 -6.68 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
A unit root is rejected at a very small significance level for all specifications. Hence, the differenced process is empirically distinguishable from an integrated process.

(e) Interpret the overall findings in parts (c) and (d).

Solution Since {yt} is not empirically distinguishable from an integrated process, but {∆yt} is, we conclude that {yt} is not empirically distinguishable from an I(1) process. Remember, however, that we did not find evidence of {yt} being I(1) exactly!

(f) Construct an adequate set of ARIMA(p, d, q) models using information criteria and residuals analysis.

Solution We will consider models for p = 0, ..., 3; q = 0, ..., 3; d = 0, 1, either with or without constant and/or trend terms. There are 96 models to estimate altogether. We use the Arima function in a nested for loop to automate the estimation. There are two caveats to deal with.
• The Arima function with d = 1 will only specify an intercept when setting the include.drift = T option. Since we want to include a linear growth term (i.e. t as a regressor) in the differenced specification, we need to pass it as an exogenous variable to Arima through the xreg option. However, with d = 1, Arima will also difference whatever data we pass this way, so we need to cumulatively sum t before passing it (see the short check after this list).

• Some specifications will be so bad that MLE will run into numerical problems and return an error. We want to ignore these specifications in the least disruptive way possible. The way to do it is to embed Arima as an argument to the try function with the silent = T option. This is in general a very useful programming technique when automating a large number of steps that may potentially cause errors.
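To see why the cumulative sum works, note that with d = 1 Arima differences the xreg column, so passing the cumulated trend recovers t itself after differencing; a quick check with an assumed small sample size:

TT <- 10
x <- c(0, cumsum(1:(TT - 1)))
diff(x)  # 1 2 3 ... 9, i.e. the linear trend t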
TT <- length(y)
ARIMA_est <- list()
ic_arima <- matrix(nrow = (2 * 2 + 2) * 4^2, ncol = 7)
colnames(ic_arima) <- c("d", "cons", "trend", "p", "q", "aic",
"bic")
i <- 0
for (d in 0:1)
{
for (const in 0:1)
{
for (p in 0:3)
{
for (q in 0:3)
{
i <- i + 1
d1 <- as.logical(d)
c1 <- as.logical(const)

try(silent = T, expr =
{
ARIMA_est[[i]] <- Arima(y, order = c(p, d, q),
include.constant = c1)

ic_arima[i,] <- c(d, const, 0, p, q,
ARIMA_est[[i]]$aic,
ARIMA_est[[i]]$bic)
})

if (const)
{
# only add a specification with trend if there is a
# constant (i.e., exclude no constant with trend)
i <- i + 1

if (d1)
{
x <- c(0,cumsum(1:(TT - 1)))
}
else

{
x <- NULL
}

try(silent = T, expr =
{
ARIMA_est[[i]] <- Arima(y, order = c(p, d, q),
xreg = x,
include.constant = c1,
include.drift = T)

ic_arima[i,] <- c(d, const, 1, p, q,
ARIMA_est[[i]]$aic,
ARIMA_est[[i]]$bic)
})
}
}
}
}
}

ic_aic_arima <- ic_arima[order(ic_arima[,6]),][1:10,]


ic_bic_arima <- ic_arima[order(ic_arima[,7]),][1:10,]

print(ic_aic_arima)

## d cons trend p q aic bic


## [1,] 0 1 1 3 0 -1385.796 -1365.742
## [2,] 0 1 1 2 1 -1385.080 -1365.026
## [3,] 0 1 1 2 0 -1384.489 -1367.778
## [4,] 0 1 1 1 2 -1383.951 -1363.897
## [5,] 0 1 1 2 2 -1383.938 -1360.542
## [6,] 0 1 1 3 1 -1383.853 -1360.456
## [7,] 0 1 1 3 3 -1383.816 -1353.735
## [8,] 1 1 1 3 1 -1383.743 -1360.380
## [9,] 0 1 1 1 3 -1383.558 -1360.162
## [10,] 0 1 1 2 3 -1382.031 -1355.293
print(ic_bic_arima)

## d cons trend p q aic bic


## [1,] 1 1 0 1 0 -1379.386 -1369.373
## [2,] 0 1 1 2 0 -1384.489 -1367.778
## [3,] 1 1 0 2 0 -1379.434 -1366.084
## [4,] 0 1 1 3 0 -1385.796 -1365.742
## [5,] 1 1 0 0 2 -1379.046 -1365.696

## [6,] 1 1 0 1 1 -1378.786 -1365.436
## [7,] 1 1 0 0 1 -1375.166 -1365.153
## [8,] 0 1 1 2 1 -1385.080 -1365.026
## [9,] 1 1 1 1 0 -1378.280 -1364.930
## [10,] 0 1 1 1 2 -1383.951 -1363.897
The AIC generally prefers models with d = 0, while the BIC top 10 includes a
more varied mix of integrated and non-integrated ARMAs. It also seems helpful in
this case to compute the intersecting set of the top 10 AIC and top 10 BIC ranked
specifications.
ic_int_arima <- intersect(as.data.frame(ic_aic_arima),
as.data.frame(ic_bic_arima))

print(ic_int_arima)

## d cons trend p q aic bic


## 1 0 1 1 3 0 -1385.796 -1365.742
## 2 0 1 1 2 1 -1385.080 -1365.026
## 3 0 1 1 2 0 -1384.489 -1367.778
## 4 0 1 1 1 2 -1383.951 -1363.897
We observe that the intersection contains only specifications in levels (i.e. d = 0).
However, given that a number of integrated specifications are in the top 10 ranked
by the BIC as well as our inference that {yt } is not empirically distinguishable
from an I(1) process, it is worth taking a closer look to see if any specifications in
differences (i.e., d = 1) are worth considering.
We make note of the fact that ARIMA(1, 1, 0) and ARIMA(2, 1, 0)—both with a
constant only—are in the top three of the BIC ranking. In light of this and the
above considerations, we will add ARIMA(1, 1, 0) and ARIMA(2, 1, 0) to the four
models in the intersecting set.
adq_set_arima <- as.matrix(arrange(as.data.frame(
rbind(ic_int_arima,
ic_bic_arima[c(1, 3),])),
d, cons, trend, p))
adq_idx_arima <- match(data.frame(t(adq_set_arima[, 1:5])),
data.frame(t(ic_arima[, 1:5])))

Finally, we check the residuals for every model in the adequate set.
for (i in 1:length(adq_idx_arima))
{
checkresiduals(ARIMA_est[[adq_idx_arima[i]]])
}

[Figure: residual diagnostics for ARIMA(1,0,2) with drift: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2) with drift
## Q* = 6.4524, df = 5, p-value = 0.2647
##
## Model df: 5. Total lags used: 10

[Figure: residual diagnostics for ARIMA(2,0,1) with drift: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,1) with drift
## Q* = 5.3012, df = 5, p-value = 0.3802
##
## Model df: 5. Total lags used: 10

[Figure: residual diagnostics for ARIMA(2,0,0) with drift: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,0) with drift
## Q* = 7.0155, df = 6, p-value = 0.3194
##
## Model df: 4. Total lags used: 10

[Figure: residual diagnostics for ARIMA(3,0,0) with drift: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,0) with drift
## Q* = 5.1452, df = 5, p-value = 0.3984
##
## Model df: 5. Total lags used: 10

[Figure: residual diagnostics for ARIMA(1,1,0) with drift: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0) with drift
## Q* = 8.1496, df = 8, p-value = 0.419
##
## Model df: 2. Total lags used: 10

[Figure: residual diagnostics for ARIMA(2,1,0) with drift: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,0) with drift
## Q* = 6.6179, df = 7, p-value = 0.4697
##
## Model df: 3. Total lags used: 10
All the residuals look fine so our construction of the adequate set is complete.

2. Repeat parts (a)-(e) of Question 1 for rt (you do not need to do part (f)).

Solution The approach is nearly identical to that involving y in Question 1. The parts where we need to pay particular attention are qualitatively assessing the information in the AIC and BIC ranking tables as well as the results of the residuals analysis.

plot(dates, r, type = "l", xlab = "Time (Quarters)",
main = "Federal Funds Rate")

[Figure: time series plot of the Federal Funds Rate; x-axis: Time (Quarters), 1960–2000s; y-axis: r, roughly 0.00 to 0.15.]

TT <- length(r)
ADF_est <- list()
ic <- matrix( nrow = 30, ncol = 5 )
colnames(ic) <- c("cons", "trend", "p", "aic", "bic")
i <- 0
for (const in 0:1)
{
for (p in 0:9)
{
i <- i + 1
ADF_est[[i]] <- Arima(diff(r), xreg = r[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = F)
ic[i,] <- c(const, 0, p, ADF_est[[i]]$aic,
ADF_est[[i]]$bic)
}

if (const)
{
# only add a specification with trend if there is a
# constant (i.e., exclude no constant with trend)

for (p in 0:9)
{
i <- i + 1
ADF_est[[i]] <- Arima(diff(r), xreg = r[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = T)
ic[i,] <- c(const, 1, p, ADF_est[[i]]$aic,
ADF_est[[i]]$bic)
}
}
}

ic_aic <- ic[order(ic[,4]),][1:10,]


ic_bic <- ic[order(ic[,5]),][1:10,]
ic_int <- intersect(as.data.frame(ic_aic),
as.data.frame(ic_bic))

print(ic_aic)

## cons trend p aic bic


## [1,] 1 0 7 -1379.096 -1345.720
## [2,] 1 0 8 -1377.897 -1341.184
## [3,] 1 1 7 -1377.218 -1340.505
## [4,] 0 0 7 -1376.067 -1346.029
## [5,] 1 1 8 -1375.956 -1335.906
## [6,] 1 0 9 -1375.907 -1335.856
## [7,] 1 0 6 -1374.959 -1344.922
## [8,] 0 0 8 -1374.456 -1341.081
## [9,] 1 1 9 -1373.974 -1330.586
## [10,] 1 1 6 -1372.964 -1339.588
print(ic_bic)

## cons trend p aic bic


## [1,] 1 0 3 -1369.151 -1349.126
## [2,] 0 0 2 -1361.784 -1348.434
## [3,] 1 0 2 -1364.814 -1348.126
## [4,] 0 0 3 -1364.796 -1348.108
## [5,] 1 0 1 -1360.578 -1347.228
## [6,] 0 0 7 -1376.067 -1346.029
## [7,] 0 0 1 -1356.003 -1345.990
## [8,] 1 0 7 -1379.096 -1345.720
## [9,] 1 0 5 -1371.988 -1345.288
## [10,] 1 0 6 -1374.959 -1344.922

print(ic_int)

## cons trend p aic bic


## 1 1 0 7 -1379.096 -1345.720
## 2 0 0 7 -1376.067 -1346.029
## 3 1 0 6 -1374.959 -1344.922
The intersecting set has three models: two with a constant and p = 6, 7, and one
without a constant and p = 7; no models have a trend. We may reasonably set
this to be the adequate set (although other variations are obviously justifiable as
well). Importantly, we complete the procedure with a residuals analysis.
adq_set <- as.matrix(arrange(as.data.frame(ic_int),
cons, trend, p))
adq_idx <- match(data.frame(t(adq_set[, 1:3])),
data.frame(t(ic[, 1:3])))

for (i in 1:length(adq_idx))
{
checkresiduals(ADF_est[[adq_idx[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(6,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(6,0,0) errors

## Q* = 9.1461, df = 3, p-value = 0.02741
##
## Model df: 8. Total lags used: 11

[Figure: residual diagnostics for Regression with ARIMA(7,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(7,0,0) errors
## Q* = 3.3278, df = 3, p-value = 0.3438
##
## Model df: 9. Total lags used: 12

[Figure: residual diagnostics for Regression with ARIMA(7,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(7,0,0) errors
## Q* = 3.4685, df = 3, p-value = 0.3249
##
## Model df: 8. Total lags used: 11
Residuals all look reasonable, although some autocorrelations exceed the 95% confidence intervals very slightly (and at larger lags). For robustness, we might add a few models with a larger p into the adequate set, such as p = 8, 9.
Proceeding to the ADF test, the default option p ≤ 5 implemented in the adf.test function is insufficient, so we need to explicitly specify longer lag lengths with the nlag option.
adf.test(r, nlag = 10)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.884 0.362
## [2,] 1 -1.190 0.253
## [3,] 2 -0.937 0.344
## [4,] 3 -1.120 0.278

## [5,] 4 -1.110 0.281
## [6,] 5 -1.324 0.205
## [7,] 6 -1.087 0.290
## [8,] 7 -0.871 0.367
## [9,] 8 -0.908 0.354
## [10,] 9 -0.880 0.364
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -2.30 0.2106
## [2,] 1 -2.86 0.0549
## [3,] 2 -2.40 0.1734
## [4,] 3 -2.76 0.0714
## [5,] 4 -2.71 0.0788
## [6,] 5 -3.16 0.0244
## [7,] 6 -2.70 0.0804
## [8,] 7 -2.26 0.2244
## [9,] 8 -2.37 0.1833
## [10,] 9 -2.31 0.2055
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -2.24 0.4757
## [2,] 1 -2.81 0.2363
## [3,] 2 -2.34 0.4328
## [4,] 3 -2.71 0.2769
## [5,] 4 -2.67 0.2935
## [6,] 5 -3.14 0.0992
## [7,] 6 -2.66 0.2974
## [8,] 7 -2.20 0.4895
## [9,] 8 -2.31 0.4439
## [10,] 9 -2.25 0.4693
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
Only “Type 1” and “Type 2” specifications are in our adequate set, so we ignore the output related to “Type 3”. For all specifications in our adequate set, the null of a unit root cannot be rejected.
Note, however, that there are specifications for which the null is indeed rejected. One such specification includes a constant but no trend, with p = 5 lags. We observe that this is the 9th ranked specification by the BIC, and it is outside the top 10 in terms of AIC.
We can summarise our inference as follows. The process is not empirically distinguishable from an integrated process with a drift, but there is a small element of uncertainty in this conclusion.
Next, we repeat the construction of an adequate set of ADF regressions of ∆rt .

TT <- length(diff(r))
ADF_est_diff <- list()
ic_diff <- matrix( nrow = 30, ncol = 5 )
colnames(ic_diff) <- c("cons", "trend", "p", "aic", "bic")
i <- 0
for (const in 0:1)
{
for (p in 0:9)
{
i <- i + 1
ADF_est_diff[[i]] <- Arima(diff(diff(r)),
xreg = diff(r)[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = F)
ic_diff[i,] <- c(const, 0, p, ADF_est_diff[[i]]$aic,
ADF_est_diff[[i]]$bic)
}

if (const)
{
# only add a specification with trend if there is a
# constant (i.e., exclude no constant with trend)
for (p in 0:9)
{
i <- i + 1
ADF_est_diff[[i]] <- Arima(diff(diff(r)),
xreg = diff(r)[-TT],
order = c(p, 0, 0),
include.mean = as.logical(const),
include.drift = T)
ic_diff[i,] <- c(const, 1, p, ADF_est_diff[[i]]$aic,
ADF_est_diff[[i]]$bic)
}
}
}

ic_aic_diff <- ic_diff[order(ic_diff[,4]),][1:10,]


ic_bic_diff <- ic_diff[order(ic_diff[,5]),][1:10,]
ic_int_diff <- intersect(as.data.frame(ic_aic_diff),
as.data.frame(ic_bic_diff))

print(ic_aic_diff)

## cons trend p aic bic

## [1,] 0 0 6 -1368.762 -1342.100
## [2,] 0 0 7 -1367.974 -1337.980
## [3,] 1 0 6 -1366.834 -1336.839
## [4,] 1 0 7 -1366.045 -1332.718
## [5,] 0 0 8 -1365.990 -1332.662
## [6,] 1 1 6 -1365.403 -1332.076
## [7,] 0 0 9 -1364.708 -1328.048
## [8,] 1 1 7 -1364.569 -1327.909
## [9,] 1 0 8 -1364.061 -1327.401
## [10,] 1 0 9 -1362.778 -1322.785
print(ic_bic_diff)

## cons trend p aic bic


## [1,] 0 0 2 -1357.902 -1344.572
## [2,] 0 0 0 -1349.094 -1342.429
## [3,] 0 0 6 -1368.762 -1342.100
## [4,] 0 0 3 -1356.028 -1339.364
## [5,] 1 0 2 -1355.968 -1339.305
## [6,] 0 0 4 -1358.595 -1338.599
## [7,] 0 0 5 -1361.729 -1338.400
## [8,] 0 0 1 -1348.330 -1338.332
## [9,] 0 0 7 -1367.974 -1337.980
## [10,] 1 0 0 -1347.158 -1337.160
print(ic_int_diff)

## cons trend p aic bic


## 1 0 0 6 -1368.762 -1342.10
## 2 0 0 7 -1367.974 -1337.98
There are only two specifications in the intersecting set: both with no constant or trend and p = 6, 7. Overall, both the AIC and BIC favour specifications without a trend; the AIC generally favours larger lag lengths, whereas the BIC yields a more dispersed ranking. Taking the intersection as the adequate set, we proceed to the residuals analysis.
adq_set_diff <- as.matrix(arrange(
as.data.frame(ic_int_diff),
cons, trend, p))
adq_idx_diff <- match(data.frame(t(adq_set_diff[, 1:3])),
data.frame(t(ic_diff[, 1:3])))

for (i in 1:length(adq_idx_diff))
{
checkresiduals(ADF_est_diff[[adq_idx_diff[i]]])
}

[Figure: residual diagnostics for Regression with ARIMA(6,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(6,0,0) errors
## Q* = 2.1855, df = 3, p-value = 0.5348
##
## Model df: 7. Total lags used: 10

[Figure: residual diagnostics for Regression with ARIMA(7,0,0) errors: residual time plot, sample ACF, and histogram.]

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(7,0,0) errors
## Q* = 3.424, df = 3, p-value = 0.3308
##
## Model df: 8. Total lags used: 11
As with the levels, the residuals look acceptable, but with some indication that longer lag lengths should be considered. Finally, we implement the ADF test.
adf.test(diff(r), nlag = 10)

## Augmented Dickey-Fuller Test


## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -11.28 0.01
## [2,] 1 -10.80 0.01
## [3,] 2 -7.36 0.01
## [4,] 3 -6.59 0.01
## [5,] 4 -5.07 0.01
## [6,] 5 -5.70 0.01
## [7,] 6 -6.43 0.01
## [8,] 7 -5.56 0.01

## [9,] 8 -5.28 0.01
## [10,] 9 -5.40 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -11.26 0.01
## [2,] 1 -10.78 0.01
## [3,] 2 -7.35 0.01
## [4,] 3 -6.58 0.01
## [5,] 4 -5.06 0.01
## [6,] 5 -5.69 0.01
## [7,] 6 -6.42 0.01
## [8,] 7 -5.55 0.01
## [9,] 8 -5.27 0.01
## [10,] 9 -5.39 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -11.25 0.01
## [2,] 1 -10.78 0.01
## [3,] 2 -7.35 0.01
## [4,] 3 -6.58 0.01
## [5,] 4 -5.05 0.01
## [6,] 5 -5.69 0.01
## [7,] 6 -6.43 0.01
## [8,] 7 -5.56 0.01
## [9,] 8 -5.28 0.01
## [10,] 9 -5.42 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
A unit root is rejected at a very small significance level for all specifications. We infer that FFR is not empirically distinguishable from an I(1) process.
