Tutorial3 EBC2090
Tutorial Assignments
Getting Started in R
You will have to analyze your data in R. R is a computer programming language that can be used for
statistical analyses. You can find a manual on http://www.r-project.org/, following “Manuals” to “An
Introduction to R”, or you can consult an interactive introductory course on https://www.datacamp.com.
    Throughout the tutorial assignments, initial help will be provided so that you can appropriately analyze
your data in R. !! Importantly, we will provide initial help for a specific function the first time that you need
to use it. If a later assignment (in the same tutorial or in a subsequent one) again requires a function that
was introduced earlier, that help will not be repeated !!
Software Access
Start by downloading R from https://cran.r-project.org. A user-friendly interface is available via RStudio,
to be downloaded from https://www.rstudio.com/products/rstudio/download. Preferably install R and
RStudio before the first lecture, but definitely before handing in the first tutorial assignment in week 1, as
you will need them!
    When you open RStudio, the screen is divided into three main windows, as shown in Figure 1. First, the
console (left), where you can enter commands directly and where output is returned. Below the default text
already shown in this window, you can see the “>” sign, followed by the text cursor. This is called the
“prompt”, through which R indicates that it is ready to execute a new command. Second, the environment
window (top right) gives an overview of all objects in memory. Third, in the bottom right window, plots are
returned, help files can be accessed and packages can be downloaded.
First Steps in R
R can be used as a simple calculator. Try to enter
 1350 + 6750
in the console. R will give you the following answer:
 [1] 8100
   If you want to re-execute your last command, you do not have to type it all over again. Just press
the up arrow key on your keyboard and the last command will reappear. If you want, you can now
make adjustments to this command by using the left and right arrow keys.
Creating objects. You can also create objects in R. You could consider an object to be a “box” to which
you give a name and in which you store data such as numbers, vectors, matrices, etc. Suppose that we want
to create the object “x” to which we want to attribute the value “5”. You can do this by executing the
command
 x <- 5
where you attribute, through the command <-, the value 5 to the object x.
                               Figure 1: RStudio and its different windows.
   If you would now want to ask R what the object “x” contains, you can simply give the command
 x
and R will reply with
 [1] 5
Similarly, you can attribute a vector to “x” through the function c() where the elements in the vector
are separated by a comma:
  x <- c(3,7,10)
If you now ask R what the object “x” contains, R will reply with
  [1] 3     7    10
You can also access individual elements in a vector through the brackets [ ]. For instance: if you would
like to find out what the second element of “x” is, you can give the command
  x[2]
and R will reply with
  [1] 7
    When assigning objects, take into account that you should not use object names that belong to R’s
internal vocabulary. For example, do not create an object with the name “sqrt”, as this name refers to the
built-in function for computing the square root of a number. Furthermore, R is case sensitive! “x” and “X”
are thus two different objects!
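A quick illustration of case sensitivity (the object names here are just examples):

```r
# R is case sensitive: "x" and "X" are two different objects
x <- 5
X <- 10
x + X   # returns 15, since the two objects are stored separately
```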
Logarithm and differences. Throughout the course, we will often make use of logarithmic and/or dif-
ference transformations to make our time series stationary. Let “x” again be defined as the vector described
previously:
 x <- c(3,7,10)
You can now create a new object called “log x” that represents the natural logarithm of “x”
 log_x <- log(x)
To see the value of “log x”, type in
 log_x
in the console and you get
 [1] 1.098612 1.945910 2.302585
                                    Figure 2: RStudio and R scripts.
Next, create the object “d_x” containing the first differences of “x” via the function diff:
 d_x <- diff(x)
 d_x
 [1] 4 3
which indeed returns the differences between the consecutive elements in the vector (i.e. 7 − 3 = 4 and
10 − 7 = 3). Finally, let us compute “dlog x” as the first differences of the log-transformed “x”
 dlog_x <- diff(log(x))
 dlog_x
 [1] 0.8472979 0.3566749
Computing such a log-difference transformation will turn out to be useful to obtain growth rates of our time
series, as you will learn during the tutorials.
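As an illustrative sketch (not part of the assignment), you can compare the log-differences of the vector “x” from above with its ordinary growth rates; for small changes the two are close, while for the large changes in this toy vector they differ noticeably:

```r
# Log-differences versus ordinary growth rates for the toy vector used above
x <- c(3, 7, 10)

dlog_x   <- diff(log(x))              # log-differences: log(7/3), log(10/7)
growth_x <- diff(x) / x[-length(x)]   # ordinary growth rates: 4/3, 3/7

dlog_x    # [1] 0.8472979 0.3566749
growth_x  # [1] 1.3333333 0.4285714
```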
    If you ever need some further documentation on one of R’s functions, for example on the log function,
you can use the question mark functionality:
 ?log
after which the function documentation pops up in the bottom right window of RStudio.
    If you would like to execute a function, for example taking the logarithm of a number, but you do not
know the exact name of the function, you can try:
 ??logarithm
and several documentation files will be suggested.
R Scripts
Suppose that you have been working in R for several hours, but it is getting late and you want to continue
your work tomorrow. If you were now to close R, all of your work would be gone! To avoid this problem, we
will not give R commands directly through the R console, but save them in an R script. Hence, for your
own records, and with a view to preparing your final paper, it is a good idea to keep a systematic record of
your workflow through R scripts.
    You can open a new R script by clicking in the menu bar (at the top of RStudio) on “File”, “New File”,
“R Script”. The left panel then gets divided into two windows, see Figure 2: The top one is your R script,
the bottom one is the console we have been using until now. In the R script, you can now enter commands
like we did before and execute them by first selecting the command and then clicking “Run” (keyboard
shortcuts are available but depend on your system). When you are finished working in R, save the script (“File”
and then “Save as”). You can later re-open these R scripts in RStudio to continue working with them.
    When you write R scripts, the code can become very long and hard to read. To clarify your work, you can
use comment lines in R to provide your code with additional information. Such comment lines should always
start with the “#” sign. You could even include output of your analysis as comments in your R script.
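As an illustrative sketch, a small commented script could look as follows:

```r
# EBC2090 tutorial script (illustrative example)
# Create a toy vector and compute its log-differences
x <- c(3, 7, 10)          # toy data
dlog_x <- diff(log(x))    # first differences of log(x)
dlog_x
# [1] 0.8472979 0.3566749   <- output pasted back into the script as a comment
```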
Data Source
The data sets provided to you come from the Penn World Table (PWT) version 10.0. PWT is a secondary
data source, conveniently and freely accessible through the website of the Groningen Growth and Develop-
ment Centre (GGDC), where you will also find detailed and up-to-date documentation: www.rug.nl/ggdc/
productivity/pwt.
   When using these data, please refer to the following paper: Feenstra, Robert C., Robert Inklaar and
Marcel P. Timmer (2015), “The Next Generation of the Penn World Table” American Economic Review,
105(10), 3150-3182, available for download at www.ggdc.net/pwt.
                 Table 1: Overview of key GDP components and identifier variables.
Tutorial 1: Exploring R and Reviewing Regression Analysis
In this tutorial, you will learn to work with R, you will inspect your data through plots and you will review
the basics of regression analysis.
  1. Getting started in R.
      (a) Read the section “Getting Started in R” at the start of this document to get you started with
          this tutorial.
      (b) Start by creating a directory on your computer, for instance, “tutorialsEBC2090”. This directory
          should contain all files we use in this tutorial assignment.
      (c) Go to Canvas and download the data file of your country. This is an .RData file. For instance,
          the data file for the Netherlands is “NLD data.RData”. Save the RData file of your country in
          the directory on your computer that you have just created (i.e. tutorialsEBC2090 in my case).
      (d) Open RStudio and open a new R script. Give it an appropriate name, for instance, “EBC2090-
          tutorial1” and save the file. To do this, click (in the menu bar at the top of RStudio) on “File” and
          then “Save as”. Enter EBC2090-tutorial1 (or another file name) under “File name” and navigate
          to the directory of your choosing (tutorialsEBC2090 in my case) to save the file there. You will
          see that the file will be saved as an .R file.
       (e) It is good practice to start your R script by clearing your environment in R. This can be done by
           typing the following line into your R script
           rm(list=ls())
          and then pressing “Run” to execute it. To get more information on this function, remember that
          you can execute the code ?rm and consult the corresponding documentation in the help-window.
      (f) Next, you need to tell R the location of your working directory. This is the directory “tutori-
          alsEBC2090” where we will save all the files we use in this tutorial. You can do this by clicking on
          “Session”, “Set Working Directory”, and finally “Choose Directory”. Now scroll to the location of
          your directory tutorialsEBC2090 and click on Open. You will see that this executes the command
          in the form of
           setwd("C:/..../tutorialsEBC2090")
           in the R console to set the working directory in R. Note that the “....” in the command above is a
           placeholder: the specific path depends on the location of your directory on your laptop, and hence
           differs for everyone. Copy this command from the console into your R script (on a new line) so
           that you can execute it again in later sessions!
  2. Importing the data.
      (a) We are now ready to import our data into R. To load your data set into R, you can type the
          command
            load("NLD_data.RData")
          into your R script and execute it. Naturally, if your country is not the Netherlands, you need to
          write the appropriate name of your RData file here but also in the remainder of the exercises! You
          should notice that in the environment window (top right panel of RStudio), the object “NLD data”
          is now listed. If so, then you have successfully imported your data into R!
       (b) Let us now inspect all variables that are included in your data file. To do this, type the command
            View(NLD_data)
           into your R script and execute it. This opens up a new window (a new tab will appear next
           to your R script) with a spreadsheet-type view of your data. Scroll through your data set to
          inspect it.
      (c) If you want to know the names given to your variables in your data set, you can use
            names(NLD_data)
          By giving R the command
          attach(NLD_data)
          you can now address your variables with the names that were given to them!
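As a side note: you can always address a column explicitly through the dollar symbol, without attaching the data set first. A small sketch with a hypothetical data frame (demo_data stands in for your country's data file):

```r
# Hypothetical data frame standing in for your country's data set
demo_data <- data.frame(year = 1950:1952, YU = c(100, 105, 111))

demo_data$YU          # address a column explicitly through the $ symbol

attach(demo_data)     # after attaching, columns are addressable by name
YU
detach(demo_data)     # detach again when you are done
```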
3. Time series plots. Visually inspect your data. That is the way to get to know your data and to trace
   data errors. We start by making time series (line) plots.
    (a) R (like any other software package) does not treat your variable as a time series by default; instead
        it treats it as an ordinary numerical variable. We thus need to explicitly declare
        that the variable Y U is a time series. In R, create a new time series object, “YU ts”, by using
        the function ts:
         YU_ts <- ts(NLD_data$YU, start = 1950, frequency = 1)
       where we tell R that the data set starts in year 1950 (start = 1950; ! check this, as this may be
       different for your country!) and the data are annual (frequency = 1).
   (b) Now make a time series plot by using the command
        ts.plot(YU_ts)
        Discuss the properties of the time series. Note that there are many additional arguments in the
        plot function (to change the axis labels, to make the line thicker, ...); you can explore these on
        your own via the documentation provided in R.
   (c) Optional Tip: It is convenient to save your plots as separate files, such that you can show them in
       class or later include them in your paper. To save a figure in, for instance, .pdf format, you first
       tell R to open a pdf file, where you give the file a name (time-plot-YU), and also set the width
       and the height of the file. On the next line you then write the command for the figure you want
       to plot. Then R will fill in the .pdf file with the figure (possibly even several figures on subsequent
        pages!). Finally, you need to tell R (on the third line below) that it should close the .pdf file, as
        you do not want to add further content to it:
         pdf(file = "time-plot-YU.pdf", width = 6, height = 6)
         ts.plot(YU_ts)
         dev.off()
       If you successfully created and closed the pdf file, it should appear in your working directory and
        you can open the pdf file to inspect your plot! Final note: if you want to overwrite the content of
        your pdf file (for instance, run the code again after noticing a mistake), make sure that the pdf
        file is closed on your laptop, otherwise R cannot (over)write the file!
    (d) Now let us plot two time series on the same graph: namely Y U and Y O. You should start by
        also declaring the latter as a time series (see the instructions above!); give it the name “YO ts”. To
        plot several time series on the same plot, you may use:
         ts.plot(YU_ts, YO_ts, ylab = "YU versus YO", col = c("blue", "black"))
       where we now specified what R should use as label on the vertical axis (via ylab), and where
       we indicate that the first series should be visualized in blue, the second in black; you can choose
       different colors! To add a legend to your graph, you can execute the following command after
       your plot command:
         legend("topleft", legend = c("YU", "YO"), col = c("blue", "black"), lty = c(1,1))
        where you first indicate the position of the legend; the argument legend then specifies the text
        that needs to be displayed, followed by the colors for the lines and the line type. Here lty=c(1,1)
        indicates that both lines (hence you use a vector) correspond to line type 1, namely a solid
        line. Note that you can also add the lty argument in the ts.plot function; try what happens if you
        use lty=c(2,2)...
       Discuss the figure. Where do the series cross, and why? Are the series trending over time or do
       they fluctuate around a constant mean?
   (e) Now repeat the same exercise, hereby plotting Y U , Y O, IU , IO all on one graph! Discuss the
       figure as you did above.
4. Scatter plots. Now inspect scatter plots, relating one variable to the other.
    (a) Make a scatter plot of IO on Y O: the first variable on the vertical axis, the second on the
        horizontal axis, via the command
         plot(x = YO, y = IO)
   (b) Observe whether or not a relationship seems to emerge, and whether or not it could be approxi-
       mated by a linear function.
5. Simple Regressions. Run a simple regression of investment (in constant prices) on output (in constant
   prices) and a constant term:
                                        IOt = β0 + β1 Y Ot + ut .                                    (1)
  In R, use the function lm to estimate a linear regression model. You can name the object however you
  want, here we give the following name:
   fit_IO_on_YO <- lm(IO~YO)
  Note that an intercept is included by default. Then ask for a summary of your fitted linear regression
  model:
   summary(fit_IO_on_YO)
   (a) What are the values of the estimates β̂0 and β̂1 ?
   (b) Interpret the coefficient β̂1 .
    (c) Is the variable Y Ot significant at 5% significance level? Answer this question in three different
        ways: based on the (i) t-statistic, (ii) p-value, (iii) 95% confidence interval around β1 . Note that
        the former two are displayed in the output, but you need to compute the 95% confidence interval
        manually!
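As a sketch of the manual computation (with hypothetical numbers standing in for your country's series; only the recipe carries over):

```r
# Hypothetical data standing in for your country's IO and YO series
YO <- c(10, 12, 15, 14, 18, 20, 23, 22, 26, 30)
IO <- c( 3,  4,  5,  4,  6,  7,  8,  7,  9, 11)
fit_IO_on_YO <- lm(IO ~ YO)

# Pull the slope estimate and its standard error from the coefficient table
est   <- summary(fit_IO_on_YO)$coefficients
b1    <- est["YO", "Estimate"]
se1   <- est["YO", "Std. Error"]

# 97.5% quantile of the t-distribution with the residual degrees of freedom
tcrit <- qt(0.975, df = fit_IO_on_YO$df.residual)

# Manual 95% confidence interval for the slope on YO
c(lower = b1 - tcrit * se1, upper = b1 + tcrit * se1)
```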
   (d) You may also ask R to compute the 95% confidence interval:
        confint(fit_IO_on_YO, parm= "YO", level = 0.95)
       Verify your manual computation against the R output!
    (e) Inspect the overall goodness-of-fit in terms of the R2 . Give its value and interpret it. Do you
        notice something unusual?
6. Residual Inspection. Closely inspect the residuals of your estimated regression of IOt on Y Ot . In R,
   the residuals of the regression are saved in your fitted regression object fit IO on YO (together with a
   lot of other useful information). To see what information is saved in the fitted object, ask for:
    names(fit_IO_on_YO)
   you will notice that there is a slot “residuals” which indeed contains the residuals of the estimated
   regression. You can access any output slot in the fitted regression object through the dollar symbol,
   hence to plot the residuals, use
    plot(fit_IO_on_YO$residuals, type = "l")
   where the type argument indicates that you want to make a line graph.
   (a) Examine the time pattern of the residuals. Does it seem that the residual series is distributed in
       accordance with the assumption of random sampling?
   (b) Plot the residual series against Y Ot in a scatter plot (the former on the vertical axis, the latter
       on the horizontal axis). Does a visual inspection of the residual series suggest that they satisfy
       the assumption of constant variance (homoskedasticity)?
    (c) Plot a histogram summarising the frequency distribution of the residuals using the command
         hist(fit_IO_on_YO$residuals)
        Does the assumption of normality seem plausible?
    (d) Formally test whether the residuals are normally distributed using the Jarque-Bera test, which tests
        the null of normality. It measures how much the skewness (asymmetry) and the kurtosis (curvature
        and tail thickness) of the residual series differ from those of a normal distribution.
        This test is contained in a specific R library, namely the library tseries. So first we need to
        install this library in R. You can do this by going to the bottom right panel in RStudio,
        clicking on the “Packages” tab, then “Install”. Type the name of the package, namely
        tseries, and click on “Install”. R will install the package for you. Once this is done, you
        need to load the library into R, such that you can access the functions in this library. To load the
        library in R, use the command
          library(tseries)
        You can now perform the Jarque-Bera test:
          jarque.bera.test(fit_IO_on_YO$residuals)
        What is your conclusion?
        Important note on R libraries: Installing a library only needs to be done once, but in case you
        would like to make use of functions in the library, you need to load the library in every R session!
7. Log-Log Regression. Now run the simple regression in logarithms:
                                        ln IOt = β0 + β1 ln Y Ot + ut .                                    (2)
   (a) Generate the variable ln IOt which is the natural logarithm of the variable IOt :
        lnIO_ts <- log(IO_ts)
       Do the same for ln Y Ot .
   (b) Make a time series plot of ln IOt , ln Y Ot . How does this plot compare/differ to the one you made
       of IOt , Y Ot ? Can you think of reasons to apply the log-transformation?
    (c) Make a scatter plot of ln IOt , ln Y Ot . Do you see a relationship emerging, and could it be
        approximated by a linear function?
   (d) Estimate the simple regression model in equation (2).
    (e) What are the values of the estimates β̂0 and β̂1 ?
    (f) Interpret the coefficient β̂1 . Be careful!, it has a different interpretation than in regression (1)!
   (g) Is the variable ln Y Ot significant at 5% significance level? Answer this question in three different
       ways: based on the (i) t-statistic, (ii) p-value, (iii) 95% confidence interval around β1 .
   (h) Give the value of the R2 and interpret it.
    (i) Inspect the residuals of the log-log model as you did in part 5. What are your conclusions?
8. Regression in Log-Differences. Now run the simple regression in log-differences:
                                   ∆ ln IOt = β0 + β1 ∆ ln Y Ot + ut .                                    (3)
  where ∆ ln IOt = ln IOt − ln IOt−1 . The term “log-difference” is short for (first) logarithmic difference.
  A log-difference should be interpreted as a rate of change (or growth rate). Log-differences are often
  preferable to ordinary percentage changes because unlike percentage changes they are additive and
  symmetric. It is important to understand the properties of logarithms! (See e.g. Appendix A.4 of
  Wooldridge.)
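A small sketch illustrating additivity and symmetry (the numbers are just an example):

```r
# A price that rises 10% and then falls back to its starting level
p <- c(100, 110, 100)

# Ordinary percentage changes: +10% then about -9.09%; they do not sum to zero
pct <- diff(p) / p[-length(p)]
sum(pct)

# Log-differences: +0.0953 then -0.0953; additive and symmetric, so they sum to
# zero whenever the series returns to its starting level
ld <- diff(log(p))
sum(ld)
```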
   (a) Generate the variable dlnIO which is the first difference of lnIO using the command:
        dlnIO_ts <- diff(log(IO_ts))
       Do the same for ∆ ln Y Ot .
   (b) How many observations are available for the variable IOt and how many are available for ∆ ln IOt ?
       Explain the difference!
    (c) Make a time series plot of ∆ ln IOt , ∆ ln Y Ot . How does this plot compare/differ to the one you
        made of ln IOt , ln Y Ot ?
    (d) Make a scatter plot of ∆ ln IOt , ∆ ln Y Ot . Do you see a relationship emerging, and could it be
        approximated by a linear function?
    (e) Estimate the simple regression model in equation (3).
    (f) What are the values of the estimates β̂0 and β̂1 ?
(g) Interpret the coefficient β̂1 .
(h) Is the variable ∆ ln Y Ot significant at 5% significance level? Answer this question in three different
    ways: based on the (i) t-statistic, (ii) p-value, (iii) 95% confidence interval around β1 .
(i) Give the value of the R2 and interpret it. Do you observe a difference compared to the earlier
    regressions you ran?
(j) Inspect the residuals of the model in log-differences as you did in part 5. What are your conclu-
    sions?
Tutorial 2: Basic Time Series Regressions
In this tutorial, you will discuss basic time series concepts and time series regressions while revising how to
perform a joint hypothesis test. By default, we work with a significance level of 5% in this tutorial and in
the next ones!
    !! Important Reminder: We provide initial help for a specific function the first time you need to use
it. Hence, functions introduced in the previous tutorial assignment(s) will not be repeated here. You can
always look back at the previous tutorial assignments if you no longer remember which functions to use !!
  1. Setting up R. Set up your R script as you did for the first tutorial, so: set your working directory and
     import your data into R.
  2. Visual Inspection of Stationarity.
      (a) Make a time series plot of ln IOt and ln Y Ot . Are these time series stationary? Discuss.
      (b) Make a correlogram of ln IOt and ln Y Ot . Discuss the values of the autocorrelations at the first
          couple of lags.
          In R, use the command
           acf(lnIO_ts)
          to display the correlogram of a particular time series (assuming you created a time series object
          for the log-transformed IO variable).
       (c) Do the same for ∆ ln IOt and ∆ ln Y Ot : discuss stationarity based on the time series plot and
           discuss the values of the autocorrelations.
3. Autoregressive Model for ln IOt . Consider the AutoRegression of order 1, denoted AR(1), for ln IOt :
                                      ln IOt = β0 + β1 ln IOt−1 + ut .                                    (4)
      (a) To generate the response and predictor variable in regression model (4), you can make use of the
          function embed:
           lags_lnIO <- embed(lnIO, dimension = 2)
          which generates a new matrix where the response ln IOt is contained in the first column and the
          predictor ln IOt−1 is contained in the second column. More lags can be obtained by adjusting the
          argument dimension.
       (b) Inspect the newly created matrix lags lnIO via the function View. How many observations (rows)
           does the matrix lags lnIO have?
       (c) You can then access the first column in a matrix via [, 1], and the second column via [, 2] to
           generate the response and predictor needed for estimating model (4):
            lnIO_0 <- lags_lnIO[, 1]
            lnIO_1 <- lags_lnIO[, 2]
            where we use the suffix _x to denote the xth lag of a certain variable.
      (d) Estimate the AR(1) model in equation (4) using the function lm.
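As a sketch (using a short hypothetical series in place of lnIO), the estimation could look as follows:

```r
# Short hypothetical series standing in for lnIO
lnIO <- log(c(3, 7, 10, 12, 13, 15))

lags_lnIO <- embed(lnIO, dimension = 2)
lnIO_0 <- lags_lnIO[, 1]   # response:  ln IO_t
lnIO_1 <- lags_lnIO[, 2]   # predictor: ln IO_{t-1}

fit_AR1 <- lm(lnIO_0 ~ lnIO_1)   # AR(1) via lm; intercept included by default
summary(fit_AR1)
```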
       (e) Interpret the value of the estimate β̂1 .
       (f) What does the value of the estimate β̂1 tell you about the stationarity of the series ln IOt ?
4. Autoregressive Model for ∆ ln IOt . Consider the AR(1) model for ∆ ln IOt :
                                  ∆ ln IOt = β0 + β1 ∆ ln IOt−1 + ut .                                    (5)
      (a) Generate the variable dlnIO (the series IO in first differences). Then generate the variable dlnIO 0
          and dlnIO 1 (respectively the response and predictor in equation (5)) via the function embed.
   (b) Estimate the AR(1) model in equation (5). How many observations are used to estimate this
       model?
    (c) Interpret the value of the estimate β̂1 .
   (d) What does the value of the estimate β̂1 tell you about the stationarity of the series ∆ ln IOt ?
   We will return to the topic of stationarity, unit roots and unit root tests in Tutorial 3! In this tutorial,
   let us consider static time series regression models, finite distributed lag models and autoregressive
   distributed lag models for ln IOt .
5. Static Time Series Regression. Consider the static regression model:
                                      ln IOt = β0 + β1 ln Y Ot + ut .                                    (6)
   (a) Explain what a static time series regression means and why the regression in equation (6) is one.
   (b) Estimate model (6). Make a (i) line plot of the residuals, as well as a (ii) correlogram of the
       residuals. Are the residuals autocorrelated?
    (c) In case the residuals are autocorrelated: does this cause the OLS estimator to be biased?
    (d) In case the residuals are autocorrelated: does this cause problems for inference (t-statistics, p-
        values, ...)? If so, can you think of solutions to circumvent this problem?
6. Finite Distributed Lag Model. Estimate the Finite Distributed Lag model of order one, denoted as
   FDL(1):
                              ln IOt = β0 + β1 ln Y Ot + β5 ln Y Ot−1 + ut .                    (7)
  Note: It will become clear later on in the assignment why we use β5 and not β2 in front of ln Y Ot−1 .
   (a) Explain what a dynamic time series regression means and why the regression in equation (7) is
       one.
   (b) Generate the variables lnYO 0 and lnYO 1 by using the function embed.
    (c) Estimate the FDL(1) model in equation (7). What would happen if you execute the following
        code in R:
         fit_FDL <- lm(lnIO ~ lnYO_0 + lnYO_1)
   (d) To remove the first observation from a vector, you may use the notation [-1]. Generate the new
       variable:
        lnIO_0 <- lnIO[-1]
       Discuss why the following regression will give you the desired outcome:
        fit_FDL <- lm(lnIO_0 ~ lnYO_0 + lnYO_1)
        Note: we have now over-written the variable lnIO 0 since we also generated this variable in
        Assignment 3(c) above. But in fact, the definition of lnIO 0 here and the one given in Assignment
        3(c) give you exactly the same result. Discuss!
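The equivalence can be checked directly in R (a sketch with a short hypothetical series):

```r
# Both constructions drop the first observation and coincide exactly
lnIO <- log(c(3, 7, 10, 12))

via_embed  <- embed(lnIO, dimension = 2)[, 1]   # definition from Assignment 3(c)
via_minus1 <- lnIO[-1]                          # definition used here

identical(via_embed, via_minus1)   # returns TRUE
```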
     (e) Based on the regression output, (manually) draw a picture of the lag distribution, summarizing
         the effect of ln Y O on ln IO at lags zero, one and two.
    (f) What is the value of the estimated impact multiplier?
   (g) What is the value of the estimated long-run multiplier?
   (h) Test the joint null hypothesis H0 : β1 = β5 = 0 versus the alternative that at least one of the two
       betas is different from zero.
       In R, this joint hypothesis test, where we test the joint nullity of all regression parameters (apart
       from the intercept) is by default reported in the summary output of your lm object, namely on
       the last line. What is the value of the F -statistic? What is the corresponding p-value? What do
       you conclude?
7. Recap Multiple Hypothesis Testing. Consider the choice between current and constant prices. Extend-
   ing the investment function with price indexes allows a formal comparison between nominal and real
   specifications, by means of statistical hypothesis tests. We will work with the implicit price deflators
                                      P Yt = Y Ut / Y Ot     and     P It = IUt / IOt .
    (a) Run the extended regression of real investment on a constant, current and lagged real output,
        both price indexes, and one lagged price index:
            ln IOt = β0 + β1 ln Y Ot + β2 ln P It + β3 ln P Yt + β4 ln P It−1 + β5 ln Y Ot−1 + ut .        (8)
         You need to generate all your variables first. Assume that you name the fitted regression model
         in (8) fit lnIO ur. Then present your regression output and test the hypotheses that each price
         coefficient (β2 , β3 , β4 ) separately is in fact zero.
   (b) Now consider the hypothesis that the price coefficients in equation (8) are all three zero:
                                          H0 : β2 = β3 = β4 = 0
        (versus the alternative that at least one is different from zero). Note that if the hypothesis is true,
        then regression (8) reduces to the regression (7).
        Give the formula in Wooldridge to test this joint hypothesis.
     (c) We will start by computing the F -statistic manually in R. You will have to compute the sum of
         squared residuals (SSR) of two regression models. Which regression models? You can use the
        following code to obtain the SSR of, for instance, regression model (8):
         SSR_ur <- sum(fit_lnIO_ur$residuals^2)
        What is the value of the F -statistic? What are the degrees of freedom? Do you reject the null
        hypothesis H0 : β2 = β3 = β4 = 0 or not?
        Note that you can compute the critical values of the F-distribution in R via the function qf. Use
        the documentation in R to appropriately fill in the arguments in the function to compute the
        critical value for your country!
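The recipe can be sketched as follows (with simulated data in place of your series; fit_r stands for the restricted model (7) and fit_ur for the unrestricted model (8)):

```r
# Simulated data standing in for your country's series
set.seed(1)
x1 <- rnorm(40); x2 <- rnorm(40); x3 <- rnorm(40)
y  <- 1 + 0.5 * x1 + rnorm(40)

fit_r  <- lm(y ~ x1)             # restricted model: x2 and x3 excluded
fit_ur <- lm(y ~ x1 + x2 + x3)   # unrestricted model

SSR_r  <- sum(fit_r$residuals^2)
SSR_ur <- sum(fit_ur$residuals^2)

q     <- 2                       # number of restrictions tested
df_ur <- fit_ur$df.residual      # n - k - 1 of the unrestricted model
Fstat <- ((SSR_r - SSR_ur) / q) / (SSR_ur / df_ur)

Fstat                            # reject H0 if this exceeds the critical value
qf(0.95, df1 = q, df2 = df_ur)   # 5% critical value of the F-distribution
```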
   (d) We can also opt to directly perform the F -test in R. To this end, we can use the linearHypothesis
       function in the R package car. Start by installing the package car. You can then use the following
       command to directly obtain the F -test:
        linearHypothesis(fit_lnIO_ur, c("lnPI_0=0", "lnPY_0=0", "lnPI_1=0"), test="F")
        where you simply write out the restrictions under the null hypothesis by referring to the variable
       names of the corresponding parameters. The argument test="F" ensures that you compute the
       F -test. Does the output provided by R match with your manual computation? Interpret the
       output.
    (e) Next, an important hypothesis is that of price homogeneity, i.e., the theory that absolute price
        levels are unimportant and instead only relative prices matter. Price homogeneity is a theoretical
        property considered as desirable in economic models, at least in the long run. It implies the
        absence of money illusion. Define the relative price of investment goods and services, P IRt :
                                              P IRt ≡ P It / P Yt .
         Under strict price homogeneity, the three price indexes in regression (8) may be replaced by
         the single relative price P IRt , as in equation (9).
         Write down the null hypothesis of strict price homogeneity in terms of the regression coefficients
         of (8) (so in terms of the βs!). To this end, start from equation (9) and plug in the definition of
         P IRt . Which restrictions on the regression coefficients of (8) arise then?
   (f) Test the null hypothesis of strict price homogeneity using the linearHypothesis function in R.
       What do you conclude?
    (g) A weaker version of price homogeneity would allow for short-run deviations from strict homogeneity.
        One way to do this is to introduce an effect of investment price inflation, dlnPIt = ∆ ln PIt , next
        to relative prices, as in equation (10).
        Here absolute price levels still play no role, but the rhythm of price changes does; price
        homogeneity holds only in the longer run.
        Write down the hypothesis of weak price homogeneity in terms of the regression coefficients of (8)
        (so in terms of the βs!). To this end, start from equation (10) and plug in the definitions of PIRt
        and ∆ ln PIt . Which restrictions on the regression coefficients of (8) arise then?
   (h) Test the null hypothesis of weak price homogeneity. What do you conclude?
    (i) Finally, test the hypothesis that (8) simplifies to a simple relation between nominal investment
        and nominal output:
                                           ln IUt = γ0 + γ1 ln Y Ut + ut .                           (11)
        This hypothesis too implies joint coefficient restrictions, and you need to find out precisely what
       these restrictions are. Start from equation (11) and plug in the definitions of IUt and Y Ut . Which
       restrictions on the regression coefficients of (8) arise then (so in terms of the βs!)?
    (j) Test the null hypothesis you derived in part (i). What do you conclude?
   (a) Explain why the regression in equation (12) is a dynamic time series regression.
   (b) Explain in words (so intuitively) the difference between the FDL(1) in equation (7) and the
       ARDL(1,1) in equation (12).
   (c) Estimate the ARDL(1,1) model in R using the lm function.
   (d) What is the value of the impact multiplier?
    (e) What is the value of the long-run multiplier? To this end, start from the equilibrium model (see
        the model with ∗ notation on the lecture slides) and solve for the coefficient in front of ln Y O.
Tutorial 3: Unit Roots, Trends, Unit Root Tests and Spurious
Regressions
In this tutorial, you will learn how to perform unit root tests, how to determine the order of integration of
a series, and how to recognize spurious regressions; you will also dive into one of the solutions: ARDL models.
  1. Setting up R. Set up your R script as you did for the previous tutorials, so: set your working directory,
     and import your data into R.
  2. Visual Inspection of Stationarity (Recap)
      (a) Make a time series plot of ln IOt and ln Y Ot . Are these time series likely to be stationary?
          Discuss.
      (b) Make a time series plot of ∆ ln IOt and ∆ ln Y Ot . Are these time series likely to be stationary?
          Discuss.
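The visual checks above can be sketched on artificial data. The series lnIO below is simulated purely for illustration; use your own imported series instead.

```r
# Sketch: visual stationarity check on a simulated trending series.
set.seed(2)
lnIO <- cumsum(rnorm(60, mean = 0.02))   # made-up series with a trend
plot(lnIO, type = "l", main = "Level: trending, likely nonstationary")
plot(diff(lnIO), type = "l", main = "First difference: fluctuates around a mean")
```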
3. Visual Inspection of Trends. Consider the regression model for ln IOt with a trend:
ln IOt = β0 + β1 t + ut . (13)
     For the trend t you can generate a variable trend in R via the commands:
     n <- length(lnIO)
     trend <- 1:n
     where the function length returns the length of the variable (hence it gives you the sample size n),
     and the command 1:n simply returns you a sequence of numbers from 1 to n in steps of one.
      (a) Estimate regression model (13) using the lm function and carefully interpret the estimated coef-
          ficient β̂1 .
       (b) Save the residuals of model (13). What do the residuals intuitively represent?
      (c) Make a time series plot of the residual series. Do you think it is likely that ln IOt has a deter-
          ministic or a stochastic trend? Explain the difference between both in your answer!
      (d) If a series is trend stationary, what does this mean? Does it then have a deterministic or a
          stochastic trend?
      (e) Repeat the same exercise for ln Y Ot . What is your conclusion: is ln Y Ot likely to have a deter-
          ministic or a stochastic trend?
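The steps of question 3 can be sketched as follows. The series lnIO below is simulated and stands in for your own data.

```r
# Hedged sketch of the trend regression (13) on simulated data; replace
# the artificial lnIO with your own imported series.
set.seed(3)
lnIO  <- cumsum(rnorm(50, mean = 0.02))  # made-up trending series
n     <- length(lnIO)                    # sample size
trend <- 1:n                             # deterministic time trend 1, 2, ..., n
fit_trend <- lm(lnIO ~ trend)            # estimate model (13)
coef(fit_trend)["trend"]                 # beta1-hat: average change per period
res <- residuals(fit_trend)              # detrended series, saved for plotting
plot(res, type = "l", main = "Residuals of the trend regression")
```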
  4. Dickey-Fuller Unit Root Test (with constant and trend). To formally test whether a series has a
     stochastic or a deterministic trend, we need to perform a unit root test with constant and trend.
      (a) What is the null hypothesis of this unit root test? What is the alternative hypothesis?
      (b) We start by running the Dickey-Fuller (DF) test (with constant and trend) for ln IOt .
           In R, start by installing the package bootUR, which offers a wide range of unit root tests. After
          loading the library, you can then use the commands
           df_lnIO <- adf(lnIO, deterministics = "trend", max_lag = 0)
           df_lnIO
           to perform a Dickey-Fuller (DF) unit root test with a constant and trend term included
           (deterministics = "trend") and no lagged first-difference terms (max_lag = 0). Present
           the output of the unit root test for ln IOt . How should you interpret it?
      (c) What is your conclusion for ln IOt : does it have a stochastic or a deterministic trend?
      (d) How to proceed in case of a stochastic trend? How to proceed in case of a deterministic trend?
(e) Repeat the same exercise for ln Y Ot : does it have a stochastic or a deterministic trend?
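As a rough illustration of what adf does under the hood, the DF test regression (constant and trend, no lags) can be written out with lm on simulated data. This is a sketch only: the resulting t-statistic must be judged against Dickey-Fuller critical values (roughly −3.4 at the 5% level for this specification), not the usual t-table; bootUR's adf supplies the correct inference for you.

```r
# Manual Dickey-Fuller regression on a simulated random walk (has a unit root).
set.seed(42)
y    <- cumsum(rnorm(100))       # artificial series for illustration
dy   <- diff(y)                  # dependent variable: first differences
ylag <- y[-length(y)]            # regressor: lagged level y_{t-1}
tt   <- seq_along(dy)            # deterministic trend term
df_reg <- lm(dy ~ tt + ylag)
t_stat <- summary(df_reg)$coefficients["ylag", "t value"]
t_stat  # compare with DF critical values, not standard t critical values
```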
5. Augmented Dickey-Fuller Unit Root Test (with constant and trend). Now consider the “Augmented”
   Dickey-Fuller (ADF) unit root test.
    (a) How does the ADF unit root test differ from the DF test? Why is the augmentation needed?
   (b) Run the ADF test for ln IOt .
       In R, use the commands
        adf_lnIO <- adf(lnIO, deterministics = "trend")
        adf_lnIO
        This function automatically includes lagged first-difference terms in the test equation, using
        the Akaike Information Criterion (AIC) to determine how many of these terms should be
        added.
       Present the output of the unit root test for ln IOt . What is your conclusion for ln IOt : does it
       have a stochastic or a deterministic trend?
6. Bootstrap union of rejection test. In the previous exercise, we used the ADF test as a unit root test,
    which is by far the most popular unit root test. Still, the ADF test requires us to specify which
   deterministic components to include in the test equation (a constant and a trend in case the series
   displays a trend; a constant only when the series displays no trend). To relieve the user of making
   this choice (in case it is not so clear cut), you may use the union of rejections test instead. The null
   hypothesis and alternative hypothesis stay the same as before.
    (a) Run the test via the command:
         union_lnIO = boot_union(lnIO)
        Present the output of the test for ln IOt . What is your conclusion for ln IOt : does it have a
        stochastic or a deterministic trend?
   (b) Repeat the same exercise for ln Y Ot . What do you conclude?
7. Unit Root Test on the series in log-differences. The series ln IOt and ln Y Ot will never be stationary (at
   most trend-stationary). (Remind yourself why this is the case!). We now test whether the series in
   log-differences are stationary.
    (a) Perform the union of rejections test on ∆ ln IOt . What is the null hypothesis? What is the
        alternative hypothesis? Present your output of the test. How should you interpret it?
    (b) After having run the unit root tests on ln IOt and ∆ ln IOt , what do you conclude about the order
        of integration of ln IOt ?
        Explain the difference between a series that is I(1) (“integrated of order one”) and one that is I(0)
        (“integrated of order zero”) in your answer!
8. Static Regression for the series in log-levels (revisited) and Spurious Regressions. Re-consider the static
   regression model for the series in log-levels:
ln IOt = β0 + β1 ln Y Ot + ut . (14)
    (a) Given the outcome of your unit root tests, is the static regression model (14) possibly a spurious
        regression?
        Explain what a spurious regression means and what drives this!
   (b) Is it “safe” to interpret the regression output of model (14)?
    (c) Re-inspect the value of the R2 . Is it spurious? Should we interpret it?
   (d) What are solutions to the spurious regression problem?
     (e) Which solutions have we considered already in earlier tutorials, which haven’t we considered yet?
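A classic way to see the spurious regression phenomenon for yourself is to regress two independent random walks on each other. The sketch below uses simulated series unrelated to your tutorial data:

```r
# Two independent random walks: any relation found between them is spurious.
set.seed(7)
x <- cumsum(rnorm(200))
y <- cumsum(rnorm(200))               # generated independently of x
r2_levels <- summary(lm(y ~ x))$r.squared              # often deceptively high
r2_diffs  <- summary(lm(diff(y) ~ diff(x)))$r.squared  # typically much smaller
c(levels = r2_levels, differences = r2_diffs)
```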
 9. Static Regression for the series in first differences and Spurious Regressions. Re-consider the static
    regression model for the series in first differences:
∆ ln IOt = β0 + β1 ∆ ln Y Ot + ut . (15)
    (a) Given the outcome of your unit root tests, is the static regression model (15) possibly a spurious
        regression?
    (b) Is it “safe” to interpret the regression output of model (15)?
     (c) Re-inspect the value of the R2 . Is it spurious? Should we interpret it?
10. ARDL models: short-run and long-run effects. In this tutorial, we will zoom in on one of the solutions
    to the spurious regression problem, namely ARDL models. Consider the ARDL(1,1) model

                          yt = β0 + β1 xt + β2 xt−1 + β3 yt−1 + ut .                          (16)

         We now examine how a permanent rise in xt (a “permanent shock”) affects the conditional mean
         of yt in the following years. Define three time horizons: the same year as the shock (short-run),
         one year later (medium-run), and many years later (long-run), with the corresponding effects
         analyzed below.
     (c) Short-run. Define the same-year effect, known as the “impact multiplier”, as

                          θ1 ≡ ∂ E(yt | xt , yt−1 , xt−1 , . . .) / ∂xt .                     (17)

         Finding the impact multiplier for the ARDL(1,1) model should be easy (you did this already in
         Tutorial 2)! It is simply the instantaneous partial derivative:

                          θ1 ≡ ∂ E(yt | xt , . . .) / ∂xt = β1 .
         What is the value of the impact multiplier for the ARDL(1,1) model you estimated?
    (d) Medium-run. Define the cumulative effect after two years, known as the “two-year (interim)
        multiplier”, as

                 θ2 ≡ ( ∂/∂xt + ∂/∂xt−1 ) E(yt | xt , yt−1 , xt−1 , . . .)
                    = θ1 + ∂ E(yt | xt , yt−1 , xt−1 , . . .) / ∂xt−1 .                       (18)

        This is the sum of the impact multiplier and the second-year partial effect of the shock.
        To obtain the two-year (interim) multiplier for model (16), start by substituting away yt−1 as
        follows:

             yt = β0 + β1 xt + β2 xt−1 + β3 (β0 + β1 xt−1 + β2 xt−2 + β3 yt−2 + ut−1 ) + ut .

        From this expression, you can easily obtain the second-year partial effect as the partial derivative

                          ∂ E(yt | xt , . . .) / ∂xt−1 = β2 + β3 β1 ,

        i.e., the coefficient in front of xt−1 . The two-year multiplier is the sum of these two partials,

                          θ2 ≡ β1 + β2 + β3 β1 .
    What is the value of the two-year multiplier for the ARDL(1,1) model you estimated?
(e) Long-run. Define the cumulative long-run effect, known as the “total multiplier”, as

                 θ∞ ≡ Σ_{i=0}^{∞} ∂ E(yt | xt , yt−1 , xt−1 , . . .) / ∂xt−i .                (4.6.iii)
    This is the sum of all partial effects, at impact and in the entire sequel of years.
    To determine long-run effects in a model, we establish whether the model admits a state where all
    variables have converged to some static “equilibrium” level. See what happens when you drop all
    time subscripts and replace them by stars (to indicate constant equilibrium values), then solve the
    resulting relationship for the dependent variable. For instance, the ARDL(1,1) model becomes
                          y∗ = β0 + β1 x∗ + β2 x∗ + β3 y∗ + u∗ .

     This is a stationary state (assuming |β3 | < 1) which can be viewed as a hypothetical long-run
     equilibrium of the model. The coefficient of x∗ , here denoted θ∞ = (β1 + β2 )/(1 − β3 ), is the
     long-run effect on y∗ of shocks in the explanatory variable (cf. Wooldridge § 10.2, Problem 10.3).
    What is the value of the long-run multiplier for the ARDL(1,1) model you estimated?
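With the estimated ARDL(1,1) coefficients in hand, the three multipliers are one-liners. The coefficient values below are purely illustrative stand-ins for your own estimates:

```r
# Multipliers from (hypothetical) ARDL(1,1) coefficient estimates.
beta1 <- 0.40; beta2 <- 0.10; beta3 <- 0.50   # illustrative values only
theta1    <- beta1                            # impact multiplier
theta2    <- beta1 + beta2 + beta3 * beta1    # two-year (interim) multiplier
theta_inf <- (beta1 + beta2) / (1 - beta3)    # long-run (total) multiplier
c(impact = theta1, two_year = theta2, long_run = theta_inf)
```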
(f) Now, let us investigate whether the impact, two-year and long-run multipliers are significantly
    different from zero.
    Obtain the standard error of the estimated impact multiplier, this should be easy. Is the impact
    multiplier significantly different from zero?
(g) Obtaining the standard errors for the two-year and long-run multipliers is more difficult. Let us
    consider the two-year multiplier.
    To obtain a standard error for the two-year multiplier estimate θ̂2 , you need to apply the reshuffling
    or “theta trick” (Wooldridge § 4.4). In this example, the reshuffling trick is to substitute out one
    of the βs, say β1 , from the estimating equation (16) in favor of θ2 , using

                          β1 = (θ2 − β2 )/(1 + β3 ).
     The resulting estimating equation is nonlinear in its coefficients. To implement this nonlinear
     regression in R, you need to use the function nls and enter the model as an explicit algebraic
     equation:
      nls_theta2 <- nls(lnIO_0 ~ beta0 + ((theta2 - beta2)/(1 + beta3))*lnYO_0 + beta2*lnYO_1 +
                          beta3*lnIO_1, start = list(beta0 = 1, theta2 = 1, beta2 = 1, beta3 = 1))
     The estimated coefficient “theta2” is a direct estimate of the two-year multiplier θ2 , and its
     standard error is reported along with the estimate! Note that we must provide starting values
     (the values in the list), since nonlinear estimation procedures are iterative. In the code above,
     we initialize all parameters at one, but you have already computed the actual values of the betas
     and thetas, so you can use these as starting values to ensure faster convergence of the nonlinear
     least squares estimation. As a double check: verify that the estimated values of the betas and the
     thetas in the output of the nls estimation coincide with the values you obtained above, as they
     should!
    Implement this procedure to get the standard error for θ̂2 . Is the two-year multiplier significant?
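The whole procedure can be rehearsed on simulated data before applying it to your own series. Everything below is made up: the data-generating coefficients, the sample size, and the variable names (your own lnIO/lnYO series replace them).

```r
# Simulate an ARDL(1,1) with known coefficients, then recover theta2 via nls.
set.seed(123)
n <- 300
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) y[t] <- 1 + 0.4 * x[t] + 0.1 * x[t - 1] + 0.5 * y[t - 1] +
  rnorm(1, sd = 0.2)
y0 <- y[3:n]; x0 <- x[3:n]              # current values
x1 <- x[2:(n - 1)]; y1 <- y[2:(n - 1)]  # one-period lags
# True theta2 = 0.4 + 0.1 + 0.5 * 0.4 = 0.7
nls_theta2 <- nls(y0 ~ beta0 + ((theta2 - beta2) / (1 + beta3)) * x0 +
                    beta2 * x1 + beta3 * y1,
                  start = list(beta0 = 1, theta2 = 1, beta2 = 1, beta3 = 0.5))
summary(nls_theta2)$coefficients["theta2", c("Estimate", "Std. Error")]
```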
(h) Implement a similar “theta” trick to get the standard error for θ̂∞ . Start by re-expressing β1 in
    favor of θ∞ . Which expression do you get? Run the nonlinear regression to get the standard
    error. Is the long-run multiplier significant?