
Wollo University, College of Business and Economics, Department of Economics

Chapter Two
Basic Regression Analysis with Time Series Data
2.1 The Nature of Time Series Data
An obvious characteristic of time series data which distinguishes it from cross-
sectional data is that a time series data set comes with a temporal ordering. For instance,
the data for 1970 immediately precede the data for 1971. A sequence of random variables
indexed by time is called a stochastic process or a time series process. In other words, a
stochastic (or random) process is a collection of random variables ordered in time.

Another difference between cross-sectional and time series data concerns the
randomness of the variables. Cross-sectional data should be viewed as random outcomes
since a different sample drawn from the population will generally yield different values of
the independent and dependent variables. Therefore, the OLS estimates computed from
different random samples will generally differ, and this is why we consider the OLS
estimators to be random variables.

Regarding time series data, when we collect a time series data set, we obtain one possible
outcome, or realization, of the stochastic process. This is because we cannot go back in time
and start the process over again. (This is analogous to cross-sectional analysis where we can
collect only one random sample.) However, if certain conditions in history had been
different, we would generally obtain a different realization for the stochastic process.

Consider, for instance, a nation with a GDP of $2,872.8 billion for 1970–I (the first quarter
of 1970). In theory, the GDP figure for that quarter could have been any number, depending
on the economic and political climate then prevailing. The figure of 2,872.8 is therefore a
particular outcome of the stochastic process that prevailed at that time. Thus, time series
data are considered the outcome of random variables. The distinction between the stochastic process and its
realization is akin to the distinction between population and sample in cross-sectional
data. Just as we use sample data to draw inferences about a population, in time series we
use the realization to draw inferences about the underlying stochastic process.

2.2 Examples of Time Series Regression Models


In this section, we discuss two examples of time series models that have been useful in
empirical time series analysis and that are easily estimated by ordinary least squares.
We will study additional models later.


A. Static Models
Suppose that we have time series data available on two variables, say y and z, where yt and
zt are dated contemporaneously, that is, data occurring in the same period of time. A static
model relating y to z is

yt = β0 + β1zt + ut,  t = 1, 2, …, n    (1.1)

The name “static model” comes from the fact that we are modeling a contemporaneous
relationship between y and z. Usually, a static model is postulated when a change in z at
time t is believed to have an immediate effect on y. Static regression models are also used
when we are interested in knowing the tradeoff between y and z. An example of a static
model is the static Phillips curve, given by

inft = β0 + β1unemt + ut    (1.2)

This equation is used to study the contemporaneous tradeoff between the annual inflation
rate, inft and the unemployment rate, unemt.

Naturally, we can have several explanatory variables in a static regression model. Let
mrdrtet denote the murders per 10,000 people in a particular city during year t, let convrtet
denote the murder conviction rate, let unemt be the local unemployment rate, and let
yngmlet be the fraction of the population consisting of males between the ages of 18 and 25.
Then, a static multiple regression model explaining murder rates is

mrdrtet = β0 + β1convrtet + β2unemt + β3yngmlet + ut    (1.3)

Using a model such as this, we can hope to estimate, for example, the ceteris paribus effect
of an increase in the conviction rate on criminal activity.

B. Finite Distributed Lag Models


In a finite distributed lag (FDL) model, we allow one or more variables to affect y with a
lag. For example, for annual observations, consider the model

gfrt = α0 + δ0pet + δ1pet-1 + δ2pet-2 + ut    (1.4)


Where gfrt is the general fertility rate (children born per 1,000 women of childbearing age)
and pet is the real dollar value of the personal tax exemption. The idea is to see whether, in
the aggregate, the decision to have children is linked to the tax value of having a child.
Equation (1.4) recognizes that, for both biological and behavioral reasons, decisions to have
children would not immediately result from changes in the personal exemption. Equation
(1.4) is an example of the model

yt = α0 + δ0zt + δ1zt-1 + δ2zt-2 + ut    (1.5)

Special terminology and notation are used to indicate future and past values of z. The value
of z in the previous period is called its first lagged value or, more simply, its first lag, and
is denoted zt-1. Its jth lagged value (or simply its jth lag) is its value j periods ago, which is
zt-j. In this regard, equation (1.5) is an FDL model with two lags of z. The change between
two consecutive periods, zt − zt-1 (or Δz), is termed the first difference of z.¹ Note
that zt+1 denotes the value of z one period into the future.² The parameter δ0 is the immediate
change in y due to a one-unit increase in z at time t, and it is usually called the impact
propensity or impact multiplier. The sequence of coefficients δ0, δ1, δ2 summarizes the
dynamic effect that a temporary increase in z has on y; there are no further changes in y
after two periods. The sum of the coefficients on current and lagged z, δ0 + δ1 + δ2, is the
long-run change in y given a permanent increase in z and is called the long-run propensity (LRP)
or long-run multiplier.
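As a concrete illustration (not part of the original notes), the following Python sketch simulates data in the spirit of (1.5) and estimates an FDL model of order two by OLS using statsmodels; all variable names and numbers are made up. The long-run propensity is simply the sum of the coefficients on current and lagged z.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated annual data standing in for y (e.g., gfr) and z (e.g., pe);
# the true dynamic effects are delta_0 = 0.4, delta_1 = 0.3, delta_2 = 0.2
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=n).cumsum() + 50       # a slowly evolving regressor
y = 5 + 0.4 * z
y[1:] += 0.3 * z[:-1]                      # effect arriving after one period
y[2:] += 0.2 * z[:-2]                      # effect arriving after two periods
y += rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"y": y, "z": z})
df["z_lag1"] = df["z"].shift(1)            # z(t-1)
df["z_lag2"] = df["z"].shift(2)            # z(t-2)
data = df.dropna()                         # lags are undefined for the first two periods

X = sm.add_constant(data[["z", "z_lag1", "z_lag2"]])
res = sm.OLS(data["y"], X).fit()

impact = res.params["z"]                              # impact propensity, delta_0
lrp = res.params[["z", "z_lag1", "z_lag2"]].sum()     # LRP = delta_0 + delta_1 + delta_2
print("impact propensity:", impact, "LRP:", lrp)
```

Estimating the LRP as the sum of the coefficient estimates carries over directly to FDL models with more lags.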

2.3 Finite Sample Properties of OLS under Classical Assumptions


This section deals with the finite (small) sample properties of OLS under the standard
assumptions, which are adapted from cross-sectional analysis to cover time series
regressions.
Assumptions
Assumption 1 (Linearity in parameter):- It states that the time series process follows a
model which is linear in its parameters.

yt = β0 + β1xt1 + β2xt2 + … + βkxtk + ut    (1.6)
Assumption 2 (Zero conditional mean):- For each t, the expected value of the error ut,
given the explanatory variables for all time periods, is zero. Mathematically

E(ut | X) = 0,  t = 1, 2, …, n    (1.7)

¹ Economic time series are often analyzed after computing their logarithms or the changes in their logarithms. The
first difference of the log, scaled by 100, i.e. 100·Δlog(zt), approximates the percentage change and thus measures the
growth rate. This indicates that the original time series data would be transformed into log form to compute the growth
rate of a variable of interest.
² The interval between observations, that is, the period of time between observation t and observation t+1, is some unit of
time such as weeks, months, quarters (three-month units), or years. For example, inflation data are usually studied on a
quarterly basis, so the unit of time (a 'period') is a quarter of a year.
Assumption 3 (No perfect collinearity):- In the sample (and therefore in the underlying
time series process), no independent variable is constant or a perfect linear combination of
the others.
Assumption 4 (Homoskedasticity):- Conditional on X, the variance of ut is the same for all
t:

Var(ut | X) = Var(ut) = σ²,  t = 1, 2, …, n    (1.8)
Assumption 5 (No serial correlation):- Conditional on X, the errors in two different time
periods are uncorrelated:

Corr(ut, us | X) = 0, for all t ≠ s    (1.9)
Note that in order to use the usual OLS standard errors, t statistics, and F statistics for
hypothesis testing or inference, we need to add a final assumption (Assumption 6) that is
analogous to the normality assumption we used for cross sectional analysis.
Assumption 6 (Normality):- The errors ut are independent of X and are independently and
identically distributed as Normal(0, σ²).

Theorems
Theorem 1 Unbiasedness of OLS
Under Assumptions 1, 2, and 3 the OLS estimators are unbiased conditional on X, and
therefore unconditionally as well:

E(β̂j) = βj,  j = 0, 1, …, k    (1.10)
Theorem 2 Gauss–Markov theorem
Under Assumptions 1 through 5, the OLS estimators are the best linear unbiased estimators
conditional on X.
Theorem 3 OLS sampling variance
Under the time series Gauss-Markov assumptions 1 through 5, the variance of β̂j,
conditional on X, is:

Var(β̂j | X) = σ² / [SSTj(1 − Rj²)],  j = 1, …, k    (1.11)

where SSTj is the total sum of squares of xtj and Rj² is the R-squared from the regression
of xtj on the other independent variables. The estimator σ̂² = SSR/df is the unbiased
estimator of σ², where df = n − k − 1.
Theorem 4 Normal sampling distributions
Under Assumptions 1 through 6, the CLM assumptions for time series, the OLS estimators
are normally distributed, conditional on X.


2.4 Functional Form, Dummy Variables, and Index Numbers


Functional Form
So far, we have seen different functional forms in regression analysis. Although all of these
functional forms can be used in time series regressions, the most important is the
log-log (natural logarithm) form. Time series regressions with constant percentage effects,
which this functional form captures, appear often in applied work.
Annual data on the Puerto Rican employment rate, minimum wage, and other variables
are used by Castillo-Freeman and Freeman (1992) to study the effects of the U.S.
minimum wage on employment in Puerto Rico. A simplified version of their model is

log(prepopt) = β0 + β1log(mincovt) + β2log(usgnpt) + ut    (1.12)
Where prepopt is the employment rate in Puerto Rico during year t (ratio of those working
to total population), usgnpt is real U.S. gross national product (in billions of dollars), and
mincov measures the importance of the minimum wage relative to average wages. In
particular, mincov = (avgmin/avgwage)·avgcov, where avgmin is the average minimum
wage, avgwage is the average overall wage, and avgcov is the average coverage rate (the
proportion of workers actually covered by the minimum wage law).

Using data for the years 1950 through 1987 gives

log(prepopt) = −1.05 − .154 log(mincovt) − .012 log(usgnpt)    (1.13)
               (0.77)  (.065)            (.089)
n = 38, R² = .661
The estimated elasticity of prepop with respect to mincov is -0.154, and it is statistically
significant with t = -2.37. Therefore, a higher minimum wage lowers the employment
rate, something that classical economics predicts. The GNP variable is not statistically
significant, but this changes when we account for a time trend in the next section.

We can use logarithmic functional forms in distributed lag models, too. For example,
for quarterly data, suppose that money demand (Mt) and gross domestic product (GDPt)
are related by

log(Mt) = α0 + δ0log(GDPt) + δ1log(GDPt-1) + δ2log(GDPt-2) + δ3log(GDPt-3) + δ4log(GDPt-4) + ut    (1.14)

Dummy Variables
It is not always the case that all variables in a regression are numerical. If we incorporate
qualitative variables such as nominal or ordinal variables (e.g. gender, race,
religion, region, etc.) in the regression analysis, we use a dummy variable to quantify the
qualitative variable and to obtain a meaningful result.

A dummy variable is, therefore, an artificial variable constructed such that it takes the value
unity whenever the qualitative phenomenon it represents occurs and zero otherwise. For
example, if we take gender as a dummy variable, we may assign 1 for male and 0 for
female as a base or benchmark.

Dummy independent variables are also quite useful in time series applications. Since the
unit of observation is time, a dummy variable represents whether, in each time period, a
certain event has occurred. For example, for annual data, we can indicate in each year
whether a Democrat or a Republican is president of the United States by defining a variable
democt, which is unity if the president is a Democrat, and zero otherwise. Or, in looking at
the effects of capital punishment on murder rates in Texas, we can define a dummy variable
for each year equal to one if Texas had capital punishment during that year, and zero
otherwise.
Often dummy variables are used to isolate certain periods that may be systematically
different from other periods covered by a data set.

Example: Effects of personal exemption on fertility rate


The general fertility rate (gfr) is the number of children born to every 1,000 women of
childbearing age. For the years 1913 through 1984, the equation

gfrt = β0 + β1pet + β2ww2t + β3pillt + ut    (1.15)

explains gfr in terms of the average real dollar value of the personal tax exemption (pe)
and two binary variables. The variable ww2 takes on the value unity during the years
1941 through 1945, when the United States was involved in World War II. The variable
pill is unity from 1963 on, when the birth control pill was made available for
contraception.

Using the data, the estimated equation is:

gfrt = 98.68 + .083 pet − 24.24 ww2t − 31.59 pillt    (1.16)
       (3.21)  (.030)    (7.46)       (4.08)
n = 72, R² = .473

The regression result shows that the fertility rate was lower during World War II: given pe,
there were about 24 fewer births for every 1,000 women of childbearing age. Thus, the
historical event WWII reduced births by about 24 (from roughly 98 to 98 − 24 = 74 for
every 1,000 women of childbearing age) compared to the births in the periods free from
this historical event.
The number of births in those other periods is about 98 for every 1,000 women
of childbearing age.

Similarly, the fertility rate has been substantially lower since the introduction of the birth
control pill, and the coefficient is interpreted in the same way as that of the dummy variable
ww2. The coefficient on pe implies that a 12-dollar increase in pe increases gfr by about one
birth per 1,000 women of childbearing age. The intercept represents the value for the base
categories of the dummy variables.

Index Numbers
An index number is a summary measure that aggregates a vast amount of information into a
single quantity. Index numbers are used regularly in time series analysis, especially in
macroeconomic applications. An example of an index number is the index of industrial
production (IIP), computed monthly by the Board of Governors of the Federal Reserve (in
the case of the United States). The IIP is a measure of production across a broad range of
industries, and, as such, its magnitude in a particular year has no quantitative meaning.

In order to interpret the magnitude of the IIP, we must know the base period and the
base value. For instance, in a data set of industrial production from 1987 to 1999, the
base year is 1987 (the choice of base period is just a convention) and the base value is 100.
If the IIP was 107.7 in 1992, we can say that industrial production was 7.7% higher in 1992
than in 1987. Similarly, if IIP = 61.4 in 1970 and IIP = 85.7 in 1979, industrial production
grew by about 39.6% during the 1970s. Most indexes are defined with a base value of 100
(or sometimes one) in order to make the interpretation straightforward.

It is easy to change the base period for any index number, and sometimes we must do
this to give index numbers reported with different base years a common base year. For
example, if we want to change the base year of the IIP from 1987 to 1982, we simply
divide the IIP for each year by the 1982 value and then multiply by 100 to make the
base period value 100. Generally, the formula is

newindext = 100 · (oldindext / oldindexnewbase)    (1.17)
Where oldindexnewbase is the original value of the index in the new base year. For example,
with base year 1987, the IIP in 1992 is 107.7; if we change the base year to 1982, the
IIP in 1992 becomes 100(107.7/81.9) = 131.5 provided that 81.9 is the IIP in 1982.
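A minimal sketch of the re-basing formula (1.17) in Python, using the IIP values quoted in the text (the series itself is illustrative):

```python
import pandas as pd

# IIP values quoted in the text (1987 base year, base value 100)
iip_1987base = pd.Series({1982: 81.9, 1987: 100.0, 1992: 107.7})

# Equation (1.17): divide every year by the value in the new base year,
# then multiply by 100 so the new base-period value is 100
iip_1982base = 100 * iip_1987base / iip_1987base[1982]

print(iip_1982base[1992])   # 100 * (107.7 / 81.9), about 131.5
```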

Another important example of an index number is a price index, such as the consumer price
index (CPI). The CPI is used to compute the annual inflation rate and to convert nominal
economic variables into real terms. The annual inflation rate is computed by
computing the percentage change of CPI across different years (or months/quarters, if we
are using monthly/quarterly data).

Price indexes are also used for turning a time series measured in nominal dollars (or current
dollars) into real dollars (or constant dollars). This is because most economic behavior is
assumed to be influenced by real, not nominal, variables. The real values of the economic
variables such as GDP and wages are obtained by dividing the nominal value by the CPI.
Note that we must be careful to first divide the CPI by 100 so that its value in the base year
is one (i.e. p = CPI/100). In other words, the real value is obtained by dividing the nominal
value by p.
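A short illustrative sketch of this deflation step, with hypothetical wage and CPI values:

```python
import pandas as pd

# Hypothetical nominal wages and CPI (CPI = 100 in the base year 1990)
nominal_wage = pd.Series({1990: 10.0, 1995: 12.5, 2000: 15.0})
cpi = pd.Series({1990: 100.0, 1995: 115.0, 2000: 130.0})

p = cpi / 100.0                   # base-year value of the deflator is one
real_wage = nominal_wage / p      # wages in constant (1990) dollars
print(real_wage)
```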

2.5 Trends and Seasonality


2.5.1 Trends
Characterizing Trending Time Series
Many economic time series have a common tendency of growing over time, which is
known as a linear time trend (an upward or downward linear trend). For example, labor
productivity (output per hour of work) typically shows a linear upward trend with time,
which reflects the fact that workers have become more productive over time. Other
economic time series, at least over certain time periods, have clear linear downward trends.
One popular formulation that captures the linear trending behavior of {yt} with time t is
written as:

yt = α0 + α1t + et,  t = 1, 2, …    (1.18)

Interpreting α1 in (1.18) is simple: holding all other factors (those in et) fixed, α1
measures the change in yt from one period to the next due to the passage of time. When
Δet = 0,

Δyt = yt − yt-1 = α1    (1.19)
A more realistic characterization of trending time series allows {et} to be correlated over
time, but this does not change the flavor of a linear time trend. In fact, what is important for
regression analysis under the classical linear model assumptions is that E(yt) is linear in t.

E(yt) = α0 + α1t    (1.20)
Many other economic time series are better approximated by an exponential trend –
constant average growth, which follows when a series has the same average growth rate
from period to period. For example, in the early years of a series such as imports, we see
that the change over each year is relatively small, whereas the change increases as time passes.
This is consistent with a constant average growth rate: the percentage change is roughly the
same in each period. In practice, an exponential trend in a time series is captured by
modeling the natural logarithm of the series as a linear trend:

log(yt) = β0 + β1t + et,  t = 1, 2, …    (1.21)
β1 is interpreted as the average per-period growth rate in yt. For example, if t denotes year
and β1 = .027, then yt grows about 2.7% per year on average. Thus, β1 represents the
proportionate change in yt, log(yt) − log(yt-1), which is also called the growth rate in y from
period t−1 to period t:

Δlog(yt) = log(yt) − log(yt-1) = β1, for all t    (1.22)
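For instance, a quick numerical check of this log-difference growth rate, using a made-up series:

```python
import numpy as np

# A made-up series growing at roughly 2.7% per period
y = np.array([100.0, 102.7, 105.5, 108.3, 111.2])

growth = np.diff(np.log(y))   # log(y_t) - log(y_{t-1}) for each period
print(100 * growth)           # roughly 2.7 (percent) in every period
```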

Using Trending Variables in Regression Analysis


In regression analysis, we must be careful to allow for the fact that unobserved, trending
factors that affect the dependent variable yt might also be correlated with the explanatory
variables. If we ignore this possibility, we may find a spurious relationship between yt and
one or more explanatory variables. The phenomenon of finding a relationship between two
or more trending variables simply because each is growing over time is an example of
spurious regression. Fortunately, adding a time trend eliminates this problem.

For concreteness, consider a model where two observed factors, xt1 and xt2, affect yt. In
addition, there are unobserved factors that are systematically growing or shrinking over
time. A model that captures this is

yt = β0 + β1xt1 + β2xt2 + β3t + ut    (1.23)
Although Assumptions 1, 2 and 3 are satisfied, omitting t from the regression and regressing
yt on xt1 and xt2 will generally yield biased estimators of β1 and β2: we have effectively
omitted an important variable, t, from the regression, which induces an apparent relationship
between yt and the explanatory variables due to the time trend. This is especially true if xt1
and xt2 are themselves trending, because they can then be highly correlated with t.

Example: Housing investment and prices


The regression result below is based on annual observations on housing investment and a
housing price index for 1947 through 1988. Let invpc denote real per capita housing
investment (in thousands of dollars) and let price denote a housing price index (equal to
one in 1982). A simple regression in constant elasticity form, which can be thought of as a
supply equation for housing stock, gives


log(invpct) = −.550 + 1.241 log(pricet)    (1.24)
              (.043)  (.382)
n = 42, R² = .208
The elasticity of per capita investment with respect to price is very large and statistically
significant; it is not statistically different from one. We must be careful here. Both invpc
and price have upward trends. In particular, if we regress log(invpc) on t, we obtain a
coefficient on the trend equal to .0081 (standard error = .0018); the regression of log( price)
on t yields a trend coefficient equal to .0044 (standard error = .0004). While the standard
errors on the trend coefficients are not necessarily reliable—these regressions tend to
contain substantial serial correlation—the coefficient estimates do reveal upward trends.

To account for the trending behavior of the variables, we add a time trend:

log(invpct) = −.913 − .381 log(pricet) + .0098 t    (1.25)
              (.136)  (.679)           (.0035)
n = 42, R² = .341
The story is much different now: the estimated price elasticity is negative and not
statistically different from zero. The time trend is statistically significant, and its
coefficient implies an approximate 1% increase in invpc per year, on average. From
this analysis, we cannot conclude that real per capita housing investment is influenced
at all by price. There are other factors, captured in the time trend, that affect invpc, but
we have not modeled these. The results in (1.24) show a spurious relationship between
invpc and price due to the fact that price is also trending upward over time.

A Detrending Interpretation of Regressions with a Time Trend


Including a time trend in a regression model creates a nice interpretation in terms of
detrending the original data series before using them in regression analysis. For
concreteness, we focus on model (1.23). When we regress yt on xt1, xt2, we obtain the fitted
equation

ŷt = β̂0 + β̂1xt1 + β̂2xt2 + β̂3t    (1.26)
β̂1 and β̂2 can be obtained by detrending the dependent and independent variables, which
amounts to a regression without a time trend:
(i) Regress each of yt, xt1, and xt2 on a constant and the time trend t and save the
residuals, say ÿt, ẍt1, ẍt2, so that both the dependent and independent variables
are detrended.


(ii) Regress ÿt on ẍt1 and ẍt2. This regression yields exactly β̂1 and β̂2.
This means that the estimates of primary interest, β̂1 and β̂2, can be interpreted as coming
from a regression without a time trend, but where we first detrend the dependent variable
and all the independent variables. The same conclusion holds with any number of
independent variables, and if the trend is quadratic or of some other polynomial degree; a
numerical check follows below.
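The detrending equivalence (a version of the Frisch-Waugh result) can be verified numerically. The sketch below uses simulated data; all names and numbers are illustrative, not from the notes.

```python
import numpy as np
import statsmodels.api as sm

# Simulated trending data: y depends on x1, x2, and a linear time trend
rng = np.random.default_rng(1)
n = 80
t = np.arange(n, dtype=float)
x1 = 0.05 * t + rng.normal(size=n)
x2 = -0.03 * t + rng.normal(size=n)
y = 1 + 0.5 * x1 - 0.8 * x2 + 0.02 * t + rng.normal(size=n)

# Route 1: direct regression including the time trend, as in (1.23)
direct = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, t]))).fit()

# Route 2: detrend y, x1, x2 (regress each on a constant and t, keep residuals),
# then regress detrended y on the detrended regressors
def detrend(v):
    return sm.OLS(v, sm.add_constant(t)).fit().resid

detrended = sm.OLS(detrend(y), np.column_stack([detrend(x1), detrend(x2)])).fit()

print(direct.params[1:3])    # slope estimates on x1 and x2
print(detrended.params)      # identical, up to floating-point rounding
```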

Computing R-squared when the Dependent Variable is Trending


Compared with typical R-squareds for cross-sectional data, R-squareds in time series
regressions are often very high. This is because time series data often come in aggregate
form (such as average hourly wages in the U.S. economy), and aggregates are often easier
to explain than outcomes for individuals, families, or firms, which are the typical units in
cross-sectional data. But the usual and adjusted R-squareds for time series regressions can
be artificially high when the dependent variable is trending. Remember that R² is a measure
of how large the error variance is relative to the variance of y. The formula for the adjusted
R-squared shows this directly:

R̄² = 1 − σ̂u² / σ̂y²    (1.27)

where σ̂u² is the unbiased estimator of the error variance, σ̂y² = SST/(n − 1), and
SST = Σt (yt − ȳ)².

When the dependent variable satisfies a linear, quadratic, or any other polynomial trend, it is
easy to compute a goodness-of-fit measure that first nets out the effect of any time trend on
yt through detrending. That is, first regress yt on t and obtain the residuals ÿt. Then, regress
yt on xt1, xt2, and t; the R-squared can be computed as:

R² = 1 − SSR / Σt ÿt²    (1.28)

where SSR comes from the regression of yt on xt1, xt2, and t, and the sum runs over
t = 1, …, n.

The R-squared in (1.28) better reflects how well xt1 and xt2 explain yt, because it nets out
the effect of the time trend. An adjusted R-squared can also be computed based on (1.28):
divide SSR by the df (n − k), where k is the number of parameters in the usual regression,
including the intercept term and the parameters on any time trends (in this case, k = 4), and
divide Σt ÿt² by n − p, where p is the number of trend parameters estimated in detrending
yt (in this case, p = 2). A sketch follows below.
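A sketch of computing (1.28) and its adjusted version on simulated data (assumed, not from the notes):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with a trending dependent variable
rng = np.random.default_rng(2)
n = 80
t = np.arange(n, dtype=float)
x1 = 0.05 * t + rng.normal(size=n)
x2 = -0.03 * t + rng.normal(size=n)
y = 1 + 0.5 * x1 - 0.8 * x2 + 0.02 * t + rng.normal(size=n)

# Detrend the dependent variable only, then run the usual regression
y_dt = sm.OLS(y, sm.add_constant(t)).fit().resid
res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, t]))).fit()

# Equation (1.28): SSR relative to the total variation in detrended y
r2 = 1 - res.ssr / np.sum(y_dt**2)

# Adjusted version: SSR/(n - k) with k = 4, against sum(y_dt^2)/(n - p) with p = 2
adj_r2 = 1 - (res.ssr / (n - 4)) / (np.sum(y_dt**2) / (n - 2))
print(r2, adj_r2)
```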



2.5.2 Seasonality
If a time series is observed at monthly or quarterly intervals (or even weekly or daily),
it may exhibit seasonality. For example, retail sales in the fourth quarter are typically
higher than in the previous three quarters because of the Christmas holiday season in the
fourth quarter. This can be captured by allowing the average retail sales
to differ over the course of a year. This is in addition to possibly allowing for a trending
mean. For example, retail sales in the most recent first quarter were higher than retail sales
in the fourth quarter from 30 years ago, because retail sales have been steadily growing.
Nevertheless, if we compare average sales within a typical year, the seasonal holiday factor
tends to make sales larger in the fourth quarter.

Even though many monthly and quarterly data series display seasonal patterns, not all of
them do. For example, there is no noticeable seasonal pattern in monthly interest or
inflation rates. In addition, series that do display seasonal patterns are often seasonally
adjusted before they are reported for public use. A seasonally adjusted series is one that, in
principle, has had the seasonal factors removed from it. Sometimes we may face
seasonally unadjusted data. GDP is a typical example when the data are reported annually,
in which case seasonal adjustment is impossible. However, simple methods are available for
dealing with seasonality in regression models for unadjusted data. Generally, we can
include a set of seasonal dummy variables to account for seasonality in the dependent
variable, the independent variables, or both.

The approach is simple. Suppose that we have monthly data, and we think that seasonal
patterns within a year are roughly constant across time. For example, since Christmas
always comes at the same time of year, we can expect retail sales to be, on average, higher
in months late in the year than in earlier months. Or, since weather patterns are broadly
similar across years, housing starts in the Midwest will be higher on average during the
summer months than the winter months. A general model for monthly data that captures
these phenomena is

yt = β0 + δ1febt + δ2mart + … + δ11dect + β1xt1 + … + βkxtk + ut    (1.29)
where febt, mart, …, dect are dummy variables indicating whether time period t corresponds
to the appropriate month. In this formulation, January is the base month, and β0 is the
intercept for January. If there is no seasonality in yt, once the xtj have been controlled for,
then δ1 through δ11 are all zero. This is easily tested via an F test, as sketched below.
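As an illustration (assumed, not from the original notes), the following Python sketch builds the monthly dummies of (1.29) and carries out the F test for joint significance of the seasonal coefficients on simulated data; the December effect and all other numbers are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated monthly data with a December (holiday) effect and one regressor x
rng = np.random.default_rng(3)
months = pd.period_range("2000-01", periods=120, freq="M")
x = rng.normal(size=120)
y = 2 + 0.5 * x + 3.0 * (months.month == 12) + rng.normal(size=120)
df = pd.DataFrame({"y": y, "x": x, "month": months.month})

# C(month) creates eleven dummies; month 1 (January) is the base month,
# so the intercept plays the role of beta_0 in (1.29)
unrestricted = smf.ols("y ~ x + C(month)", data=df).fit()
restricted = smf.ols("y ~ x", data=df).fit()

# F test of no seasonality: the eleven seasonal coefficients are jointly zero
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_stat, p_value)   # a small p-value rejects "no seasonality"
```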


2.6 Stationary and Non-stationary Time Series


When a time series is stationary, certain attributes of the data do not change over time; that
is, statistical properties such as the mean and variance are constant over time. Flipping a
coin is an intuitive example: the probability of heads is 50%, regardless of whether you flip
it today, tomorrow, or next year. In economics, for example, if you are analyzing daily data
on stock prices, then under stationarity, statistical properties like the mean and variance
should remain constant across different days and years. In general, data without trend,
seasonality, cyclicity, and irregularity are known as stationary data.

Stationarity is important in time series analysis since it simplifies the complexities within
time series data, making it easier to model and forecast than a non-stationary series. The
consistency of the series or variables makes predictions easier and more reliable. In
contrast, non-stationary data can lead to unreliable model outputs and inaccurate
predictions, simply because the models do not account for the changing properties.

Some time series are non-stationary because their properties vary with time; that is, the
statistical properties change through time. This implies that the data do not have stable or
predictable behavior, and that past observations may not be representative of future ones.
Thus, a non-stationary series is one with trend, seasonality, cyclicity, or irregularity.
Non-stationary data can be converted to stationary data through detrending and
differencing. In finance, for example, many processes are non-stationary.

2.7 Test for Stationarity: Unit Root Test


A unit root process (the random walk, with or without drift, is the leading example)
produces a systematic pattern in a time series that is unpredictable. A shift in time changes
the shape of the distribution, so the time series becomes non-stationary. Unit
roots are, therefore, one cause of non-stationarity in time series. A unit root is thus a
problem since it leads to spurious regression - the case where statistically
significant coefficients are often obtained in regression analysis even if there is no
relationship or correlation between the dependent and independent variables. So a high
R-squared and significant t-values might mislead us into accepting nonsense regressions.

The correlation between ice cream sales and shark attacks is a classic example of
spuriousness. At a beach, ice cream sales and shark attacks correlate positively: as ice
cream sales increase, there are more shark attacks. The relationship looks causal in both its
statistical measures and in graphs, but it is not real: a third factor, warm summer weather
bringing more people to the beach, drives both series.

If a time series has a unit root, implying that the series is non-stationary, the first
difference (i.e., the series of changes from one period to the next) of the series is typically
stationary. Therefore, the solution to the unit root problem is to take the first difference of
the time series, as sketched below.
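A minimal sketch (not from the original notes): a simulated random walk is non-stationary in levels, but its first difference is stationary.

```python
import numpy as np
import pandas as pd

# A simulated random walk: the classic unit root (non-stationary) process
rng = np.random.default_rng(4)
levels = pd.Series(rng.normal(size=200).cumsum())

# First differencing recovers the stationary innovation series
changes = levels.diff().dropna()
print(levels.var(), changes.var())   # the differenced series is far less variable
```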

A unit root test is an econometric procedure that helps researchers identify whether a time
series is stationary or non-stationary. Some of the standard methods are the Dickey-Fuller
(DF), Augmented Dickey-Fuller (ADF), and Phillips-Perron (PP) tests. A well-known test
that is valid in large samples is the augmented Dickey-Fuller (ADF) test.

Testing for Unit Root


In the Augmented Dickey-Fuller test, the null hypothesis is the presence of a unit root and
the alternative hypothesis is the absence of a unit root:
H0: the series has a unit root    H1: the series has no unit root
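As a minimal sketch (not from the notes), the ADF test can be run in Python with statsmodels; the simulated random walk below should fail to reject the null hypothesis of a unit root.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# A simulated random walk, which has a unit root by construction
rng = np.random.default_rng(5)
series = rng.normal(size=300).cumsum()

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(series)
print("test statistic:", stat)
print("p-value:", pvalue)        # large p-value: fail to reject H0 (unit root)
print("critical values:", crit)  # 1%, 5%, and 10% critical values
```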
Below are the results of Augmented Dickey-Fuller tests on two different data sets. The
first test (on Google stock prices) has a high p-value and a test statistic well above the most
negative critical value. This means the series follows a unit root process: we fail to reject
the null hypothesis.

[Figure: ADF test results for Google stock prices]


The second test has a low p-value and a test statistic well below the most negative critical
value. This means the series does not have a unit root: we reject the null hypothesis in
favor of the alternative.

[Figure: ADF test results for the grosses data]

Econometrics II Lecture Notes; Extracted from Wooldridge, 2000; By Addisu M. (PhD)
