ECONOMETRICS
The Nature of Econometrics and Economic Data
Module Code: INS1064
Number of credits: 4
Pre-requisite(s): Theory of probability and mathematical statistics (MAT1004)
Teaching Language: English
Lecturer Information:
No. | Name | Title | Institution | Email | Phone
1 | Trần Quang Tuyến | Ph.D. | VNU-IS | tuyenisvnu@gmail.com | 0912474896
2 | Lê Văn Đạo | Master | VNU-IS | daoleisvnu@gmail.com | 0394952064
Assessment:
No. | Assessment items / Assessment methods | Value | Notes
1 | Regular assessment | 20% |
  | - Attendance and learning documents | 10% | In-class and take-home exercises/assignments: good presenting and writing.
  | - In-class and take-home exercises/assignments | 10% |
2 | Midterm exam | 20% | One-hour written open-book exam: multiple choice; interpretation
3 | Final exam | 60% | The test consists of the group's project and an oral examination lasting 15 minutes. Each student receives a single grade based on their entire written assignment and individual oral presentation.
  | Total | 100% |
Required textbook:
1. Jeffrey M. Wooldridge, Introductory Econometrics: A Modern
Approach, 5th edition, Cengage Learning, 2016
References:
2. Damodar N. Gujarati, Dawn C. Porter, Basic Econometrics, 5th
edition, McGraw-Hill, 2009.
CONTENTS
Chapter 1. The nature and methodology of econometrics
Chapter 2. Simple linear regression (SLR)
Chapter 3. Multiple linear regression (MLR)
Chapter 4. Multiple linear regression model: Inference & asymptotics
Chapter 5. Further issues with multiple linear regression
Chapter 6: Regression models with dummy (binary) variables
Chapter 7: More on specification and data issues
Midterm exam: One-hour written open-book exam
Chapter 8: Basic regression analysis with time series data
Chapter 9: Further issues in using the ordinary least squares method with time series data
Chapter 10: Serial correlation and heteroskedasticity in time series regressions
Chapter 11: Carrying out an empirical project
Chapter 1: Nature and methodology of econometrics
1.1 The definition and purposes of econometrics
1.2. Methodology of econometrics
1.3. The significance of the error term
1.4. Types of economic data
1.5. Causality and the notion of ceteris paribus in
econometric analysis
Case study: Statistical relationship vs deterministic relationship
1.1. Definition and purposes of econometrics
Econometrics can be defined as the social science that applies economic
theory, mathematics, and statistics to quantify economic phenomena.
Economic theory offers statements or hypotheses that are mostly qualitative
in nature.
Mathematical economics focuses on expressing economic relationships in
the form of mathematical equations, regardless of measurements or empirical
verification.
Economic statistics is mainly concerned with collecting, analyzing, and
presenting economic data (tables, figures, charts, etc.).
Common goals of econometric analysis
Test economic theories and hypotheses.
Investigate relationships between economic
variables.
Forecast economic phenomena.
Assess government and business policies.
Testing an economic theory: human capital theory
Testing an economic theory: supply and demand
Estimating the relationships between socio-economic variables
Forecasting economic phenomena
Evaluating policies implemented by the government or firms
1.2. Methodology of Econometrics
Generally speaking, traditional econometric methodology proceeds along the following
steps:
Step 1. Formulating research questions/hypotheses
Step 2. Specification of a suitable economic model
Step 3. Turning the economic model into an econometric model
Step 4. Obtaining the data
Step 5. Estimating the parameters of the econometric model
Step 6. Testing the hypotheses
Step 7. Forecasting/policy implications
Step 1: Formulating research questions/hypotheses
Examples:
Does job mismatch affect wage and job turnover?
Does greater household wealth make young children perform better?
Do mobile banner advertisements increase sales?
Does an increase in cigarette tax reduce cigarette consumption?
Step 2: Specifying a suitable economic model
This step is often skipped in empirical research.
The model may be a microeconomic or a macroeconomic model.
Such models are often based on optimizing behaviour or equilibrium.
Some models establish relationships between economic variables, e.g., FDI and technology
transfer; CSR and firm performance.
Step 2: Specifying an economic model
Example 1:
The functional form was not specified (e.g., linear or non-linear)
The equation was proposed without a formal economic model
Step 2: Specifying an economic model
Example 2:
Step 2. Economic model
What criteria are used to choose variables for the model?
1. Economic theory
2. Previous empirical studies
3. Intuition
How do I select relevant variables? (Tran, 2014)
Step 3: Turning the economic model into an econometric model
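As an illustration of Step 3, consider a wage equation in the spirit of human capital theory. The variables (educ, exper, training) and the linear functional form are assumptions chosen for this sketch, not a model taken from the slides. The economic model only says which factors matter; the econometric model fixes a functional form and adds an error term u for everything unobserved:

```latex
% Economic model: wage depends on observable factors, functional form unspecified
wage = f(educ,\ exper,\ training)

% Econometric model: assumed linear form plus an unobserved error term u
wage = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, training + u
```

The parameters \beta_0, ..., \beta_3 are the unknown population parameters to be estimated in Step 5.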
Step 4: Obtaining the data
❑ Primary data are information collected directly by the researcher.
❑ Secondary data are information that already exists and has been gathered by
other people or groups.
❑ Examples of secondary data in Vietnam: the Vietnam Household Living Standards Survey (VHLSS), the Labour Force Survey, and
the Enterprise Census, all conducted by the General Statistics Office (GSO).
❑ Other data are available from the World Bank (WB), the International Labour Organization (ILO),
and the World Trade Organization (WTO).
Step 5. Estimating the parameters of the econometric model
❑ We employ various econometric techniques to estimate the population
parameters.
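A minimal sketch of Step 5 using ordinary least squares (OLS), one of the techniques covered in later chapters. The data are simulated, and the "true" parameter values (intercept 1.0, slope 0.8) and variable names are assumptions made only for illustration:

```python
import numpy as np

# Hypothetical simulated data: every number here is an assumption for illustration.
rng = np.random.default_rng(42)
n = 500
educ = rng.uniform(6, 18, size=n)      # years of schooling
u = rng.normal(0, 2, size=n)           # unobserved error term
wage = 1.0 + 0.8 * educ + u            # assumed population model

# Design matrix: a column of ones (intercept) and the explanatory variable.
X = np.column_stack([np.ones(n), educ])

# OLS chooses the coefficients that minimize the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

print("estimated intercept:", beta_hat[0])
print("estimated slope:", beta_hat[1])
```

Because the sample is random, the estimates are close to, but not exactly equal to, the assumed population values.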
Step 6. Hypothesis testing
❑ We have to test hypotheses about the
population parameters ($\beta_j$).
❑ Assuming the fitted model is a good
approximation of reality, we need
criteria to determine whether the estimates from our econometric
analysis match the theory's
expectations.
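One common criterion is the t statistic for the null hypothesis that a slope parameter equals zero. The sketch below is a hedged illustration on simulated data. The true slope of 0.5 is an assumption, and 1.96 is used as an approximate large-sample critical value for a two-sided test at the 5% level:

```python
import numpy as np

# Hypothetical simulated data with an assumed true slope of 0.5.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(0, 1, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

# OLS estimates from the normal equations (X'X) b = X'y.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Estimated error variance and standard errors of the coefficients.
resid = y - X @ beta_hat
k = X.shape[1]                                   # number of estimated parameters
sigma2_hat = resid @ resid / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

# t statistic for H0: slope = 0, compared with an approximate critical value.
t_slope = beta_hat[1] / se[1]
reject_h0 = abs(t_slope) > 1.96
print("t statistic:", t_slope, "reject H0:", reject_h0)
```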
Step 7. Forecasting and policy implications
❑ The econometric results can be used to predict
the future value(s) of the dependent, or forecast,
variable Y based on the known or expected future
values of the explanatory variables (Xs).
❑ Some empirical findings offer useful information
for policymakers. This could aid in better policy
adjustment or intervention.
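A small sketch of forecasting with a fitted line, loosely based on the cigarette tax question from Step 1. The five data points and the units are invented for illustration and are not real data:

```python
import numpy as np

# Hypothetical data: tax per pack and average packs consumed per person per month.
tax = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
consumption = np.array([20.0, 18.0, 17.0, 14.0, 13.0])

# Fit consumption = b0 + b1 * tax by least squares.
b1, b0 = np.polyfit(tax, consumption, 1)   # polyfit returns the slope first

# Forecast consumption for a tax level that has not been observed yet.
new_tax = 6.0
forecast = b0 + b1 * new_tax
print("slope:", b1, "intercept:", b0, "forecast:", forecast)
```

A negative estimated slope is in line with the hypothesis that a higher tax reduces consumption, but a forecast of this kind is only as reliable as the fitted model.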
1.3. The significance of the error term
Why does the error/disturbance term always exist?
❑ Ambiguity of theory: the list of factors that can affect Y is always
incomplete.
❑ Unavailability of data: Even though we know we omitted some
important factors that affect Y, we may not have data on them.
❑ Poor proxy variables: The error term also captures errors of
measurement.
❑ Wrong functional form: Even if we have selected correct and relevant
variables (Xs) for our model, very often we do not know the form of
the functional relationship between Y and X.
❑ Principle of parsimony: It is better to keep our regression model as
simple as possible. A few X variables that explain a significant portion of Y
may be better than many variables without a strong
theoretical basis.
1.4. Types of economic data
Four types of economic data sets
Cross-sectional data
Time series data
Pooled cross sections
Panel/Longitudinal data
Note: The choice of econometric method depends on the type and nature of the data used.
Applying an inappropriate method may produce misleading results.
Table 1.1: Cross-sectional data set on households in Hoai Duc District
[Table not reproduced. Annotated example: observation number = the 5th household; age of household head = 54; consumption per capita = 1106.67 thousand VND/month; indicator variable (1 = poor; 0 = non-poor).]
Table 1.2: Cross-sectional data on countries' GDP and education
Cross-sectional data sets
Random samples of individuals, families, enterprises,
cities, regions, nations, or other units of interest at a
given point in time or in a given period
Cross-sectional observations are more or less
independent
Pure random sampling is often violated, e.g., when
respondents refuse to respond or when cluster
sampling is used.
Cross-sectional data are mostly used in applied
microeconomics
Table 1.3: Time series data set on trade and tourism in Vietnam
(billion VND; a dot marks a missing value)
Year | Total | Retail sales | Accommodation and food services | Services and tourism
1990 19031.2 16747.4 2283.8 .
1991 33403.6 29183.3 4220.3 .
1992 51214.5 44778.3 6436.2 .
1993 67273.3 58424.4 8848.9 .
1994 93490 74091 11656 7743
1995 121160 94863 16957 9340
1996 145874 117547 18950 9377
1997 161899.7 131770.4 20523.5 9605.8
1998 185598.1 153780.6 21587.7 10229.8
1999 200923.7 166989 21672.1 12262.6
2000 220410.6 183864.7 23506.2 13039.7
2001 245315 200011 30535 14769
2002 280884 221569.7 35783.8 23530.5
2003 333809.3 262832.6 39382.3 31594.4
2004 398524.5 314618 45654.4 38252.1
2005 480293.5 373879.4 58429.3 47984.8
2006 596207.1 463144.1 71314.9 61748.1
2007 746159.4 574814.4 90101.1 81243.9
2008 1007213.5 781957.1 113983.2 111273.2
2009 1405864.6 1116477 158847.9 130540.1
2010 1677344.7 1254200 212065.2 211079.5
2011 2079523.5 1535600 260325.9 283597.6
2012 2369130.6 1740360 305651 323119.9
2013 2615203.6 1964667 315873.2 334663.9
2014 2916233.9 2189448 353306.5 373479
2015 3223202.6 2403723 399841.8 419637.6
2016 3546268.6 2648857 439892.3 457519.6
2017 3956599.1 2967485 488615.6 500498.8
2018 4393525.5 3308059 534168.5 551298
2019 4892114.39 3694560 595936.91 601617.59
2020 4847645.3 3815079 479715.67 552850.58
2021 4657066.28 3830560 379390.64 447115.82
Time series data
Observations of a single variable or multiple variables over time
For example: GDP, inflation, stock prices, annual exchange rates,
agricultural sales, …
Such data are mostly serially correlated (observations
are often not independent over time), which requires more
advanced econometric techniques.
The order of observations contains important information
Frequency: daily, weekly, monthly, quarterly, annually, …
Typical characteristics: trends and seasonality
Typical applications: applied macroeconomics and finance
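To see serial correlation in practice, the sketch below takes the Total column of Table 1.3 for 2010 to 2021 and computes the correlation between each year's value and the previous year's value, along with annual growth rates. Using a simple lag-1 correlation as the measure is a choice made for this illustration:

```python
import numpy as np

# Total column of Table 1.3 (billion VND), years 2010 to 2021.
total = np.array([
    1677344.7, 2079523.5, 2369130.6, 2615203.6, 2916233.9, 3223202.6,
    3546268.6, 3956599.1, 4393525.5, 4892114.39, 4847645.3, 4657066.28,
])

# Lag-1 correlation: how strongly this year's value is related to last year's.
r1 = np.corrcoef(total[:-1], total[1:])[0, 1]

# Annual growth rates; the last entry compares 2021 with 2020.
growth = np.diff(total) / total[:-1]

print("lag-1 correlation:", r1)
print("growth rates:", np.round(growth, 3))
```

A lag-1 correlation close to one reflects the strong trend in the series: neighbouring observations are far from independent, so the order of the observations matters.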
Pooled cross sections
A combination of two or more cross-sectional data sets in
one data set
Cross sections are sampled independently of each other
Such data are often used for assessing policy
changes
Example:
• Measure the effect of Hanoi's expansion on house
prices
• Random sample of house prices for the year 2007
• A new random sample of house prices for the year 2009
• Compare before/after (2007: before expansion; 2009: after
expansion)
Table 1.4: Pooled cross sections on housing prices (Wooldridge, 2014)
[Table not reproduced. Its rows are grouped into "Before reform" and "After reform" observations.]
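A before/after comparison with pooled cross sections can be sketched as a regression of price on a year dummy. The house prices below are hypothetical numbers, not data from Table 1.4, and the two samples are deliberately of different sizes because they are drawn independently:

```python
import numpy as np

# Hypothetical house prices (billion VND) from two independent random samples.
price_2007 = np.array([1.2, 1.5, 0.9, 1.8, 1.1])        # before the change
price_2009 = np.array([1.6, 2.1, 1.4, 2.4, 1.5, 1.9])   # after the change

# Pool both cross sections and add a dummy that equals 1 for 2009 observations.
price = np.concatenate([price_2007, price_2009])
y2009 = np.concatenate([np.zeros(price_2007.size), np.ones(price_2009.size)])

# Regress price on an intercept and the year dummy.
X = np.column_stack([np.ones(price.size), y2009])
beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)

print("average price in 2007:", beta_hat[0])
print("change from 2007 to 2009:", beta_hat[1])
```

The coefficient on the dummy equals the difference between the two sample means. On its own it does not prove that the policy caused the change, because other factors may also have changed between the two years.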
Table 1.5: Two-year panel data on provincial development statistics
Panel or longitudinal data
The same cross-sectional units are followed over time
Such data have both a cross-sectional and a time series
dimension
Panel data enable researchers to eliminate time-invariant unobservables
Panel data can be used for models with lagged dependent variables
Example: Factors affecting provinces' economic growth
• Each province is observed in two or more years
• Time-invariant unobserved province characteristics (that may affect
economic growth) can be modeled and removed
• The effect of government policy on growth may exhibit a time lag
Panel data and unobservable time-invariant factors
$u$: the error term, which represents all unobservable factors. Here it is split into two parts, $a_i$ and $e_{it}$:
$English_{it} = \beta_0 + \beta_1 GDP_{it} + a_i + e_{it}$
$e_{it}$: the unobservable factors that affect English scores and change over time.
$a_i$: the unobservable factors that affect English scores in province $i$ but do not change over
time (especially over a short period).
E.g., $a_i$ may represent social or historical traditions about studying and learning.
Omitting $a_i$ may cause omitted variable bias, but we do not have data on it.
The key idea is that any change in English scores between 2021 and 2023 cannot be caused by $a_i$,
because $a_i$ does not change during this period.
We have the regressions for 2023 and 2021:
$English_{i,2023} = \beta_0 + \beta_1 GDP_{i,2023} + a_i + e_{i,2023}$
$English_{i,2021} = \beta_0 + \beta_1 GDP_{i,2021} + a_i + e_{i,2021}$
Then we take the difference:
$English_{i,2023} - English_{i,2021} = \beta_1 (GDP_{i,2023} - GDP_{i,2021}) + (e_{i,2023} - e_{i,2021})$
$a_i$ (and $\beta_0$) drop out when the two equations are differenced.
Panel data allow for the elimination of unobservable time-invariant factors.
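The differencing argument can be checked with a small simulation. Everything in the sketch is assumed for illustration: the number of provinces, the true value of the slope (2.0), and the way the time-invariant factor is correlated with GDP. It compares a regression in levels for a single year with the regression in first differences:

```python
import numpy as np

# Hypothetical simulated two-year panel; all numbers are assumptions.
rng = np.random.default_rng(1)
n = 500                                   # number of provinces
beta0, beta1 = 1.0, 2.0                   # assumed true parameters

a = rng.normal(0, 1, n)                   # time-invariant unobserved factor
gdp_2021 = a + rng.normal(0, 1, n)        # GDP is correlated with the unobserved factor
gdp_2023 = gdp_2021 + rng.normal(0, 1, n)

english_2021 = beta0 + beta1 * gdp_2021 + a + rng.normal(0, 0.5, n)
english_2023 = beta0 + beta1 * gdp_2023 + a + rng.normal(0, 0.5, n)

# Levels regression for 2023: the unobserved factor is left in the error term.
slope_levels = np.polyfit(gdp_2023, english_2023, 1)[0]

# First-difference regression: the unobserved factor cancels out.
slope_fd = np.polyfit(gdp_2023 - gdp_2021, english_2023 - english_2021, 1)[0]

print("levels estimate:", slope_levels)
print("first-difference estimate:", slope_fd)
```

In this setup the levels estimate is pushed above the assumed true slope because GDP picks up part of the effect of the omitted factor, while the first-difference estimate stays close to it.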
1.5. Causality and the notion of ceteris paribus
Ceteris paribus is a Latin phrase meaning "other (relevant)
factors being equal", that is, held constant.
The notion of the causal effect of X on Y:
"How does Y change if X changes while all other factors are held constant?"
❑ Most economic questions are ceteris paribus questions
In analyzing consumer behaviour (microeconomics), we have the law of demand:
"All other factors held constant, the higher the unit price of a good, the fewer the number
of units demanded by consumers and, consequently, sold by firms" (Samuelson and Marks,
2009).
❑ The goal of econometric analysis is often to infer that one variable (like education) causes
another (such as worker productivity). If other factors are not held fixed, then we
cannot know the causal effect of education on productivity.
❑ A careful application of econometric methods can simulate a ceteris paribus experiment.
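In a simple linear model the ceteris paribus idea can be written compactly. The notation below (y, x, u) is generic and assumed for illustration; u collects all other factors that affect y:

```latex
y = \beta_0 + \beta_1 x + u

% Holding all other factors fixed means \Delta u = 0, so
\Delta y = \beta_1 \, \Delta x \quad \text{if } \Delta u = 0
```

Under this condition \beta_1 measures the causal effect of x on y. Whether the condition is plausible depends on how x is related to the unobserved factors in u.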
Non-experimental data and econometrics
❑ Econometrics, which typically analyzes nonexperimental data, has developed into a
separate discipline from mathematical statistics.
Non-experimental data (observational or retrospective data) | Experimental data
Data are collected in a real-life setting. | Data are collected in a controlled environment and manipulated by the researcher.
It is often impossible for researchers to control the conditions or variables of interest. | The researcher can create environments and carefully control the conditions and variables of interest.
Researchers are passive collectors of data. | Researchers are active collectors of data.
Low or even no cost, especially when using secondary data. | Costly and sometimes unethical.
It is more difficult to determine a causal relationship. | It is possible to determine a causal relationship.
Mostly in social sciences. | Mostly in natural sciences.
Experiments
❑ The 2019 Nobel Prize in Economic Sciences: Abhijit Banerjee, Esther Duflo, and Michael Kremer, for the experimental approach to alleviating global poverty.
❑ Randomized controlled trials (RCTs) are experimental studies that apply an intervention to a random subset of the target population so that the effects of the intervention may be compared to those of a control group.
❑ The researcher can create environments and carefully control the conditions and variables of interest.
Natural experiments
❑ The 2021 Nobel Prize in Economic Sciences: David Card, Joshua Angrist, and Guido Imbens, for new insights about the labour market and for showing what conclusions about cause and effect can be drawn from natural experiments.
❑ Natural experiments are studies in which the units of analysis are exposed to as-good-as-random variation caused by nature, institutions, or policy changes.
❑ Researchers do not create natural experiments; rather, they find them.
❑ Natural experiments are observational studies, not true experiments; the researcher has little (if any) control over the social conditions of the studies.
How can a controlled experiment be constructed to infer a causal
effect?
Causal effect of fertilizer on crop yield
"By how much will rice output increase if one increases the amount
of fertilizer applied to the field?"
The question assumes that all other factors that affect rice yield, such
as land quality, temperature, rainfall, crop diseases, etc., are held
fixed.
Experiment:
Select several one-acre plots of land, randomly assign different amounts
of fertilizer to the different plots, and then compare the output.
In this case, the experiment works because the amount of fertilizer
applied is unrelated to the other factors that affect rice yields.
In other words, random assignment separates the effect of fertilizer
from the effects of the other factors that affect rice yields.
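The role of random assignment can be illustrated with a simulation. All numbers are assumptions: the true effect of fertilizer is set to 0.5, land quality is unobserved, and in the non-random scenario farmers are assumed to apply more fertilizer on better land:

```python
import numpy as np

# Hypothetical simulation; the true effect and all distributions are assumptions.
rng = np.random.default_rng(7)
n = 2000
true_effect = 0.5
quality = rng.normal(0, 1, n)             # unobserved land quality
noise = rng.normal(0, 1, n)               # weather, pests, and other factors

# Scenario 1: fertilizer is assigned at random, unrelated to land quality.
fert_random = rng.uniform(0, 10, n)
yield_random = 2.0 + true_effect * fert_random + quality + noise
slope_random = np.polyfit(fert_random, yield_random, 1)[0]

# Scenario 2: farmers choose fertilizer, applying more on better land.
fert_chosen = 5.0 + 2.0 * quality + rng.normal(0, 1, n)
yield_chosen = 2.0 + true_effect * fert_chosen + quality + noise
slope_chosen = np.polyfit(fert_chosen, yield_chosen, 1)[0]

print("slope with random assignment:", slope_random)
print("slope with self-chosen fertilizer:", slope_chosen)
```

With random assignment the estimated slope stays close to the assumed true effect. When fertilizer is related to land quality, the slope also picks up the effect of land quality and overstates the effect of fertilizer.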
Experiment and ethical issue
Causal effect of education on productivity
In order to estimate the causal effect of education on labour productivity, all other
factors that influence wages, such as experience, innate ability, family background, etc.,
must be held fixed.
Problem without random assignment: nonexperimental or observational data
often suffer from self-selection or endogeneity.
E.g., education level is likely to be related to unobservables, such as innate ability.
People with higher abilities, for example, tend to have higher levels of education.
An experiment can make sure that education is unrelated to other factors that
affect wages. E.g., choose a group of children and randomly assign different levels of
education to them. Finally, compare the wage outcomes.
Is this experiment unethical?
With non-experimental data, discovering causality is very challenging.
Yet conducting an experiment is often infeasible because of ethical issues.
Econometrics: statistical relationships
Statistical relationship | Deterministic relationship
Variables are random (stochastic) and have probability distributions. | Variables are not random (stochastic).
There are errors in measurement. | There are no errors in measurement.
The relationship between economic variables is generally inexact. | There is an exact relationship between variables.
E.g., the rate of return to education is found to be 12.6% in Thailand but 10.3% in China. | E.g., Newton's law of gravity: $F = G \frac{M_1 M_2}{r^2}$, where $G = 6.67 \times 10^{-11}\ \mathrm{N\,m^2/kg^2}$.
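The contrast can be made concrete in a few lines of code. The gravity function follows Newton's law as stated above. The wage function is a hypothetical statistical relationship: its coefficients and the size of the random error are assumptions for illustration only:

```python
import numpy as np

G = 6.67e-11  # gravitational constant in N m^2 / kg^2, as given above

def gravitational_force(m1, m2, r):
    """Deterministic relationship: the same inputs always give the same output."""
    return G * m1 * m2 / r**2

# Two evaluations with identical inputs are exactly equal.
f1 = gravitational_force(5.0, 10.0, 2.0)
f2 = gravitational_force(5.0, 10.0, 2.0)

rng = np.random.default_rng(3)

def observed_log_wage(educ):
    """Hypothetical statistical relationship: a systematic part plus a random error."""
    return 1.0 + 0.10 * educ + rng.normal(0, 0.3)

# Two workers with the same education generally have different wages.
w1 = observed_log_wage(12)
w2 = observed_log_wage(12)

print("forces:", f1, f2)
print("log wages:", w1, w2)
```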
Exercises 1
1. What are the differences between economic and econometric models?
2. What is your comment about this statement? "An econometric model is always derived from a formal
economic model."
3. Why does observational data often not guarantee the assumption "ceteris paribus"?
4. In a study on the effect of fertilizer on rice productivity, more fertilizer is used on less fertile plots, but
we do not have data on land fertility. If we found a positive link between fertilizer and rice yields,
could we convincingly conclude that fertilizer raises rice productivity?
5. Suppose you have to do research on whether violent video games cause school violence among students.
Is it feasible? Why?
6. Can you infer a causal effect of violent video games on school violence if your research is based on
observational data? Explain why.
7. Name factors other than violent video games that can affect school violence. Which of these factors
are measurable, and which are not?
8. Each group comes up with a hypothesis or a research question and then specifies an economic model
and the variables that are relevant to it.
A. Please explain why these variables were chosen.
B. How would you measure or collect data for these variables?
C. What is the expected direction of the effect of each variable?
9. Please specify variables for an economic model of corruption in Vietnam.
A. Please explain why these variables were chosen.
B. How would you measure or collect data for these variables?
C. What is the expected direction of the effect of each variable?