Introduction to Econometrics
Econometrics consists of the application of mathematical statistics to economic data to lend empirical
support to the models constructed by mathematical economics and to obtain numerical results.
Econometrics may be defined as the social science in which the tools of economic theory, mathematics,
and statistical inference are applied to the analysis of economic phenomena.
Econometrics is concerned with the empirical determination of economic laws.
Econometrics is based upon the development of statistical methods for estimating economic relationships,
testing economic theories, and evaluating and implementing government and business policy.
The most common application of econometrics is the forecasting of such important macroeconomic
variables as interest rates, inflation rates, and gross domestic product. Whereas forecasts of economic
indicators are highly visible and often widely published, econometric methods can be used in economic
areas that have nothing to do with macroeconomic forecasting.
Econometrics has evolved as a separate discipline from mathematical statistics because the former
focuses on the problems inherent in collecting and analyzing nonexperimental economic data.
Nonexperimental data are not accumulated through controlled experiments on individuals, firms, or
segments of the economy. (Nonexperimental data are sometimes called observational data, or
retrospective data, to emphasize the fact that the researcher is a passive collector of the data.)
Experimental data are often collected in laboratory environments in the natural sciences, but they are
much more difficult to obtain in the social sciences. Although some social experiments can be devised, it
is often impossible, prohibitively expensive, or morally repugnant to conduct the kinds of controlled
experiments that would be needed to address economic issues.
Economic model
wage = f(educ, exper, training)
where:
wage = hourly wage,
educ = years of formal education,
exper = years of workforce experience, and
training = weeks spent in job training.
Econometric model
wage = β0 + β1 educ + β2 exper + β3 training + u
where the term u contains factors such as “innate ability,” quality of education, family background, and
the myriad other factors that can influence a person’s wage. This term, also known as the error term or
disturbance, represents the difference between the actual value of the dependent variable and the value
predicted by the regression model; it captures the unexplained variation in the dependent variable and the
impact of factors not included in the model.
For the most part, econometric analysis begins by specifying an econometric model.
Once an econometric model has been specified, various hypotheses of interest can be stated.
An empirical analysis, by definition, requires data. After data on the relevant variables have been
collected, econometric methods are used to estimate the parameters in the econometric model and to
formally test hypotheses of interest. In some cases, the econometric model is used to make predictions in
either the testing of a theory or the study of a policy’s impact.
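As an illustration of these steps, the sketch below simulates data consistent with the wage equation above and estimates the β’s by ordinary least squares. It uses Python with the statsmodels package; the sample size, the “true” coefficient values, and the variable ranges are assumptions made purely for illustration, not values from the text.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
educ = rng.integers(8, 21, n)        # years of formal education
exper = rng.integers(0, 31, n)       # years of workforce experience
training = rng.integers(0, 53, n)    # weeks spent in job training

# Assumed "true" parameter values and a normally distributed disturbance u.
u = rng.normal(0, 2.0, n)
wage = 1.0 + 0.9 * educ + 0.2 * exper + 0.1 * training + u

X = sm.add_constant(np.column_stack([educ, exper, training]))
results = sm.OLS(wage, X).fit()
print(results.params)     # estimates of beta0, beta1, beta2, beta3
print(results.summary())  # t statistics for hypotheses such as beta3 = 0
```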
The Significance of the Stochastic Disturbance Term (Gujarati, pages 41-42)
The disturbance term ui is a surrogate for all those variables that are omitted from the model but that
collectively affect Y. The obvious question is: Why not introduce these variables into the model
explicitly? Stated otherwise, why not develop a multiple regression model with as many variables as
possible? The reasons are many.
1. Vagueness of theory:
The theory, if any, determining the behavior of Y may be, and often is, incomplete. We might know for
certain that weekly income X influences weekly consumption expenditure Y, but we might be ignorant or
unsure about the other variables affecting Y. Therefore, ui may be used as a substitute for all the excluded
or omitted variables from the model.
2. Unavailability of data:
Even if we know what some of the excluded variables are and therefore consider a multiple regression
rather than a simple regression, we may not have quantitative information about these variables. It is a
common experience in empirical analysis that the data we would ideally like to have often are not
available. For example, in principle we could introduce family wealth as an explanatory variable in
addition to the income variable to explain family consumption expenditure. But unfortunately,
information on family wealth generally is not available. Therefore, we may be forced to omit the wealth
variable from our model despite its great theoretical relevance in explaining consumption expenditure.
3. Core variables versus peripheral variables:
Assume in our consumption-income example that besides income X1, the number of children per family
X2, sex X3, religion X4, education X5, and geographical region X6 also affect consumption expenditure.
But it is quite possible that the joint influence of all or some of these variables may be so small and at best
nonsystematic or random that as a practical matter and for cost considerations it does not pay to introduce
them into the model explicitly. One hopes that their combined effect can be treated as a random variable
ui.
4. Intrinsic randomness in human behavior:
Even if we succeed in introducing all the relevant variables into the model, there is bound to be some
“intrinsic” randomness in individual Y’s that cannot be explained no matter how hard we try. The
disturbances, the u’s, may very well reflect this intrinsic randomness.
5. Poor proxy variables:
Although the classical regression model assumes that the variables Y and X are measured accurately, in
practice the data may be plagued by errors of measurement. Consider, for example, Friedman’s well-known
theory of the consumption function, which regards permanent consumption (Yp) as a function of
permanent income (Xp). Since data on these variables are not directly observable, in practice we use proxy
variables, such as current consumption (Y) and current income (X), which are observable. Because the
observed Y and X may not equal Yp and Xp, there is the problem of errors of measurement. The
disturbance term u may in this case then also represent the errors of measurement.
there are such errors of measurement, they can have serious implications for estimating the regression
coefficients, the β’s.
6. Principle of parsimony:
Following Occam’s razor, we would like to keep our regression model as simple as possible. If we can
explain the behavior of Y “substantially” with two or three explanatory variables and if our theory is not
strong enough to suggest what other variables might be included, why introduce more variables? Let ui
represent all other variables. Of course, we should not exclude relevant and important variables just to
keep the regression model simple.
The Coefficient of Determination r2: A Measure of “Goodness of Fit” (Gujarati page 73)
We consider the goodness of fit of the fitted regression line to a set of data; that is, we shall find out how
“well” the sample regression line fits the data. It is clear that if all the observations were to lie on the
regression line, we would obtain a “perfect” fit, but this is rarely the case. The coefficient of
determination r2 (two-variable case) or R2 (multiple regression) is a summary measure that tells how well
the sample regression line fits the data.
The quantity r2 is known as the (sample) coefficient of determination and is the most
commonly used measure of the goodness of fit of a regression line. Verbally, r2 measures the proportion
or percentage of the total variation in Y explained by the regression model.
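As a concrete illustration, the sketch below computes r2 as one minus the ratio of the residual sum of squares to the total sum of squares on simulated data (all values invented) and checks that it matches the r2 reported by a fitted regression in statsmodels.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
rss = np.sum(fit.resid ** 2)        # residual (unexplained) sum of squares
r2 = 1 - rss / tss                  # proportion of variation in y explained
print(r2, fit.rsquared)             # the two numbers agree
```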
The Structure of Economic Data
Cross-Sectional Data
A cross-sectional data set consists of a sample of individuals, households, firms, cities, states, countries,
or a variety of other units, taken at a given point in time. Cross-section data are data on one or more
variables collected at the same point in time.
Time Series Data
A time series data set consists of observations on a variable or several variables over time. Examples of
time series data include stock prices, money supply, consumer price index, gross domestic product,
annual homicide rates, and automobile sales figures.
Pooled Cross Sections
Some data sets have both cross-sectional and time series features. Pooled, or combined, data contain
elements of both time series and cross-section data. For example, a researcher might collect data on
student test scores in different schools (a cross section) in the year before a policy is implemented, and
then again in the year after. By combining these two cross-sectional data sets, the researcher can analyze
the effect of the policy on test scores over time. Similarly, suppose we have data on 250 houses for 1993
and on 270 houses for 1995; pooling the two samples yields a pooled cross section.
Panel or Longitudinal Data
A panel data (or longitudinal data) set consists of a time series for each cross-sectional member in the
data set. This is a special type of pooled data in which the same cross-sectional unit (say, a family or a
firm) is surveyed over time. As an example, suppose we have wage, education, and employment history
for a set of individuals followed over a ten-year period. Or we might collect information, such as
investment and financial data, about the same set of firms over a five-year time period.
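To make these structures concrete, the sketch below lays out a tiny panel data set with the pandas library; the firm identifiers, years, and investment figures are invented for illustration.

```python
import pandas as pd

# Panel (longitudinal) data: the same firms observed in each of several years.
panel = pd.DataFrame({
    "firm_id":    [1, 1, 2, 2, 3, 3],
    "year":       [2019, 2020, 2019, 2020, 2019, 2020],
    "investment": [4.2, 4.8, 7.1, 6.9, 3.3, 3.9],
}).set_index(["firm_id", "year"])   # unit and time jointly identify each row
print(panel)

# A pooled cross section would instead stack independent samples drawn in
# different years, so a unit appearing in one year need not appear in the next.
```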
Key Assumptions of Linear Regression
1. Linearity
Assumption: The relationship between the independent and dependent variables is linear.
The first and foremost assumption of linear regression is that the relationship between the
predictor(s) and the response variable is linear. This means that a change in the independent
variable results in a proportional change in the dependent variable. This can be visually
assessed using scatter plots or residual plots.
If the relationship is not linear, the model may underfit the data, leading to inaccurate
predictions. In such cases, transformations of the data or the use of non-linear regression models
may be more appropriate.
Example:
Consider a dataset where the relationship between temperature and ice cream sales is being
studied. If sales increase non-linearly with temperature (e.g., significantly more sales at high
temperatures), a linear model may not capture this effect well. Two scenarios illustrate the point:
Linear relationship: an increase in temperature produces a roughly proportional increase in ice cream
sales.
Non-linear relationship: an increase in temperature produces a much larger increase in sales at high
temperatures than at low ones, indicating a non-linear relationship.
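A minimal sketch of such a check, using invented temperature and sales figures: a straight line is fitted to a quadratic relationship, and the plot of residuals against fitted values shows the systematic curvature that signals non-linearity.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
temp = rng.uniform(15, 40, 200)                       # temperature in degrees C
sales = 5 + 0.1 * temp ** 2 + rng.normal(0, 5, 200)   # sales rise faster when hot

fit = sm.OLS(sales, sm.add_constant(temp)).fit()      # straight-line fit

plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("A curved residual pattern signals non-linearity")
plt.show()
```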
2. Multivariate Normality - Normal Distribution
Multivariate normality is a key assumption for linear regression models when making
statistical inferences. Specifically, it means that the residuals (the differences between observed
and predicted values) should follow a normal distribution when considering multiple
predictors together. This assumption ensures that hypothesis tests, confidence intervals, and p-
values are valid. This assumption can be assessed by examining histograms or Q-Q plots of the
residuals, or through statistical tests such as the Kolmogorov-Smirnov test.
This assumption is crucial because it allows us to make valid inferences about the model's
parameters and the relationship between the dependent and independent variables.
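The sketch below, on simulated data, illustrates two common checks of this assumption: a Q-Q plot of the residuals and the Shapiro-Wilk normality test (used here in place of the Kolmogorov-Smirnov test mentioned above; either is common).

```python
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 200)
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

sm.qqplot(resid, line="45", fit=True)   # points should hug the reference line
plt.show()

stat, p_value = stats.shapiro(resid)    # H0: residuals are normally distributed
print(p_value)                          # a small p-value casts doubt on normality
```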
3. Lack of Multicollinearity
Assumption: The independent variables are not highly correlated with each other.
Multicollinearity occurs when two or more independent variables in the model are highly
correlated, leading to redundancy in the information they provide. This can inflate the standard
errors of the coefficients, making it difficult to determine the effect of each independent variable.
It is essential that the independent variables not be too highly correlated with each other; when they are,
multicollinearity is present. This can be checked using:
Correlation matrices, where correlation coefficients should ideally be below 0.80.
The Variance Inflation Factor (VIF), with VIF values above 10 indicating problematic multicollinearity.
Solutions may include centering the data (subtracting the mean score from each observation) or removing
the variables causing the multicollinearity.
Example: In a model predicting health outcomes from multiple health metrics, including both blood
pressure and heart rate as predictors may lead to multicollinearity because the two are highly correlated.
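A minimal sketch of the VIF check, using invented blood pressure and heart rate values and the variance_inflation_factor function from statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
bp = rng.normal(120, 10, 300)              # blood pressure
hr = 0.6 * bp + rng.normal(0, 5, 300)      # heart rate tracks blood pressure closely
X = sm.add_constant(pd.DataFrame({"blood_pressure": bp, "heart_rate": hr}))

for i, name in enumerate(X.columns):
    if name != "const":                    # VIF of the constant is not informative
        print(name, variance_inflation_factor(X.values, i))
```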
4. Homoscedasticity of Residuals in Linear Regression
Homoscedasticity is one of the key assumptions of linear regression, which asserts that the
residuals (the differences between observed and predicted values) should have a constant
variance across all levels of the independent variable(s). In simpler terms, it means that the
spread of the errors should be relatively uniform, regardless of the value of the predictor.
When the residuals maintain constant variance, the model is said to be homoscedastic.
Conversely, when the variance of the residuals changes with the level of the independent
variable, we refer to this phenomenon as heteroscedasticity.
Heteroscedasticity can lead to several issues:
Inefficient Estimates: The estimates of the coefficients may not be the best linear unbiased
estimators (BLUE), meaning that they could be less accurate than they should be.
Impact on Hypothesis Testing: Standard errors can become biased, leading to unreliable
significance tests and confidence intervals.
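The sketch below applies the Breusch-Pagan test (discussed again under detection) to simulated data whose error spread widens with the predictor; all values are invented.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 300)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, 300) * x   # error spread widens with x

fit = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)
print(lm_pvalue)   # a small p-value points to heteroscedasticity
```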
5. Absence of Endogeneity
No endogeneity is an important assumption in the context of multiple linear regression. The
assumption of no endogeneity states that the independent variables in the regression model
should not be correlated with the error term. If this assumption is violated, it leads to biased and
inconsistent estimates of the regression coefficients.
Bias and Consistency: When endogeneity is present, the estimates of the regression coefficients
are biased, meaning they do not accurately reflect the true relationships between the variables.
Additionally, the estimates become inconsistent, which means they do not converge to the true
parameter values as the sample size increases.
Valid Inference: The assumption of no endogeneity is critical for conducting valid hypothesis
tests and creating reliable confidence intervals. If endogeneity exists, the statistical tests based on
these estimates may lead to incorrect conclusions.
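The simulation below, with invented numbers, illustrates the bias: the regressor is constructed to share an omitted factor with the error term, and the OLS slope estimate settles away from the true value even in a very large sample.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100_000
omitted = rng.normal(0, 1, n)        # factor driving both the regressor and the error
x = omitted + rng.normal(0, 1, n)    # regressor correlated with the error term
u = omitted + rng.normal(0, 1, n)
y = 3.0 + 2.0 * x + u                # the true slope is 2.0

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.params[1])                 # settles near 2.5, not 2.0, despite the huge sample
```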
Detecting Violations of Assumptions
It is crucial to assess whether the assumptions of linear regression are met before fitting a model.
Here are some common techniques to detect violations:
1. Residual Plots: Plotting the residuals against the fitted values or independent variables can help
visualize linearity, homoscedasticity, and independence of errors. Ideally, the residuals should
show no pattern, indicating a linear relationship and constant variance.
2. Q-Q Plots: A Quantile-Quantile plot can be used to assess the normality of residuals. If the
residuals lie close to the straight reference line in a Q-Q plot, they are approximately normally distributed.
3. Variance Inflation Factor (VIF): To check for multicollinearity, calculate the VIF for each
independent variable. A VIF value greater than 5 or 10 indicates significant multicollinearity.
4. Durbin-Watson Test: This statistical test helps detect the presence of autocorrelation in the
residuals. A value close to 2 indicates no autocorrelation, while values significantly less than or
greater than 2 indicate positive or negative autocorrelation, respectively (a short sketch follows this list).
5. Statistical Tests: Perform statistical tests like the Breusch-Pagan test for homoscedasticity and
the Shapiro-Wilk test for normality.
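As a sketch of the Durbin-Watson test in item 4, the snippet below constructs residuals with positive first-order autocorrelation and computes the statistic with statsmodels; all values are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
n = 200
x = np.arange(n, dtype=float)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()   # AR(1) errors: positively autocorrelated
y = 1.0 + 0.5 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))            # well below 2, signalling positive autocorrelation
```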
Addressing Violations of Assumptions
If any of the assumptions are violated, there are various strategies to mitigate the issue:
Transformations: Apply transformations (e.g., logarithmic, square root) to the dependent
variable to address non-linearity and heteroscedasticity.
Adding Variables: If autocorrelation or omitted variable bias is suspected, consider adding
relevant predictors to the model.
Generalized Least Squares (GLS): This approach can be used when the residuals are
heteroscedastic or correlated.
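The sketch below illustrates two of these remedies on invented data: a log transformation of the dependent variable, and weighted least squares (a simple form of GLS) with weights set to the inverse of an assumed error variance.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 300)
X = sm.add_constant(x)

# Remedy 1: a log transformation when y grows multiplicatively with x.
y_mult = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, 300))
log_fit = sm.OLS(np.log(y_mult), X).fit()

# Remedy 2: weighted least squares when the error standard deviation is
# proportional to x, so the weights are 1 / x**2 (the inverse error variance).
y_het = 2.0 + 1.5 * x + rng.normal(0, 1.0, 300) * x
wls_fit = sm.WLS(y_het, X, weights=1.0 / x ** 2).fit()

print(log_fit.params, wls_fit.params)
```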