
Introduction To Stata

Uploaded by tesfafentaw89
Copyright © All Rights Reserved
CHAPTER 1 - INTRODUCTION

WHAT IS STATA?
COMPILED BY: TESFAYE ABATE
• STATA is a software package widely used across many disciplines.
• Like many software packages, it is used to handle rigorous and complex numerical work.
• Hence, it provides a wide range of mathematical, statistical, and econometric manipulations and computations, with your mind as the master.
• Other software packages that serve the same purpose include SHAZAM, LIMDEP, SAS, SPSS, EViews, PcGive, and EasyReg, a basket of choices from which we can pick the one we prefer.
WHY STATA?
• STATA is especially popular among economists and for economic data. The reasons include:
Special importance for economic data (popularity)
Inexpensive
Interactive
Excellent graphics with a neat display
Excellent technical support
Powerful but simple commands
Particularly strong in regression, including logistic regression
Built-in 'robust' options (for example, robust standard errors)
TYPES OF STATA
• Generally, there are three types of STATA:
I. Stata SE
II. Intercooled Stata and
III. Small Stata.
• They differ in the number of observations and variables they can handle.
• In functionality, however, they are identical.
Type                No. of observations   No. of variables   Matrix size
Stata SE            > 2 billion           32,766             11,000 × 11,000
Intercooled Stata   > 2 billion           2,047              800 × 800
Small Stata         1,000                 99                 40 × 40
HOW TO WORK WITH STATA?
• We can work with STATA in two different, usually complementary, ways:
Graphical User Interface
Using commands
1. Graphical User Interface
• With the graphical user interface (GUI), as the name implies, you work through the menus of STATA.
• The menu bar is the row at the top of the STATA window.
• It includes File, Edit, Preferences, Data, Graphics, Statistics, User, Window, and Help.
• Below the menu bar there is a 'Button Bar' (toolbar); each button performs a single command.
• The buttons duplicate the menu commands that are used most frequently.
• The buttons include: Open, Save, Print, Log, Viewer (Help window), Results, Graph, Do-file Editor, Data Editor, Data Browser, Go, and Break.
Brainstorming: What do the Go and Break buttons do?

2. Using commands
• This means working with STATA by typing commands in the Command window.
• This method requires you to know each command precisely.
• STATA is case sensitive, so you must type capital and small letters exactly as required (for example, reg death pop works, but REG death pop does not).
• The Command window executes one command at a time.
• For example, to regress death on pop, type the following, pressing Enter after each line:
– sysuse census
– reg death pop
• The two commands cannot be typed together on a single line in the Command window; each must be entered separately.
• Why use commands?
• Working through the Command window is strongly recommended because it offers:
• Avoids laziness
• Reproducibility
• Extensibility
• Traceability
• Comprehensiveness
• Documentation
• In some cases you might not know the correct command and will have to use the GUI. Even then, STATA displays the equivalent command in the Results window, so the GUI itself shows you the command to type next time.

FILE MANAGEMENT IN STATA

• A file is where the results of your work (a dataset, a log file, or a do-file) are stored.
• Managing your files means knowing which directory you are working in now and where to find your files later.
1. Print working directory: the pwd command makes STATA display the directory you are currently working in.
To see why pwd matters, together with cd (change directory), try activity 1.1.
• Syntax: pwd

2. Creating a new directory
• Making a directory creates a new folder on disk, with a name of your choice, in which to keep your work.
• As long as you keep working in the same directory, your results and files accumulate there, which makes them simple to find again.
• Syntax: mkdir c:\stataclass\introduction
3. Changing the working directory
• The pwd command tells you where you are working.
• To move to another directory, use cd (change directory):
• 1st, type cd and press Enter; on Windows, STATA displays the current working directory.
• 2nd, type cd c:\datum, where c:\datum is the directory you want to work in.
• The directory datum must already exist, created either as a new folder or with mkdir c:\datum.
• Syntax: cd
• cd c:\datum
4. Removing a directory
• rmdir removes a directory you created earlier; the directory must be empty.
Syntax: rmdir c:\stataclass\introduction
5. Erasing files
• To remove a file rather than a directory, use the erase command.
• For example, to delete a log file or a dataset, use erase, not rmdir.
– [Try to identify the difference between a directory and a file after you have learned about logging and worked through the checkout that follows.]
• Syntax: erase c:\datum\mydata.dta
erase c:\datum\mylogfile.smcl
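As a sketch of the file-management commands above in one do-file, assuming a Windows machine on which c:\stataclass may be created (the folder and file names are illustrative), the sequence might look like this:

```stata
* File-management sketch; folder and file names are illustrative.
pwd                                    // show the current working directory
capture mkdir c:\stataclass            // create the parent folder (no error if it exists)
capture mkdir c:\stataclass\introduction
cd c:\stataclass\introduction          // make the new folder the working directory
sysuse auto, clear
save mydata, replace                   // writes mydata.dta in the working directory
erase mydata.dta                       // delete the file (erase needs the extension)
cd c:\stataclass                       // step out of the folder before removing it
rmdir introduction                     // works only because the folder is now empty
```

Using capture with mkdir lets the do-file be rerun without stopping on an "already exists" error.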
• Keeping our work: logging
• By default, STATA's Results window is only temporary; once STATA is closed, its contents cannot be displayed again.
• To store the output of the Results window, you must open a log.
• log is the command that tells STATA to save what appears in the Results window from that point onward.
• "From that point onward" means STATA saves only what appears after the log file is opened.
• To open a log file, follow these steps:
– Make your own directory.
– Make sure the log file name you plan to use has not been used before (if it exists, remove it with the erase command, or add the replace option to overwrite it).
• Example: suppose there is already a log file named c:\data\mccain.smcl.
Use the command erase c:\data\mccain.smcl
• mccain.smcl will no longer exist.
– With the directory in place, open the new log file:
• Syntax: log using c:\data\essay
– Then use STATA for whatever commands you need.
• Remember that everything you do, good or bad, right or wrong, is being saved.
– When you have finished, tell STATA to stop logging.
• Syntax: log close
– Write down (keep a hard copy of) the log file name; in this example it is c:\data\essay.smcl.
– To view the log file you have created, use the view command.
• Syntax: view c:\data\essay.smcl
• Logging off and logging on
• The phrase "good or bad, right or wrong" in the steps above shows why logging off and on matters.
• Logging off and on lets you choose which parts of the Results window are included in the log and which are excluded.
• While a log is open, if you are unsure of the correct command, tell STATA not to log the commands that follow by typing log off.
• The open log is then temporarily suspended, like working on rough paper; type log on to resume logging.

A Common Mistake with Logging Off and On!
• People often try to remove an incorrect command, or anything else, from the log after it has already been logged, which is impossible.
• You can keep unwanted commands and results out of the log only by typing log off before you type the command, and in any case before pressing Enter!
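The logging steps can be sketched as follows, assuming the folder c:\data already exists. The replace option (which overwrites an existing essay.smcl instead of requiring erase first) and the extra medage regressor are additions for illustration.

```stata
* Logging sketch; assumes c:\data exists.
log using c:\data\essay, replace     // start logging to c:\data\essay.smcl
sysuse census, clear
regress death pop                    // this output is saved in the log
log off                              // suspend logging while experimenting
summarize death pop                  // shown on screen, not written to the log
log on                               // resume logging
regress death pop medage             // saved in the log again
log close                            // stop logging and close the file
view c:\data\essay.smcl              // open the saved log in the Viewer
```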

CHAPTER TWO - DATA MANAGEMENT

• Methods of Data Entry
• Data entry means loading a dataset into STATA's memory, whose contents you can see in the Data Editor.
• No manipulation or computation can be done while memory is empty.
• There are five different ways of getting data into STATA:

A. Simple methods
• Typing into the Data Editor
• The input command
• Copying and pasting from spreadsheet programs
B. Importing
• The infile and insheet commands (insheet was superseded by import delimited in Stata 13)
• Stat/Transfer (a separate software package)

• Using the input command


• Another way to enter data is the input command.
• With input, you type one observation (a value for every variable) per line.
• Before entering data, make sure memory is empty.
• To do so, use the clear command.
• Syntax: input No str8 Name Total Grade
• Why have we written str8 before the variable Name?
• STATA is now ready to accept observations (values) for the variables No, Name, Total, and Grade.
• Now type each observation on its own line, without repeating the input command, and type end when you have finished:


–Abel 65 3
–Biniam 59 4
–Christopher 74 2
–Daniel 85 1
–Eyuel 68 3
–Fetahi 91 1
–Genet 76 2
• Qn.What is the difference between a String and a Numeric variable?
• What differences will it create in estimation and any computational
process?
• Could you make a regression result for a string variable?
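A complete version of the input example might look like the sketch below. The listed lines give no values for No, so the numbers 1 to 7 are assumed here; str12 is used instead of str8 because "Christopher" is longer than 8 characters and could not be stored in full, and the input is closed with end.

```stata
* Data entry with input; No = 1..7 is assumed for illustration.
clear
input No str12 Name Total Grade
1 Abel        65 3
2 Biniam      59 4
3 Christopher 74 2
4 Daniel      85 1
5 Eyuel       68 3
6 Fetahi      91 1
7 Genet       76 2
end
list
```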
ENCODING AND DECODING (OPTIONAL)

• We can transform a string variable into a numeric one and vice versa.
• To convert a string to numeric: encode varname, gen(newvar)
• To convert a numeric variable with value labels to a string: decode varname, gen(newvar)
• encode assigns the codes 1, 2, 3, ... to the string values in alphabetical order and stores the original strings as value labels.
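A quick round trip on the built-in auto data (chosen so the example runs without typing data) shows both commands; foreign is numeric, with the value labels Domestic and Foreign attached.

```stata
* decode/encode round trip on the auto data.
sysuse auto, clear
decode foreign, gen(origin)           // labeled 0/1 -> string "Domestic"/"Foreign"
encode origin, gen(origin_code)       // string -> numeric codes in alphabetical order
tabulate origin origin_code           // Domestic = 1, Foreign = 2
```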

• Labeling a Data Set
• Labeling in general means attaching a description to a dataset, to a variable, or to the values of a variable.
Syntax: label data "label"
• Labeling a variable
• Syntax: label var varname "the label you want for the variable"

• Labeling values
• A value in STATA is a single entry of a variable: the intersection of one variable and one observation.
• Sometimes we want to attach meaning to the values of a variable. For example, student grades may have been entered as numbers, whose meaning is vague.
• If you want number 1 to appear as letter grade A, 2 as B, and so on, you first define a value label and then attach it to the variable; this is called assigning value labels.
• Syntax: label define letter 1 "A" 2 "B" 3 "C" 4 "D"
label values Grade letter
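The labeling commands can be tried on a small made-up dataset; the two observations and the label texts below are illustrative.

```stata
* Data, variable, and value labels on a tiny illustrative dataset.
clear
input No Total Grade
1 65 3
2 91 1
end
label data "Student results (illustrative)"
label var Total "Total mark"
label define letter 1 "A" 2 "B" 3 "C" 4 "D"   // the value label is named letter
label values Grade letter                      // attach it to the variable Grade
list                                           // Grade is displayed as C and A
```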

• Creating a New Variable

Generate
• When you want a new variable that is computed from existing variables, use the generate command.
• As the name implies, generate creates a new variable and keeps the existing ones.
• You must therefore give it a new variable name.
• Syntax: gen newvar = (expression involving existing variables)
• Notes:
• gen is the abbreviation of generate.
• newvar is the name of the variable to be created. It can combine letters, digits, and underscores (up to 32 characters, starting with a letter or an underscore), but it cannot contain a space: STATA cannot read names such as South Africa, new var, or my var.
• Single equality sign (=): in generate and replace, a single = assigns a value. A double equality sign (==) tests whether two values are equal, as in if qualifiers.
• Expression: a manipulation of one or more existing variables, combined with the mathematical operators listed before.

–Replace
• replace does the same job as generate, except that it overwrites the values of a variable that already exists.
• It can change all values of an existing variable or fill in particular values.
• It can also be combined with the in and if qualifiers.
Extension to generate (egen)
egen serves much the same purpose as generate; however, instead of any expression we like, we can use only the functions that egen itself provides.
• Examples of functions that egen can handle:
• rowtotal: sums the listed variables horizontally, observation by observation
• Syntax: egen newvar = rowtotal(varlist)
• sd: computes the standard deviation of one variable; the result is a single value repeated for every observation
• Syntax: egen newvar = sd(varname)  (only one variable can be included)
• rowmax: for each observation, picks the highest value among the listed variables. Syntax: egen newvar = rowmax(varlist)
• rowmean: for each observation, computes the mean of the listed variables. Syntax: egen newvar = rowmean(varlist)
• rowmin and the other row functions work in the same manner.
• mean, median, mode, min, and max share the same syntax and produce the same kind of result (one statistic repeated across observations); only the statistic differs. Use one function per command:
• Syntax: egen newvar = mean(varname)   egen newvar = min(varname)
egen newvar = max(varname)   egen newvar = mode(varname)
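The sketch below applies generate, replace, and a few egen functions to the built-in auto data; the new variable names are arbitrary, and the rowtotal() sum is shown only for its mechanics, not for any meaning.

```stata
* generate, replace, and egen on the auto data.
sysuse auto, clear
gen price_k = price/1000                 // new variable from an existing one
replace price_k = . if rep78 == .        // overwrite selected values (== tests equality)
egen dims = rowtotal(length turn)        // row-wise sum across variables (mechanics only)
egen mpg_sd = sd(mpg)                    // one value repeated for all observations
egen mpg_max = max(mpg)                  // likewise a constant column
```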

• Reshaping a Data Set
• Suppose you are preparing your senior essay on micro and small enterprises.
• The performance of these enterprises is likely to be determined by the number of trading items, working capital, etc.
• The organization supplying the data might record each variable in separate columns, one column per year (wide format).
• In such cases you might want to change the layout of the data set, which STATA handles with the reshape command.
where WC is working capital and TI is the number of trading items

Individuals  WC96  WC97  WC98  WC99  WC200
Aaa          10    12    15    19    20
Bbb           5     8     9    10    11
Ccc          12    14    15    16    18
Ddd           9    10    12    15    17
Fff           8    13    16    18    19

Individuals  TI96  TI97  TI98  TI99  TI100
Aaa           2     2     3     2     3
Bbb           1     2     2     1     1
Ccc           2     3     3     3     3
Ddd           3     4     3     2     2
Fff           3     4     2     1     3
• To go from wide to long, the syntax is:
reshape long WC, i(Individuals) j(Year)
• Description:
• reshape is the general command for switching from wide to long format and vice versa.
• WC is the stub: the common prefix of the original variable names WC96, WC97, ...; reshape strips the suffix and stores it in the j variable.
• i(Individuals) j(Year): i names the variable that identifies each unit (each row of the wide table), and j names the new variable that will hold the suffix (the year).
• For example, the value 8 in row Bbb, column WC97, becomes the observation with Individuals = Bbb, Year = 97, and WC = 8.
• To change data in long format back into wide format, the command is:
- reshape wide WC, i(Individuals) j(Year)
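A minimal sketch of the round trip, typing in two rows of the tables above. Only the first four year columns are used, since the year coding of the last column (WC200, TI100) is ambiguous; listing both stubs WC and TI reshapes the two tables in one command.

```stata
* reshape round trip on two rows of the WC and TI tables.
clear
input str3 Individuals WC96 WC97 WC98 WC99 TI96 TI97 TI98 TI99
"Aaa" 10 12 15 19 2 2 3 2
"Bbb"  5  8  9 10 1 2 2 1
end
reshape long WC TI, i(Individuals) j(Year)   // wide -> long: one row per firm-year
list, sepby(Individuals)
reshape wide WC TI, i(Individuals) j(Year)   // long -> wide again
```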

• Collapsing
• Collapsing summarizes a variable into a statistic such as the mean, total, or median for each group. The long format of the data above shows working capital over five years.
• If we want the total working capital of all individuals for each year, the collapse command will do it.
• Syntax: collapse (sum) WC, by(Year)
• Note that collapse is applied when the data are in long format, and that, because STATA is case sensitive, the variable names must be typed exactly as created (WC and Year).
• collapse replaces the data in memory with the summary, so save your data first.
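A small sketch with four firm-years from the WC table shows the result; in real work, save the data (or use preserve and restore) before collapsing.

```stata
* collapse on a tiny long-format dataset (values from the WC table).
clear
input str3 Individuals Year WC
"Aaa" 96 10
"Aaa" 97 12
"Bbb" 96  5
"Bbb" 97  8
end
collapse (sum) WC, by(Year)     // total working capital per year
list
```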

• Combining data sets
• Combining data sets means adding a new STATA data set to an original one.
• The combination may add either new observations or new variables.
• In the example above, there are five individuals (Aaa to Fff) whose working capital was recorded for five consecutive years.
• If you find other individuals with working capital observations for the same five years and want to include them, the append command does it.
• Similarly, if you obtain observations for 2001, that is, additional variables, use the merge command.
• NB: merging can be done correctly only if there is a common variable that serves as a reference (identifier).

Individuals  WC96  WC97  WC98  WC99  WC200
Ggg          11    14    15    19    22

• The original data is technically known as the master data.
• The data being added is known as the using data.
• Both the master and the using data must be saved in STATA format (.dta files).
• Process of appending:

– Open the using data and save it in a place you can find systematically, preferably c:\data\ (the data folder on the local disk).
– Clear memory and open the master data.
– Type append using followed by the file path of the using data. If the file name was theappend, the syntax will be:
• append using c:\data\theappend
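A self-contained sketch of appending: a tempfile stands in for the saved using file c:\data\theappend, and only two year columns are typed to keep it short.

```stata
* append sketch; a tempfile plays the role of c:\data\theappend.
clear
input str3 Individuals WC96 WC97
"Ggg" 11 14
end
tempfile theappend
save `theappend'

* master data
clear
input str3 Individuals WC96 WC97
"Aaa" 10 12
"Bbb"  5  8
end
append using `theappend'      // Ggg is added as a new observation
list
```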
• Merging
• Merging adds a new set of variables to an original data set.
• The using data (the one to be added) and the master data must have a common variable that serves as an ID or frame of reference, and this ID variable is written in the command.

• Process of merging:
• Open the using data and sort it by No.
• Save it in a place you can find systematically, preferably the data folder on the local disk.
• Clear memory and open the master data.
• Sort it by the variable No.
• Then type the merge command:
• merge No using c:\data\(saving name you have used)
• In STATA 11 and later, the documented form is merge 1:1 No using filename, and sorting beforehand is no longer required.
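A sketch of the same task in the newer syntax, with a tempfile standing in for the saved using data. The variable WC01 (working capital for 2001) and its values are hypothetical; the WC99 values come from the table above.

```stata
* One-to-one merge on the key No (STATA 11+ syntax); WC01 values are made up.
clear
input No WC01
1 21
2 12
end
tempfile wc2001
save `wc2001'

* master data
clear
input No str3 Individuals WC99
1 "Aaa" 19
2 "Bbb" 10
end
merge 1:1 No using `wc2001'   // _merge == 3 marks observations found in both files
list
```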

• Do-file Editor
• A do-file is a file that stores a collection of commands we use frequently.
• Once those commands are collected in a do-file, simply typing the do command runs all of them.
• Objective:
• Suppose we usually open the census data and regress death on population.
• Then we test for multicollinearity, heteroscedasticity, and autocorrelation.
• If this process is frequent and we are fed up with typing these commands every time we open STATA, we can use the Do-file Editor with the following procedure:
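The procedure itself is not spelled out here; as a hedged sketch, the routine could be typed into the Do-file Editor and saved as a file (myroutine.do is a hypothetical name). The second regressor medage is an assumption added so the VIF check has more than one variable to compare.

```stata
* myroutine.do (hypothetical name). Run it later with:  do myroutine
sysuse census, clear
regress death pop medage     // medage added so the VIFs are informative
estat vif                    // multicollinearity: variance inflation factors
estat hettest                // heteroscedasticity: Breusch-Pagan/Cook-Weisberg test
* The census data are a cross-section of states, so an autocorrelation test
* (estat bgodfrey) would first need time-series data declared with tsset.
```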
CHAPTER THREE - DESCRIPTIVE ANALYSIS

• Descriptive (non-regression) analysis covers statistical computations such as the mean, median, and frequencies, which give us a general understanding, if not the whole picture, of the variables in our data set.
• Before rushing into econometric analysis, a researcher is advised to describe the data, including frequency distributions (tabulations), t-tests (tests of mean values), correlation, and analysis of variance (oneway and ANOVA).
• Summarizing
• A summary is a type of descriptive analysis that reports, for each variable, the number of observations, mean, standard deviation, minimum, and maximum.
• The summarize command without a list of variables summarizes every variable in the data set.
• To summarize only some variables, type sum followed by the list of variables.
Syntax: sum [varlist]
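On the built-in auto data the forms look like this; the detail option is an addition that also reports percentiles, skewness, and kurtosis.

```stata
* summarize on the auto data.
sysuse auto, clear
summarize                  // every variable
summarize price mpg        // only the listed variables
summarize price, detail    // adds percentiles (the median is the 50% row)
```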

–Tabulating
• Tabulating arranges data in a table.
• In STATA, tabulating a variable lists its values in order, together with the number of observations (frequency) for each value.
– Syntax: tab varname  (a simple, one-way tabulation)
• Sometimes one variable is tabulated within the frame of another variable.
• Each value of one variable is then cross-linked with each value of the other.
• Technically, such tabulations are called cross-tabulations.
– Syntax: tab var1 var2

• var1 makes up the rows and var2 the columns.
• In the automobile data, you can cross-tabulate the repair record (rep78) against foreign:
• sysuse auto, clear
• Syntax: tab foreign rep78

• A simple tabulation of rep78 lists each repair record and shows that 2 cars have a repair record of 1.
–How are these cars split between domestic and foreign?
• We can answer this by cross-tabulating rep78 with foreign.
• In the example above, both cars with a repair record of 1 are domestic.
• In total, 11 cars have a repair record of 5, of which 9 are foreign and 2 are domestic.

• Options: chi2, cell, row, and column
• Syntax:
– tab foreign rep78, chi2
– tab foreign rep78, row
– tab foreign rep78, cell
– tab foreign rep78, column
• Why chi2?
• Does the distribution of repair records differ significantly between domestic and foreign cars?
• Answering this requires a statistical test such as Pearson's chi-squared test of independence.
• To do so, we include the chi2 option and interpret the result as follows:

If Pr is greater than the chosen level of significance (Pr > α), do not reject the null hypothesis of independence: the variation is not significant.
If Pr is less than the chosen level of significance (Pr < α), reject the null hypothesis in favor of the alternative: the variation is significant.

– With the row option, the percentages in each row sum to 100%.
– With the column option, the percentages in each column sum to 100%.
– With the cell option, the percentages over all cells together sum to 100%.
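The tabulations discussed above can be reproduced on the auto data with the following sketch; r(p) is the p-value that tabulate stores after the chi2 option.

```stata
* One-way and two-way tabulations on the auto data.
sysuse auto, clear
tab rep78                    // simple tabulation
tab foreign rep78            // foreign in rows, rep78 in columns
tab foreign rep78, chi2      // Pearson chi-squared test of independence
display "Pr = " r(p)         // compare with the chosen significance level
tab foreign rep78, row       // each row sums to 100%
```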

• If you want a table that includes the mean, the tab command alone cannot produce it.
• Use the tabstat command instead.
• Syntax: tabstat var1, stats(statistics of your choice*)
– Statistics include: mean, n (count), variance, min, max, kurtosis, skewness, p50 (median), and sum

• Advanced form of tabulation
• Syntax: tab rep78, sum(foreign)
• This form of tabulation reports the mean, standard deviation, and frequency of the summarized variable for each value of the tabulated variable.

• In the results window, look at the shaded cell showing 4, which is the combination of foreign and a headroom of 1.5.
• Therefore, the interpretation is: the mean repair record of foreign cars with a headroom of 1.5 is 4.
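A sketch of tabstat and tabulate's summarize() option on the auto data; the stats() list uses full statistic names, and the by() example is an extra.

```stata
* tabstat and tabulate, summarize() on the auto data.
sysuse auto, clear
tabstat price mpg, stats(mean n variance min max kurtosis skewness p50 sum)
tabstat price, by(foreign) stats(mean p50)    // statistics within groups
tab rep78, summarize(foreign)                 // mean, sd, and freq. of foreign by rep78
```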

• Graphing
– Histogram
– Scatter
– Matrix graph
– Line
– Pie
– Drop line
– Spike

• Histogram
• A histogram is a one-way plot type; it displays only one variable.
• Syntax: histogram var1 .......... (Graph 1), where var1 = price in the auto data
• Options available for histogram include switching the y-axis to frequencies, adding a title, and overlaying a normal density curve.
• Syntax: histogram var1, title(It is Graph 2)  (any variable can stand in for var1)
– histogram var1, frequency title(It is Graph 3)
– histogram var1, frequency title(It is Graph 4) normal

• Scatter plot
• A scatter plot is a two-way plot type: each point combines the values of two different variables.
– Syntax: tw scatter mpg weight
• Options:
–tw scatter mpg weight, mcolor(green) msymbol(diamond) mlabel(weight)
• tw (scatter mpg weight) (lfit mpg weight)

• Matrix graph. Syntax: graph matrix [list of variables]

• The panel marked box-1 is the joint plot of price and bedroom.
• In this panel, price is on the x-axis (hence the explanatory variable) and bedroom on the y-axis (the dependent variable).
• The mirror image of this panel is the one marked box-2.
• It is the same plot, except that price, which was the explanatory variable, becomes the dependent variable.

• Line graph
• A line plot is a two-way plot type and therefore needs at least two variables.
• Try a line plot using the bpwide data (sysuse bpwide, clear).
• Syntax: tw line bp_before bp_after patient
• Options
–Syntax: tw line bp_before bp_after patient, legend(label(1 "Before Diagnosis") label(2 "After Diagnosis") position(6) ring(1) rows(1))

• Pie chart
• A pie chart is a circle used to show the percentage distribution of different variables.
• The whole circle represents 100%; when variables are listed, graph pie sums each variable over all observations and draws each sum as a share of the total.
• Syntax: graph pie var1 var2 var3
• Example: use the pop2000 data
• sysuse pop2000, clear
• Syntax: graph pie white black indian asian island
• The slices are numbered pie1, pie2, etc. in the order the variables are listed.
• Options for a pie graph include color, plabel, and explode:
– graph pie white black indian asian island, pie(3, color(yellow))
– graph pie white black indian asian island, pie(2, color(blue) explode)
– graph pie white black indian asian island, pie(4, color(yellow)) pie(5, color(red)) pie(1, color(green))
– graph pie white black indian asian island, plabel(_all percent)
• ANALYSIS OF VARIANCE (ANOVA)
• One-sample t-test
• A t-test compares a mean with a hypothesized value, or two means with each other.
• It is based on a sample statistic that follows Student's t distribution.
• To follow the procedure, use the US life expectancy data (sysuse uslifeexp).
• Test whether the mean life expectancy (le) is equal to 64. Syntax: ttest le = 64

• Decision rule (for the two-sided alternative, read Pr(|T| > |t|)):
• If the p-value > the level of significance (most often 5%), do not reject the null hypothesis.
• If the p-value < the level of significance, reject the null hypothesis in favor of the alternative.
Conclusion: see the STATA output.
Test whether the mean life expectancy of white males is equal to that of black males.
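The one-sample test and the white versus black male comparison might be run as follows. Treating le_wmale and le_bmale as a paired test (both are measured in the same year) is an assumption of this sketch; adding the unpaired option treats them as independent samples instead.

```stata
* One-sample and paired t-tests on the US life expectancy data.
sysuse uslifeexp, clear
ttest le == 64                 // H0: mean(le) = 64
display "two-sided p = " r(p)
ttest le_wmale == le_bmale     // paired: white vs black male life expectancy
display "two-sided p = " r(p)
```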

• Two-sample t-test
• A two-sample t-test compares the mean of a variable across the two groups defined by another variable that takes only two values.
• Example: in the automobile data, the variable foreign has two values: foreign or domestic.
• A t-test of price by foreign gives the following result:
• Syntax: ttest price, by(foreign)
• Hypotheses: H0: diff = 0
• Ha: diff != 0
• where diff = mean(Domestic) - mean(Foreign)
• As before, if we cannot reject diff = 0, we conclude that the mean prices of domestic and foreign cars are the same.
• Decision:
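A minimal sketch of the two-sample test on the auto data, where r(p) holds the two-sided p-value to compare with the significance level:

```stata
* Two-sample t-test of price by origin (auto data).
sysuse auto, clear
ttest price, by(foreign)       // H0: mean(Domestic) - mean(Foreign) = 0
display "two-sided p = " r(p)
```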

• ANOVA is concerned with the nature of, and relationship between, sums of squares.
• The F-statistic is computed as the ratio

$$F = \frac{\mathrm{ESS}/1}{\mathrm{RSS}/(n-2)}$$

• where n - 2 is the degrees of freedom for RSS and 1 the degrees of freedom for ESS. For a model $Y_i = \beta_0 + \beta_1 X_i + U_i$, a high (significant) F-statistic means that the explanatory variable $X_i$ accounts for a significant part of the variation in Y.
• ANOVA is one-way if there is only one explanatory variable, and N-way if there are several.
• One-way ANOVA
• To carry out a one-way ANOVA, use the oneway command.
• The command allows only one explanatory (grouping) variable; "one-way" signifies that the variation in the dependent variable is tested against a single variable.
• From the automobile data, try to see the variation in price as a function of displacement.
• N-way ANOVA
– An N-way ANOVA is the same as a one-way ANOVA, except that there is more than one explanatory variable.
• Use the nlsw88 data, which lists determinants of the wage rate (sysuse nlsw88). The data set has more than 2,000 observations; keep only the first 35 by dropping observation 36 through the last (drop in 36/l).
– anova wage age occupation union
• anova can be followed by the regress option, which gives regression coefficients, although their interpretation differs from that of classical regression coefficients (in STATA 11 and later, type regress after anova instead).
• The reason is that anova estimates a separate coefficient for each value of an explanatory variable.
• Try a regression of price on rep78 and mpg, and then the same model with ANOVA.
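As a sketch on the auto data, using the categorical variables rep78 and foreign as factors (a substitution for the variables named above, so that every factor has only a few levels). The final regress line assumes STATA 11 or later, where typing regress after anova redisplays the model as a regression.

```stata
* One-way and two-way ANOVA on the auto data (rep78, foreign as factors).
sysuse auto, clear
oneway price rep78, tabulate     // does mean price differ by repair record?
anova price rep78 foreign        // two factors, both treated as categorical
regress                          // STATA 11+: show the anova model as a regression
```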
CHAPTER FOUR - ECONOMETRIC ANALYSIS
• Regression
• A linear regression model is fitted with the reg command, followed by the dependent variable and a list of one or more explanatory variables.
• Options:
• level: sets the confidence level, and hence the width, of the confidence intervals.
• Ex. reg var1 var2 var3, level(90)
• noconstant: suppresses the constant term of the regression model.
• Using such a model forces the regression line (the predicted values) to pass through the origin.
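The options can be compared side by side on the auto data; the choice of price, mpg, and weight is illustrative.

```stata
* regress with the level() and noconstant options (auto data).
sysuse auto, clear
regress price mpg weight               // default 95% confidence intervals
regress price mpg weight, level(90)    // narrower 90% confidence intervals
regress price mpg weight, noconstant   // fitted line forced through the origin
```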
• Working with the error term
• Understanding the error term lies at the heart of econometrics, since our goal is usually to minimize its variance.
• For the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + U_i$, the true population parameters usually cannot be identified; instead we obtain the best linear unbiased (BLUE) sample estimators:

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{U}_i$$

• For a given X, the predicted (expected) value of Y is $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, since $E(U_i) = 0$.
• STATA can therefore compute the predicted values, and also the residuals, which are the differences between the actual and predicted values of Y.
• Using the census data:
reg divorce marriage
predict error, resid
• tw scatter error marriage, yline(0)
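A slightly fuller sketch stores both the predicted values and the residuals, in double precision, so the identity residual = actual - predicted can be checked directly:

```stata
* Fitted values and residuals from a simple regression (census data).
sysuse census, clear
regress divorce marriage
predict double yhat, xb            // predicted values: b0 + b1*marriage
predict double error, residuals    // residual = divorce - yhat
twoway scatter error marriage, yline(0)
```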

• Correlation and Covariance


• Correlation tries to measure a linear association between
variables.
• Using census data we can make a correlation between the
variables pop, poplt5, popurban, pop65p.
• Syntax: correlate pop poplt5 popurban pop65p
• Calculation of the covariance among different variables could be
made by including the option of covariance.
• correlate pop poplt5 popurban pop65p, covariance
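Run on the census data, the commands look like this; pwcorr with the sig option is an extra that adds a significance level for a single pair.

```stata
* Correlation and covariance matrices (census data).
sysuse census, clear
correlate pop poplt5 popurban pop65p               // pairwise correlations
correlate pop poplt5 popurban pop65p, covariance   // variances and covariances
pwcorr pop popurban, sig                           // one pair, with a p-value
```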
• Regression on Dummy Variables
• Qualitative variables that take only yes/no-type values (limited variables) have little variety but carry much meaning, and they can be included in a regression model.
• For example, in a study of informal workers, one question might be whether they have taken a training, with the answer 'yes' or 'no'; in a study of students' success, the answer may be 'pass' or 'fail'.
• Coded as 0/1, such variables are called dummy variables, and a regression that includes them is a regression on dummy variables.
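Two ways of putting dummies into a regression on the auto data are sketched below. The ib5. prefix (factor-variable syntax, STATA 11 and later) makes rep78 = 5 the base category, matching the interpretation of rep78 given at the end of this chapter; whether that model also included mpg is not stated, so mpg here is an assumption.

```stata
* Regressions with dummy variables (auto data).
sysuse auto, clear
regress price foreign          // foreign is already a 0/1 dummy
regress price ib5.rep78 mpg    // one dummy per rep78 level; rep78 == 5 is the base
```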
• Autocorrelation:
• Autocorrelation refers to correlation between successive values of a variable.
• For the error term, if the error in period t, $U_t$, is correlated with the errors of one or more previous periods, $U_{t-1}$, $U_{t-2}$, and so on, we say there is a problem of autocorrelation.
• Autocorrelation is therefore mainly a problem of time-series data.
• Use the sp500 data and declare the time variable:
sysuse sp500, clear
tsset date
• The tsset command identifies date as the time variable.
• To check for autocorrelation after running a regression, type estat bgodfrey (the Breusch-Godfrey test).
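A sketch of the test on the sp500 data. Because the trading dates have gaps (weekends and holidays), this sketch indexes observations by a gap-free counter t rather than by date, and the model of close on open is purely illustrative.

```stata
* Autocorrelation tests on the sp500 data (illustrative model).
sysuse sp500, clear
gen t = _n                     // gap-free trading-day index
tsset t
regress close open
estat bgodfrey, lags(1 2)      // Breusch-Godfrey, H0: no serial correlation
estat dwatson                  // Durbin-Watson d statistic
```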
• Multicollinearity
• Multicollinearity is a (near) linear relationship among two or more explanatory variables.
• The classical linear regression model assumes that the explanatory variables are fixed, exogenously determined numbers rather than functions of any endogenous variable, and that no explanatory variable is an exact linear combination of the others.
• If the explanatory variables are strongly linearly related, we say there is a problem of multicollinearity.
• To test for multicollinearity, type estat vif after the regression (vif in older versions of STATA).
• The following test was made after regressing a model on the census data.
• Meaning of VIF: it shows by how much the variance of a coefficient estimate is inflated because of multicollinearity.
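The model used in the notes is not shown, so the regressors in this sketch are an assumption, chosen because the census population variables are strongly related to one another:

```stata
* VIF check on an assumed census model.
sysuse census, clear
regress death pop poplt5 pop65p popurban
estat vif          // VIF_j = 1/(1 - R2_j); values above 10 are a common warning sign
```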
• The coefficient on rep78 = 1 in the price regression is -1563.861.
• This coefficient is interpreted relative to the omitted (base) category, rep78 = 5.
• Hence we say:
–Other things remaining constant (ceteris paribus), price is lower by 1563.861 when rep78 is 1 rather than 5.
–Other things remaining constant, a fall of rep78 from 5 to 3 is associated with a price fall of 1,316.
