CHAPTER 1: INTRODUCTION
WHAT IS STATA?
     COMPILED BY:
     TESFAYE ABATE
• STATA is a software package widely used in many
  disciplines.
• Like many other software packages, it is used to handle
  rigorous and complex numerical manipulations.
• Hence, it can carry out many mathematical, statistical,
  and econometric manipulations and computations,
  with your mind as the master.
• Other software packages that serve the same purpose
  include SHAZAM, LIMDEP, SAS, SPSS, EViews, PcGive,
  and EasyReg, which together form a basket of choices
  from which we can pick the one we prefer.
WHY STATA?
• STATA is popular among economists and for
  economic data. The reasons include:
    – Special suitability for economic data (popularity)
    – Inexpensive
    – Interactive
    – Excellent graphics with a neat display
    – Excellent technical support
    – Powerful but simple commands
    – Very strong in regression and logistic regression
    – Has ‘robust’ facilities
TYPES OF STATA
• Generally, there are three types of STATA:
  I. Stata SE
  II. Intercooled Stata
  III. Small Stata
• They differ in the number of observations and
  variables they can handle.
• However, in terms of functionality they are identical.
Type                No. of observations   No. of variables   Matrix size
Stata SE            > 2 billion           32,766             11,000 × 11,000
Intercooled Stata   > 2 billion           2,047              800 × 800
Small Stata         1,000                 99                 40 × 40
HOW TO MANIPULATE STATA?
• We can work with STATA in two different and usually
  complementary ways:
    – Graphical User Interface
    – Using commands
1. Graphical User Interface
• With the Graphical User Interface (GUI), as the name implies, you
  interact with STATA through its menus.
• The menu bar of STATA is a row at the top of the window.
• It includes File, Edit, Preference, Data, Graphics, Help, Window,
  Statistics, and User.
• Moreover, below the menu bar there is a ‘Button
  Bar’, each button serving only one command.
• These buttons are taken from the menu bar because
  they are the most frequently used commands.
• The button bar includes: Open, Save, Print, Log,
  Viewer (Help Window), Results, Graph, Do-file
  Editor, Data Editor, Data Browser, Go, and
  Break.
Brainstorming: What is the function of the Go and Break buttons?
2. Using commands
• This means working with STATA by typing commands in the command
  window.
• This method requires the user to know each command precisely.
• STATA is also case sensitive: commands are written in lowercase, and
  variable names must be typed exactly as they are stored.
• In the command window, only one command can be entered at a time.
• For example, to regress death on pop, type each line below and
  press Enter after it:
        – sysuse census
        – reg death pop
• It is not possible to perform the above task by writing both
  commands on one line in the command window.
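The two-step session above can be written out as follows. This is a minimal sketch: the lines starting with * are comments, and the clear option is an addition of this example that discards any data already in memory.

```stata
* Load the example census data shipped with Stata
sysuse census, clear
* Regress the number of deaths on population
regress death pop
```

Each command sits on its own line, exactly as it would be typed and entered one at a time in the command window.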
• Why use commands?
• Users are strongly recommended to work from the
  command window because of the following features:
      • Avoids laziness
      • Reproducibility
      • Extensibility
      • Traceability
      • Comprehensiveness
      • Documentation
• In some cases you might not know the correct command and will be forced
  to use the GUI. Even then, it is very useful to note the command that is
  displayed in the results window: it is produced by the GUI action, but it
  shows you what to type next time.
FILE MANAGEMENT IN STATA
• A file is where the results of your work (data, log files,
  do-files) are stored.
• Managing your files means knowing where you are working
  now and where to find your work another time.
    1. Print working directory
• When you type the pwd command, STATA displays the directory
  you are currently working in.
• To see the importance of the pwd command along with
  cd (change directory), try activity 1.1.
• Syntax: pwd
     2. Creating a new directory
• Making a directory creates a new folder on disk, with a name
  of your choice, in which to keep your work.
• As long as you work in the same directory, your results and
  files accumulate there, which makes retrieval simple.
• Syntax: mkdir c:\stataclass\introduction
     3. Changing the working directory
• You can identify where you are working using the pwd command.
• If you need to change the directory, cd (change directory) will help you.
• To do so:
       • 1st: type ‘cd’ and press Enter [the current working directory will be displayed]
       • 2nd: type ‘cd c:\datum’, where c:\datum is the directory you want to work in
• The directory ‘datum’ must first be created, either as a new folder
  or through the make-directory command [mkdir c:\datum]
• Syntax: cd
          cd c:\datum
     4. Removing a directory
• rmdir removes a directory you have previously made.
• Syntax: rmdir c:\stataclass\introduction
     5. Erasing files
• If the thing to be removed is a file, not a directory, use the ‘erase’
  command.
• For example, to delete a log file, a data set, etc., use erase, not
  rmdir.
   – [Try to identify the difference between a directory and a file after learning the log
     system and through the check-out presented later]
• Syntax: erase c:\datum\mydata.dta
          erase c:\datum\mylogfile.smcl
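The directory commands above can be tried together in one short session. This is a sketch under two assumptions: that you are on Windows with write access to c:\, and that rmdir only removes a directory that is empty. The capture prefix is an addition of this example; it stops the run from halting if the parent folder already exists.

```stata
* Show the current working directory
pwd
* Create the parent folder first (ignore the error if it exists), then the subfolder
capture mkdir c:\stataclass
mkdir c:\stataclass\introduction
* Move into the new folder and confirm the change
cd c:\stataclass\introduction
pwd
* Step out again before removing the (still empty) folder
cd c:\
rmdir c:\stataclass\introduction
```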
• Keeping our work: logging
• By default, STATA’s results window holds output only temporarily; after
  closing STATA it cannot be displayed again.
• To store the output of the results window, we have to open a log.
• log is the command that tells STATA to save what subsequently appears in the
  results window.
• ‘Subsequently’ means STATA saves only the output that appears after
  the log file is opened.
• To open a log file, follow these steps:
    – Make your own directory.
    – Make sure the log file name you plan to use does not already
      exist (if it exists, remove it with the erase command).
• Ex. Suppose there was originally a log file named
        c:\data\mccain.smcl
    Use the command: erase c:\data\mccain.smcl
• mccain.smcl will no longer exist.
    – With the directory in place, open the new log file.
        • Syntax: log using c:\data\essay
    – Then use STATA for whatever commands you need.
        • Do not forget that everything you do, good or bad, right or wrong, is being saved.
    – After completing your work, tell STATA to stop logging.
        • Syntax: log close
    – Keep a written note of the log file name. For this
      particular example the log file name will be ‘c:\data\essay.smcl’
    – To look at the log file you have created, use the view command.
        • Syntax: view c:\data\essay.smcl
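The logging steps above can be sketched as one session. It assumes the folder c:\data already exists. The replace option is an addition of this example: it overwrites an existing log of the same name, which is an alternative to erasing the old file first.

```stata
* Open a log; replace overwrites an existing file of the same name
log using c:\data\essay, replace
* Everything from here on is written to the log
sysuse census, clear
summarize pop death
* Stop logging and look at the saved file
log close
view c:\data\essay.smcl
```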
• Logging on and logging off
• In the steps above, the phrase ‘good or bad, right or wrong’
  signals the importance of logging off and logging on.
• Logging off and on lets us choose which contents of the results
  window are included in the log and which are excluded.
• While a log is open, if you are unsure which command is correct,
  tell STATA not to log what you are about to type with the
  command ‘log off’.
• This temporarily suspends the open log; it is like working on
  rough paper. Type ‘log on’ to resume logging.
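A short sketch of suspending and resuming a log follows. It assumes c:\data exists; the commands between log off and log on are arbitrary stand-ins for trial-and-error work.

```stata
log using c:\data\essay, replace
sysuse auto, clear
* Suspend logging while experimenting
log off
tabulate rep78
* Resume logging for the commands worth keeping
log on
summarize price
log close
```

Only the sysuse and summarize steps should appear in the saved log; the tabulate output, produced while logging was off, is left out.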
    A common mistake with logging off and on!
• People often try to cancel an incorrect command, or
  anything else, after it has already been logged,
  which is impossible.
• You can keep undesirable commands or results out of the log
  only by typing ‘log off’ before you enter the command,
  that is, before pressing Enter!
CHAPTER TWO: DATA MANAGEMENT
    • Methods of data entry
    • Data entry is the process of putting a data set into
      STATA’s temporary memory, where it is shown in the
      Data Editor.
    • No manipulation or computation can be
      carried out if the Data Editor is empty.
    • There are five different ways of getting data into
      STATA:
    A. Simple methods
     • Typing in the Data Editor
     • The input command
     • Copying and pasting from spreadsheet programs
    B. Importing
     • Using the ‘infile’ and ‘insheet’ commands
     • Stat/Transfer (a separate software package)
     • Using the input command
• Another way of entering data is the input command.
• With the input command, one observation is entered at a time,
  with a value for every variable.
• Before entering data, make sure the Data Editor is empty.
• To do so, use the ‘clear’ command.
• Syntax: input No str8 Name Total Grade
• Why have we written ‘str’ and ‘8’ before the
  variable Name?
• STATA is now ready to accept observations
  (values) for the variables No, Name, Total and Grade.
• From now on, type each observation on its own line, without
  repeating the ‘input’ command, and type ‘end’ when finished:
         – 1 Abel 65 3
         – 2 Biniam 59 4
         – 3 Christopher 74 2
         – 4 Daniel 85 1
         – 5 Eyuel 68 3
         – 6 Fetahi 91 1
         – 7 Genet 76 2
         – end
• Qn. What is the difference between a string and a numeric variable?
• What difference does it make in estimation and other computations?
• Can you run a regression on a string variable?
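The input procedure can be sketched as a complete session, using three of the students. The serial numbers 1, 2, 3 for No are an assumption of this example. The block is best run from a do-file, because input reads the data lines up to end.

```stata
clear
input No str8 Name Total Grade
1 Abel 65 3
2 Biniam 59 4
3 Daniel 85 1
end
* Name is stored as a string of up to 8 characters; the rest are numeric
describe
list
* A string variable has no mean: summarize reports 0 observations for Name
summarize Name Total
```

Because Name is declared as str8, a longer name would be cut to its first eight characters, which is one reason to choose the string length with care.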
ENCODING AND DECODING (OPTIONAL)
• We can transform a string variable into a numeric one and
  vice versa.
• String to numeric: encode varname, gen(newvarname)
• Numeric (with value labels) to string: decode varname, gen(newvarname)
• encode assigns the numeric codes 1, 2, 3, ... to the string
  values in alphabetical order and keeps the original strings as value labels.
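A minimal sketch of both directions on the auto data shipped with Stata. The new variable names make_code and origin_str are hypothetical choices for this example.

```stata
sysuse auto, clear
* make is a string variable: turn it into a labelled numeric variable
encode make, gen(make_code)
* foreign is numeric with value labels: turn it into a string variable
decode foreign, gen(origin_str)
describe make make_code foreign origin_str
list make make_code foreign origin_str in 1/5
* nolabel shows the underlying numeric codes assigned by encode
list make make_code in 1/5, nolabel
```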
• Labeling a data set
• Labeling in general means attaching a meaning or description to a
  data set, a variable, or a value.
• Syntax: label data "label"
• Labeling a variable
• Syntax: label var varname "the label you want for the variable"
• Labeling values
• A value in STATA is a single entry for a variable: the intersection of
  one variable and one observation.
• In some cases we want to attach a label to the values of a variable. For example,
  the grades of students were written as numbers, whose meaning is vague.
• If you want to represent the number ‘1’ as letter grade ‘A’, ‘2’ as ‘B’,
  etc., you can use the value-labeling commands; the process is called assigning value labels.
• Syntax: label define letter 1 "A" 2 "B" 3 "C" 4 "D"
          label values Grade letter
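The three kinds of label can be tried on a small version of the student data. This is a sketch: the label texts and the serial numbers are illustrative assumptions. Note that the second command attaches the label named letter to the variable Grade.

```stata
clear
input No str8 Name Total Grade
1 Abel 65 3
2 Biniam 59 4
3 Daniel 85 1
end
label data "Student results (illustrative)"
label variable Total "Total mark"
* Define the value label, then attach it to the variable Grade
label define letter 1 "A" 2 "B" 3 "C" 4 "D"
label values Grade letter
* The table now shows A, C and D instead of 1, 3 and 4
tabulate Grade
```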
• Creating a new variable
      Generate
• When you want a new variable that is a
  transformation of existing variables, use the
  generate command.
• As the name implies, the command generates a new
  variable while keeping the existing variables.
• Because of this, you must supply a new variable name.
• Syntax: gen new-var-name = (mathematical operation on
  existing variables)
• Notes:
• gen is the abbreviation of generate, the general command for creating new variables.
• new-var-name is the name of the variable to be
  created. It can be any combination of letters, digits and
  underscores (not beginning with a digit), but it cannot contain
  a space. A variable name with a space in it cannot be read by STATA.
  Examples of invalid names: South Africa, new var, my var.
• Single equality sign (=): when testing whether two values are
  equal (for example after ‘if’), STATA requires a double equality
  sign (==). When assigning values with the generate and
  replace commands, a single equality sign is used.
• Mathematical operation: a manipulation of one or
  more existing variables, combined by
  the mathematical operators listed before.
       – Replace
• replace performs the same tasks as the generate
  command, except that it overwrites the
  existing values of a variable.
• It can be used both to change a whole existing variable
  and to fill in or change particular values of a variable.
• It can also be used with the ‘in’ and ‘if’ qualifiers.
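A short sketch of generate and replace on the auto data. The variable names price_k and cheap and the 5000 cut-off are hypothetical choices for illustration.

```stata
sysuse auto, clear
* New variable: price in thousands of dollars (single = assigns)
generate price_k = price/1000
* Start a flag at 0, then overwrite it for the cheaper cars
generate cheap = 0
replace cheap = 1 if price < 5000
* Double == tests equality, here inside an if qualifier
count if foreign == 1
* The in qualifier restricts replace to a range of observations
replace price_k = round(price_k) in 1/10
list make price price_k cheap in 1/12
```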
    Extension to generate (egen)
• egen usually serves the same purpose as the generate
  command; however, we are not free to write whatever
  expression we want, only the functions that STATA
  itself provides.
• Examples of functions handled by the egen command:
• rowtotal: sums the listed variables horizontally, within each observation
• Syntax: egen newvar = rowtotal(varlist)
• sd: generates the standard deviation of one variable. The new variable has the same value for
  every observation
• Syntax: egen newvar = sd(varname)
• rowmax: for each observation, selects the highest value among the listed variables
• Syntax: egen newvar = rowmax(varlist)
• rowmean: for each observation, computes the mean of the variables in the varlist
• Syntax: egen newvar = rowmean(varlist)
• rowmin and the other row functions work in the same manner.
• mean, median, mode, min, and max share the same syntax; each produces a new variable
  holding a single value repeated for every observation. Use one function per command.
• Syntax: egen newvar = mean(varname)
          egen newvar = min(varname)
          egen newvar = max(varname)
          egen newvar = mode(varname)
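The difference between the column-wise and the row-wise egen functions can be seen on the auto data. A minimal sketch; the new variable names are hypothetical.

```stata
sysuse auto, clear
* Column-wise functions: the same value is repeated in every observation
egen mean_price = mean(price)
egen sd_price = sd(price)
* Row-wise functions: one value per observation, across the listed variables
egen size_total = rowtotal(headroom trunk)
egen size_max = rowmax(headroom trunk)
egen size_mean = rowmean(headroom trunk)
list price mean_price sd_price headroom trunk size_total size_max size_mean in 1/5
```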
• Reshaping a data set
• Suppose you are preparing your senior essay on micro and
  small enterprises.
• The performance of these institutions is likely to be determined by
  trading items, working capital, etc.
• The organization supplying the data might store each
  variable column-wise, with a separate column for each year (wide format).
• In such instances you may want to change the
  layout of the data set, which STATA can do with the reshape
  command.
where WC represents working capital and TI the number of trading items
Individuals        WC96             WC97       WC98               WC99                WC200
Aaa                10               12         15                 19                  20
Bbb                5                8          9                  10                  11
Ccc                12               14         15                 16                  18
Ddd                9                10         12                 15                  17
Fff                8                13         16                 18                  19
Individuals        TI96         TI97       TI98               TI99                TI100
Aaa                2            2          3                  2                   3
Bbb                1            2          2                  1                   1
Ccc                2            3          3                  3                   3
Ddd                3            4          3                  2                   2
Fff                3            4          2                  1                   3
• The syntax is:
              reshape long WC, i(Individuals) j(Year)
• Description:
• reshape is the general command for switching between wide and long formats
• WC is the stub of the variable names, obtained by stripping the year from the original
  names WC96, WC97, … etc.
• i(Individuals) j(Year): naming the ‘i’ and ‘j’ variables is what makes the
  reshaping possible. i identifies the row (the individual) and j is the new variable that holds the year.
• For example, the value ‘8’ is the intersection of Bbb and WC97. In long format it is
  found in the row where Individuals is Bbb (the i entry) and Year is 97 (the j entry).
• To change data in long format back to wide, the command is:
              reshape wide WC, i(Individuals) j(Year)
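The working-capital table can be entered and reshaped in both directions as sketched below. The column names are taken as printed in the table, so the last column is WC200 and Year takes the value 200 for it. Run the block from a do-file.

```stata
clear
input str3 Individuals WC96 WC97 WC98 WC99 WC200
Aaa 10 12 15 19 20
Bbb 5 8 9 10 11
Ccc 12 14 15 16 18
Ddd 9 10 12 15 17
Fff 8 13 16 18 19
end
* Wide to long: one row per individual and year
reshape long WC, i(Individuals) j(Year)
list in 1/5
* Long back to wide: one row per individual again
reshape wide WC, i(Individuals) j(Year)
list
```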
• Collapsing
• Collapsing summarizes a variable into a single
  statistic such as the mean, total, or median. The long
  format of the above data shows working capital for
  five years.
• If we want the total working capital in each year,
  the collapse command will do so.
• Syntax: collapse (sum) WC, by(Year)
• Note that the collapse command is used here with
  the data in long format.
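A small sketch of collapse on long-format data, using only the 1996 and 1997 figures of Aaa and Bbb from the table above. Keep in mind that collapse replaces the data in memory with the summary.

```stata
clear
input str3 Individuals Year WC
Aaa 96 10
Aaa 97 12
Bbb 96 5
Bbb 97 8
end
* Total working capital per year; the individual rows are replaced
collapse (sum) WC, by(Year)
* Two rows remain: Year 96 with WC 15, and Year 97 with WC 20
list
```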
• Combining data sets
• Combining data sets means adding a new STATA data set to the
  original one.
• The combination may add either new
  observations or new variables.
• In the above example, there are five individuals (Aaa to Fff) whose working capital has
  been recorded for five consecutive years.
• If you find other individuals with working capital observations for the same
  five years and want to include them, the append command will help you do so.
• In the same manner, if you obtain the observations for 2001, i.e. an additional
  variable, you can use the merge command.
• NB: the merge command works correctly only if there is a common variable that
  can serve as a reference.
           Individuals        WC96          WC97           WC98          WC99          WC200
           Ggg                11            14             15            19            22
• The original data is technically known as the master data.
• The data being added is known as the using data.
• Both master and using data should be in STATA format (.dta), e.g. created in STATA’s Data Editor and saved.
• Process of appending:
    – open the using data and save it in a known place,
      preferably in a data folder on the local disk, c:\data\
    – clear the Data Editor and open the master data
    – type the command append using followed by the file path of the using data
      you have saved. If the file name is ‘theappend’, the syntax
      will be:
•            append using c:\data\theappend
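The appending process can be sketched without touching c:\data by saving the using data to a temporary file. The tempfile device is a choice of this example, not part of the procedure above, and it requires running the whole block from a do-file.

```stata
* Using data: the new individual, saved to a temporary file
clear
input str3 Individuals WC96 WC97 WC98 WC99 WC200
Ggg 11 14 15 19 22
end
tempfile theappend
save `theappend'
* Master data (two of the original individuals, for brevity)
clear
input str3 Individuals WC96 WC97 WC98 WC99 WC200
Aaa 10 12 15 19 20
Bbb 5 8 9 10 11
end
* Add the rows of the using data below the master data
append using `theappend'
list
```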
• Merging
• Merging adds a new set of variables to an original data set.
• The using data (the one to be included) and the master data must
  have a common variable that can serve as an ID or frame of
  reference. The ID variable is written in the command.
• Process of merging:
    • Open the using data and sort it by the ID variable, here ‘No’.
    • Save it in a known place, preferably
      in the data folder on the local disk.
    • Clear the Data Editor and open the master data.
    • Sort it by the variable ‘No’.
    • Then write the merge command:
• merge No using c:\data\(saving name you have used)
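A sketch of the merging process on a small student data set. The variable Attendance and its values are hypothetical, and a temporary file stands in for the saved using data. The merge 1:1 form is the syntax of Stata 11 and later; the form shown in the text belongs to older versions.

```stata
* Using data: a new variable for the same students
clear
input No Attendance
1 90
2 75
3 82
end
sort No
tempfile extra
save `extra'
* Master data
clear
input No str8 Name Total
1 Abel 65
2 Biniam 59
3 Daniel 85
end
sort No
* One-to-one merge on the ID variable No
merge 1:1 No using `extra'
* _merge records where each observation came from
tabulate _merge
list
```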
• Do-file Editor
• The Do-file Editor is used to write and store a collection of
  commands we frequently use in a file called a do-file.
• Once those commands are collected in a do-file,
  typing a single do command runs all of them.
• Objective:
• Suppose we usually open the census data and regress
  death on population.
• Then we test for multicollinearity, heteroscedasticity and
  autocorrelation.
• If this process is frequent and we are fed up with typing these commands every
  time we open STATA, we can use the Do-file Editor by the following
  procedure:
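A do-file for this routine might look like the sketch below. The file name census_checks.do is hypothetical, and the second regressor medage is added by this example so that the variance inflation factors are informative. The estat commands are the post-estimation forms of Stata 9 and later. An autocorrelation test is left out because the census data are cross-sectional and have no time variable.

```stata
* census_checks.do (hypothetical file name)
sysuse census, clear
regress death pop medage
* Variance inflation factors (multicollinearity)
estat vif
* Breusch-Pagan test (heteroscedasticity)
estat hettest
```

After saving the file from the Do-file Editor, for example as c:\data\census_checks.do, the whole sequence runs with one command: do c:\data\census_checks.do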
CHAPTER THREE: DESCRIPTIVE ANALYSIS
• Descriptive (non-regression) analysis covers
  statistical computations like the mean, median, frequency,
  etc., which give us a general understanding of
  our data set, if not a whole picture of the variables.
• Before rushing into an econometric analysis, a
  researcher is advised to look at descriptive statistics,
  including frequency distributions (and/or tabulations),
  t-tests (tests of mean values), correlation, and analysis of
  variance (the oneway and anova commands).
• Summarizing
• summarize is a descriptive command that reports the number of observations,
  mean, standard deviation, minimum and maximum of each variable.
• A summarize command without a list of variables
  summarizes all the variables in the data set.
• If you are interested in summarizing only
  a set of variables, type the ‘sum’ command
  followed by the list of variables.
• Syntax: sum
          sum varlist
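The three common uses of summarize, sketched on the auto data. The detail option is an addition of this example.

```stata
sysuse auto, clear
* All variables
summarize
* Only the listed variables
summarize price mpg weight
* detail adds percentiles, skewness and kurtosis
summarize price, detail
```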
      – Tabulating
• Tabulating is the process of arranging a data set in a table.
• In STATA specifically, tabulating lists the values of a variable in
  order and counts the frequency of each.
     – Syntax: tab varname ………. simple (one-way) tabulation
• In some cases one variable can be tabulated against
  another variable.
• Each value of one is then cross-linked with each value of the other.
• Technically, such tabulations are called cross
  tabulations.
     – Syntax: tab var1 var2
• var1 makes up the rows and var2 the
  columns.
• In the automobile data, you can make a cross
  tabulation between repair rate (rep78) and foreign.
• sysuse auto, clear
     • Syntax: tab foreign rep78
• In a separate simple tabulation of the repair rate,
  the number of cars with a repair rate of 1 was 2.
    – How are these cars divided between
      domestic and foreign?
• We can answer this question by cross-tabulating with foreign.
• In the above example, of the two cars with a repair rate of 1,
  both are of domestic origin.
• There are in total 11 cars with a repair rate of 5, of
  which 9 are foreign and 2 are domestic.
• Options: chi2, cell, row and column
• Syntax:
    – tab foreign rep78, chi2
    – tab foreign rep78, row
    – tab foreign rep78, cell
    – tab foreign rep78, column
• Why chi2?
• Is the difference in repair rate between domestic and foreign cars
  significant?
• The question requires a statistical test such as the chi-squared test.
• To do so we include the chi2 option; the interpretation is as
  follows:
    – If Pr is greater than the chosen level of significance
      (Pr > significance level), do not reject the null
      hypothesis that the two variables are independent.
    – If Pr is less than the chosen level of significance
      (Pr < significance level), reject the null hypothesis in favor of
      the alternative.
    – With the row option, the percentages in each row
      sum to 100%.
    – With the column option, the percentages in each column
      sum to 100%.
    – With the cell option, the percentages of all cells
      together sum to 100%.
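The options can also be combined in one command, as in this sketch on the auto data.

```stata
sysuse auto, clear
* Counts, row percentages and the chi-squared test in one table
tabulate foreign rep78, row chi2
* Column percentages, then cell percentages
tabulate foreign rep78, column
tabulate foreign rep78, cell
```

The Pr value printed beside the Pearson chi2 statistic is the one to compare with the chosen significance level.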
• If you want a table of mean values, it is not possible
  with the plain tab command.
• Use the tabstat command instead of
  ‘tab’ alone.
• Syntax: tabstat var1, stats(statistics of your choice)
    – Statistics include: mean, n, variance, min, max, kurtosis,
      skewness, p50 (median), sum
• Advanced form of tabulation
• Syntax: tab rep78, sum(foreign)
• This form of tabulation reports the
  mean, standard deviation and frequency of the summarized
  variable for each value of the tabulated variable.
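A sketch of tabstat and of tabulate with the summarize option on the auto data. The choice of price as the summarized variable and the by(foreign) grouping are assumptions of this example.

```stata
sysuse auto, clear
* Several statistics for two variables at once
tabstat price mpg, statistics(mean n variance min max p50 sum)
* The same kind of table, separately for domestic and foreign cars
tabstat price, statistics(mean sd n) by(foreign)
* Mean, standard deviation and frequency of price within each rep78 value
tabulate rep78, summarize(price)
```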
• In the results window, look at the shaded cell ‘4’, which is the
  combination of foreign and a head room of 1.5.
• Therefore, the interpretation is: the mean repair rate for
  foreign cars with a head room of 1.5 is ‘4’.
• Graphing
    – Histogram
    – Scatter
    – Matrix graph
    – Line
    – Pie
    – Drop line
    – Spike
    – Histogram
• A histogram is a one-way plot, i.e. it accommodates only one
  variable.
• Syntax: histogram var1 ………………………….. (Graph 1)
  …… let var1 = price in the auto data
• Options: the options available for histogram include
  plotting frequencies instead of densities, giving a title, and overlaying a normal
  distribution curve.
       • Syntax: – histogram var1, title(It is Graph 2) …. you can take any
                   variable for var1
                 – histogram var1, frequency title(It is graph 3)
                 – histogram var1, frequency title(It is graph 4) normal
• Scatter plot
• It is a two-way plot in which each point of the graph is a
  combination of the values of two different
  variables.
    – Syntax: tw scatter mpg weight
• Options
    – tw scatter mpg weight, mcolor(green)
      msymbol(diamond) mlabel(weight)
• To overlay a fitted regression line:
    – tw (scatter mpg weight) (lfit mpg weight)
• Matrix graph
• Syntax: graph matrix [list of variables]
• The graph marked as box-1 is the joint plot of price
  and bedroom.
• In this box price is on the x-axis (hence the explanatory
  variable) and bedroom on the y-axis (the dependent
  variable).
• The mirror image of this graph is the one marked
  as box-2.
• It is the same as the first one, except that
  price, which was the explanatory variable, has become
  the dependent variable.
• Line graph
• It is a two-way plot and therefore needs
  at least two variables.
• Try to make a line plot using the ‘bpwide’ data (sysuse bpwide).
• Syntax: tw line bp_before bp_after patient
• Options
    – Syntax: tw line bp_before bp_after patient, legend(
      label(1 "Before Diagnosis") label(2 "After Diagnosis")
      position(6) ring(1) rows(1))
• Pie graph
• A pie chart is a circle used to present the
  percentage distribution of different variables.
• The whole circle represents 100% of the
  distribution, so the variables listed for the pie
  should together account for the whole total.
• Syntax: graph pie var1 var2 var3
• E.g. use the population 2000 data:
• sysuse pop2000, clear
• Syntax: graph pie white black indian asian island
• The slices are numbered 1, 2, etc. in
  the order in which the variables are listed.
• The options for a pie graph include: color, plabel,
  explode
     – graph pie white black indian asian island, pie(3, color(yellow))
     – graph pie white black indian asian island, pie(2, color(blue)) pie(2,
       explode)
     – graph pie white black indian asian island, pie(4, color(yellow))
       pie(5, color(red)) pie(1, color(green))
• ANALYSIS OF VARIANCE (ANOVA)
• One-sample t-test
• A one-sample t-test compares the mean value of a
  variable with a hypothesized value.
• It is based on a sample statistic that follows Student’s t
  distribution.
• To follow the procedure, use the US
  Life Expectancy data (sysuse uslifeexp).
• Test whether the mean value of life expectancy (le)
  is equal to 64. Syntax: ttest le == 64
• Rejection rule:
• If Pr(|T| > |t|) > level of significance (which is
  most often 5%), do not reject the null
  hypothesis
• If Pr(|T| > |t|) < level of significance, reject the null in favor of
  the alternative hypothesis
• Conclusion: see the STATA output
• Exercise: test whether the mean life expectancy of white
  males is equal to that of black males
• Two-sample t-test
• A two-sample t-test compares the mean of a variable across the values of another variable, given that
  the second variable takes only two values.
• Example: in the automobile data the variable foreign has two values: foreign or
  domestic.
• If you run a t-test of the variable ‘price’ by foreign, you will find the result from:
• Syntax: ttest price, by(foreign)
• Hypothesis formulation: H0: diff = 0
•                         Ha: diff != 0
• where diff is mean(Domestic) - mean(Foreign)
• As we have seen before, if the difference is equal to zero we conclude that the mean
  price of domestic and foreign cars is the same.
• Decision:
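The t-tests discussed above can be run as sketched below. The variable names le_wmale and le_bmale are given as I recall them from the uslifeexp data; check them with describe before relying on them. The comparison of white and black males is written here as a paired test of two variables, which is one possible reading of the exercise.

```stata
sysuse uslifeexp, clear
describe
* One-sample test: is mean life expectancy equal to 64?
ttest le == 64
* Paired comparison of two variables measured in the same years
ttest le_wmale == le_bmale
* Two-sample test: mean price of domestic versus foreign cars
sysuse auto, clear
ttest price, by(foreign)
```

In each output, the two-sided probability reported under Ha: diff != 0 (or Ha: mean != 64) is the value to compare with the significance level.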
• ANOVA is concerned with the nature of, and
  relationship between, sums of squares.
• The F-statistic is computed using the ratio
  $F = \dfrac{ESS/1}{RSS/(n-2)}$
• where n-2 is the degrees of freedom for RSS and 1 for ESS.
• For a model $Y_i = \beta_0 + \beta_1 X_i + U_i$, a higher or significant F-
  statistic means that the explanatory variable $X_i$ is a cause of
  significant variation in Y.
• ANOVA is one-way if there is only one explanatory variable.
• One-way ANOVA
• To run a one-way ANOVA test, use the ‘oneway’ command.
• This command allows only one explanatory variable; the word one-way
  signifies that the variation in the dependent variable is tested against one variable.
• From the automobile data, try to see the variation in price as a function of ‘displacement’.
• N-way ANOVA
   – An N-way ANOVA is the same as one-way, except that
     there is necessarily more than one explanatory
     variable.
• Use the ‘nlsw88’ data, which lists the determinants of the wage rate
  (sysuse nlsw88). The data set has more than 2,000 observations;
  drop observations 36 onward (drop in 36/l).
   – anova wage age occupation union
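A sketch of one-way and N-way ANOVA on the auto data. The second oneway call and the two-factor model price on rep78 and foreign are choices of this example, not the exercises stated above.

```stata
sysuse auto, clear
* One-way ANOVA: price against a single explanatory variable
oneway price displacement
* tabulate adds the mean and standard deviation of price in each group
oneway price rep78, tabulate
* N-way ANOVA: more than one explanatory variable
anova price rep78 foreign
```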
• anova can be followed by the regress option. It gives regression coefficients, though
  their interpretation differs from that of classical regression coefficients.
• The reason is that anova reports a separate coefficient for each distinct value of the explanatory
  variables.
• Try to regress price on ‘rep78’ and ‘mpg’, and then do the same with
  anova.
CHAPTER FOUR: ECONOMETRIC ANALYSIS
      • Regression
• A linear regression model is estimated using the ‘reg’ command
  followed by a dependent variable and a list of one or more
  explanatory variables.
• Options:
• level: specifies the confidence level, and hence the width, of the confidence intervals.
• Ex. reg var1 var2 var3, level(90)
• noconstant: suppresses the constant term of the
  regression model.
• Using this option forces the fitted regression line
  to pass through the origin.
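The two options can be compared on the auto data. A minimal sketch; the model price on mpg and weight is an arbitrary choice for illustration.

```stata
sysuse auto, clear
* Default: 95% confidence intervals, constant included
regress price mpg weight
* 90% confidence intervals (narrower than the default)
regress price mpg weight, level(90)
* No constant term: the fitted line is forced through the origin
regress price mpg weight, noconstant
```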
• Working with the ‘error’ term
• Understanding the error term lies at the heart of econometrics, as our
  goal is usually to minimize its variance.
• For a simple linear regression model, $Y_i = \beta_0 + \beta_1 X_i + U_i$, it may be
  impossible to know the true population parameters; instead we find
  the BLUE sample estimators, with the following form:
            $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{U}_i$
• For a given X, the expected value of Y is $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, as $E(U_i) = 0$
• Therefore STATA can predict the expected value, and also the
  error term (residual), which is the difference between the actual Y value and the predicted
  value.
• Using the census data:
    – reg divorce marriage
    – predict error, resid
    – tw scatter error marriage, yline(0)
• Correlation and covariance
• Correlation measures the linear association between
  variables.
• Using the census data we can compute the correlations among the
  variables pop, poplt5, popurban, pop65p.
• Syntax: correlate pop poplt5 popurban pop65p
• The covariances among the variables can be
  obtained by adding the covariance option:
• correlate pop poplt5 popurban pop65p, covariance
• Regression on dummy variables
• Variables that are not freely numeric but of a yes-or-no
  type, technically called limited variables, take few distinct
  values but carry much meaning, and may be needed in your
  regression model.
• When studying informal workers, for example, one
  of your questions might be whether they have received
  training; you will expect ‘yes’ or ‘no’. If the study is on the success of
  students, it may be ‘pass’ or ‘fail’.
• Variables of this type are referred to as dummy variables.
• Autocorrelation
• Autocorrelation refers to any form of correlation between consecutive values of a given
  variable.
• For an error term, if the error at time t, Ut, and the errors of one or more previous periods, Ut-1
  or Ut-2, are correlated with each other, we say there is a problem of autocorrelation.
• We can therefore understand that the problem of autocorrelation arises in time series
  data.
• Use the sp500 data (sysuse sp500) and declare it to be time series:
    – tsset date
• The ‘tsset’ command identifies date as the time variable.
• To check for the problem of autocorrelation after a regression, you can use
  the Breusch-Godfrey test (‘bgodfrey’, or ‘estat bgodfrey’ in newer versions of STATA).
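A sketch of an autocorrelation check on the sp500 data. Two assumptions are made here: the model close on volume is an arbitrary illustration, and a consecutive index t replaces date as the time variable so that weekends and holidays do not leave gaps in the series.

```stata
sysuse sp500, clear
* Consecutive trading-day index, so the series has no gaps
generate t = _n
tsset t
regress close volume
* Breusch-Godfrey test with one lag
estat bgodfrey, lags(1)
* Durbin-Watson statistic as a second check
estat dwatson
```

A small probability in the Breusch-Godfrey output is evidence against the null hypothesis of no serial correlation.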
• Multicollinearity
• Multicollinearity is correlation among two or more
  explanatory variables.
• By the assumptions of the classical linear regression model,
  explanatory variables are expected to be fixed numbers: not
  functions of any endogenous variable, but exogenously
  determined.
• If this assumption is violated, we say there is a problem of
  multicollinearity.
• To test for multicollinearity after a regression, type the command ‘vif’.
• The following test has been made after regressing the following model from census data
• Meaning: vif (variance inflation factor) shows by how much the variance of a coefficient is inflated because of multicollinearity
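A sketch of the test on the census data. The model below is an assumption of this example, since the model used in the text is not shown. estat vif is the post-estimation form in Stata 9 and later.

```stata
sysuse census, clear
* Several population measures that are likely to move together
regress death pop popurban medage
* Variance inflation factor for each explanatory variable
estat vif
```

A common rule of thumb treats a VIF above 10 as a sign of serious multicollinearity.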
• The coefficient of rep78 on price when rep78 is
  1 is -1563.861.
• This coefficient should be interpreted relative to
  the dropped (base) value of ‘5’.
• Hence we say:
  – Other things remaining constant (ceteris paribus), price
    is lower by 1563.861 when rep78 falls from ‘5’
    to ‘1’.
  – Other things remaining constant, a fall of rep78 from 5 to
    3 is associated with a price fall of 1,316.
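A regression with one dummy per rep78 value and 5 as the omitted base category can be sketched as follows. The factor-variable notation ib5. requires Stata 11 or later. This model may or may not be the one behind the figures quoted above, so the coefficients need not match them.

```stata
sysuse auto, clear
* One dummy for each value of rep78; 5 is the dropped (base) category
regress price ib5.rep78
```

Each reported coefficient is then the difference in mean price between that rep78 value and the base value 5.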