PRACTICAL EXERCISE 10: AUTOCORRELATION
This practical exercise focuses on the methods of detection and remedies for autocorrelation in
multiple regression estimation as discussed in Session 10. Autocorrelation is most often
encountered when working with time-series data.
Please follow the instructions carefully, and only ask a tutor for help if you can’t work out what to
do yourself.
1.      Log on to the workstation using your student username and password (Recall: for
        emergencies, the generic login is username teacher and password training).
2.      Open Stata.
3.      Start a log file:
            Click on the log icon (fourth button from the left – it looks like a brown notebook).
            Select where you want the log file to be saved.
            Choose the Log (*.log) format from the ‘save as type’ drop-down list.
            Provide a name for the log file (for this prac exercise, you can call your log file prac10).
            Click on ‘Save’.
            Check that the message in the Results window shows that you’ve started a text file.
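        The menu steps above also have a command-line equivalent. The lines below are a minimal
        sketch, assuming you are happy to save the log in Stata's current working directory; the
        file name mylog.log is only a placeholder, so substitute the name you chose.

```stata
* Start a plain-text log; "mylog.log" is a placeholder name, use your own
log using "mylog.log", text replace

* ... run the commands for the exercise here ...

* Close the log when you have finished the exercise
log close
```

        The replace option overwrites an existing log of the same name, so leave it out if you
        want Stata to warn you instead.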
4.      Download the dataset autocorr1.dta from the Moodle site for the course and save it in a
        location that is easy for you to find. Go back to Stata and open the dataset from where you
        have saved it.
        This data set contains time series data on aggregate consumption expenditure and
        disposable income of South African households. To confirm this, type:
                    desc 
5.      Regress consumption on disposable income, predicting the residuals and calling the
        variable that contains the residuals e. Type:
                    reg cons inc
                    predict e, resid
6.      Graph the residuals against time (year). Use the option yline(0) to include a horizontal line
        at 0 in your scatter plot. Type:
                    scatter e year, yline(0)
        From what you observe, do you think there may be a problem and if so, what sort?
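        If the pattern is hard to judge from isolated points, joining consecutive years with a line
        can help. This is an optional sketch, not part of the required steps; it uses only the
        variables e and year created above.

```stata
* Same residual plot, with consecutive years joined by a line
twoway connected e year, yline(0) sort
```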
7.      We need to generate the lagged values of the residuals in order to see whether there is a
        relationship between current and previous values of the residuals. Note that _n is Stata’s
        internal reference for the current observation; _n-1 means the previous observation. Type:
                    gen elag=e[_n-1]
        Check the effect of this command by looking at the values of the variable elag in the Data
        Browser (you can open this by clicking on the magnifying glass icon).
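        An alternative way to build the lag, sketched below, is Stata's time-series lag operator.
        It assumes that year identifies each observation uniquely; the name elag_ts is
        hypothetical and is used only so that you can compare it with elag.

```stata
* Declare the time variable so that time-series operators can be used
tsset year

* L.e is the value of e one period earlier (missing if that year is absent)
gen elag_ts = L.e

* Compare the two versions of the lag for the first few observations
list year e elag elag_ts in 1/5
```

        The two versions agree when the data are sorted by year and have no gaps. The [_n-1]
        version simply takes the previous row, whatever year it belongs to.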
8.      Graph e against elag, including the options xline(0) and yline(0) so that Stata draws
        reference lines at zero on both axes of the scatter plot. Type:
                    scatter e elag, xline(0) yline(0)
        What can you conclude?
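        To put a number on what the scatter plot shows, you could also compute the sample
        correlation between the residuals and their lag. This is an optional check using the
        variables created in steps 5 and 7.

```stata
* Sample correlation between current and lagged residuals
correlate e elag
```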
Prac 10 - Autocorrelation                                                                       Page 1 of 4
9.      Conduct the Durbin-Watson test for autocorrelation. First you have to tell Stata which
        variable corresponds to the time period, and then you can do the test. Type:
                    tsset year
        This tells Stata that the variable corresponding to the time period is called ‘year’. In order to
        perform the Durbin-Watson test, type:
                    dwstat
        You need to look up the critical values in the Durbin-Watson d-statistic table (Table B-4 in
        Appendix B (from Moodle)), and use the rules in the lecture notes to interpret them. What
        can you conclude? Does it confirm your earlier impressions?
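        To see where the statistic comes from, the sketch below computes d by hand as the sum of
        squared changes in the residuals divided by the sum of squared residuals. It assumes the
        variables e and elag from steps 5 and 7 and no gaps in the years; the names de2, e2,
        dw_num and dw_den are hypothetical.

```stata
* Numerator: sum of squared changes in the residuals (the first observation drops out)
gen de2 = (e - elag)^2
quietly summarize de2
scalar dw_num = r(sum)

* Denominator: sum of squared residuals over all observations
gen e2 = e^2
quietly summarize e2
scalar dw_den = r(sum)

display "Durbin-Watson d = " dw_num/dw_den
```

        The result should match the value reported by dwstat. In Stata 9 and later the same test
        is also available as estat dwatson, run directly after regress.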
10.     Conduct the runs test for autocorrelation to confirm your findings above. Type:
                    runtest e, mean
        The null hypothesis is that the residuals are randomly distributed (i.e. no autocorrelation).
        Using the p-value displayed with the results of the test, what can you conclude?
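        For reference, the large-sample form of the runs test can be written as follows. The
        notation is an assumption and may differ from your notes: R is the number of runs, N_1
        and N_2 are the numbers of residuals above and below the threshold (here the mean), and
        N = N_1 + N_2.

```latex
\[
E(R) = \frac{2 N_1 N_2}{N} + 1, \qquad
\operatorname{Var}(R) = \frac{2 N_1 N_2 \,(2 N_1 N_2 - N)}{N^2 (N - 1)}, \qquad
Z = \frac{R - E(R)}{\sqrt{\operatorname{Var}(R)}} \;\overset{a}{\sim}\; N(0, 1)
\]
```

        Under the null hypothesis, Z is approximately standard normal. Far fewer runs than
        expected point towards positive autocorrelation; far more point towards negative
        autocorrelation.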
A     First Approach to Removing Autocorrelation
11.     Given your findings above, first assume that the value of rho = 1 and perform the GLS
        transformation of the variables. Note that we are assuming an AR(1) process – is this
        necessarily a valid assumption? You can transform your model into a first differenced
        equation as indicated in your notes as follows.
        First generate lagged values of inc and cons:
                    gen consl=cons[_n-1]
                    gen incl=inc[_n-1]
        Then form the first differences of both the dependent and explanatory variables:
                    gen dcons=cons-consl
                    gen dinc=inc-incl
        Run the regression with the differenced variables as follows:
                    regress dcons dinc, noconstant
        Why is no constant included?
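        As a cross-check, the same regression can be sketched with Stata's difference operator
        instead of hand-built variables. This assumes the tsset year declaration from step 9 is
        still in effect and that there are no gaps in the years.

```stata
* D.cons is cons minus its value one year earlier; likewise for D.inc
regress D.cons D.inc, noconstant
```

        The coefficient should match the one from the regression of dcons on dinc.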
12.     Obtain the DW statistic. Type:
                    dwstat
        What do you conclude? Note that, strictly speaking, the DW test is not appropriate here – a
        non-parametric test would be suitable (why?).
13.     Predict the residuals of the transformed model and conduct the runs test. Type:
                    predict et, resid
                    runtest et, mean
        At what level of significance would you fail to reject the null hypothesis of no
        autocorrelation? Why do we use (only) a non-parametric test such as the runs test for these
        transformed regressions?
B     Second Approach
14.     Now estimate the value of rho from the DW statistic of the original regression model (the
        one plagued by autocorrelation in 5 above). Recall that $d \approx 2(1 - \hat{\rho})$ from equation 4 of your
        notes, hence $\hat{\rho} \approx 1 - d/2$. Obtain the estimate of rho using this equation and the DW statistic
        for the original model, which you calculated in 9 above. Type:
                    gen rho2=…
        What is its value? Type:
                    display rho2
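        The arithmetic can be illustrated with a made-up number. In the sketch below, d_example is
        a hypothetical DW value and not the one from your regression; the scalar names are also
        hypothetical.

```stata
* d_example is a made-up DW value, NOT the one from your regression
scalar d_example = 0.5

* Apply rho-hat = 1 - d/2
scalar rho_example = 1 - d_example/2

* Should display .75
display rho_example
```

        Note that gen creates a variable holding the same value in every observation, and
        display then shows the value in the first observation. A scalar, as used here, stores
        the single number directly.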
15.     Now using this estimate of rho (rho2), perform the GLS transformation of the variables as
        explained in your notes. [Notice that we are assuming an AR(1) process – is this
        necessarily a valid assumption?]
                    gen dcons2=cons-(rho2*consl)
                    gen dinc2=inc-(rho2*incl)
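        The transformation in the two commands above can be sketched as follows, under the AR(1)
        assumption. The notation is an assumption and may differ from your notes: Y is cons, X is
        inc, u is the original error term and epsilon is the well-behaved error term.

```latex
\begin{align*}
Y_t &= \beta_1 + \beta_2 X_t + u_t, \qquad u_t = \rho\, u_{t-1} + \varepsilon_t \\
\rho\, Y_{t-1} &= \rho \beta_1 + \rho \beta_2 X_{t-1} + \rho\, u_{t-1} \\
Y_t - \rho\, Y_{t-1} &= \beta_1 (1 - \rho) + \beta_2 \,(X_t - \rho\, X_{t-1}) + \varepsilon_t
\end{align*}
```

        The third line is the first minus the second. The intercept of the transformed regression
        therefore estimates \beta_1 (1 - \rho) rather than \beta_1 itself, while the slope still
        estimates \beta_2.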
16.     Regress the transformed variables as follows (using regdw in order to get Stata to print out
        a DW statistic):
                    regdw    dcons2 dinc2
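        If regdw is not available in your version of Stata, the following sketch should give the
        same regression and a DW statistic. It assumes the data are still tsset by year.

```stata
* Ordinary regression on the transformed variables
regress dcons2 dinc2

* Durbin-Watson statistic for the regression just run
estat dwatson
```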
17.     What do you conclude? Check, using the runs test to confirm your findings. Type:
                    predict et2, resid
                    runtest et2, mean
C     Third Approach
18.     Now obtain an estimate of rho from the OLS residuals as follows.
                    regress e elag, noconstant
        What is the estimated value of rho? Why is noconstant included?
        The estimated value of rho (i.e. rho3) is the coefficient on elag. To get Stata to retrieve this
        value, enter the command below. (Note: _b[elag] returns the coefficient on elag from the most
        recent regression, so run this command immediately after the regression above.)
                    gen rho3=_b[elag]
19.     Perform the GLS transformation of the variables (using this value of rho) as follows:
                    gen dcons3=cons-(rho3*consl)
                    gen dinc3=inc-(rho3*incl)
20.     Regress the transformed variables as follows:
                    regdw   dcons3 dinc3
21.     What do you conclude? Check, using the runs test to confirm your findings. Type:
                    predict et3, resid
                    runtest et3, mean
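        As an optional cross-check that is not part of this handout, Stata's built-in prais
        command can carry out a similar estimation in one step. The sketch below assumes the data
        are tsset by year. Its estimates should be close to those from the second and third
        approaches, though they need not match exactly.

```stata
* Cochrane-Orcutt style estimate with rho taken from the DW statistic (compare approach B)
prais cons inc, rhotype(dw) corc twostep

* The same, with rho from regressing the residuals on their lag (compare approach C)
prais cons inc, rhotype(regress) corc twostep
```

        The corc option drops the first observation, as the hand transformation does, and twostep
        stops after a single estimate of rho instead of iterating.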