Assignment 8
1     R Tutorial
Today we will use the Australian hotel data (motel.dat) to see how to fit seasonal
ARIMA (SARIMA) models. The time series consists of total room nights occupied at hotels,
motels, and guest houses in Victoria, Australia, each month from Jan 1980 to Jun 1995. Use
the following commands to read the data from the file and store the number of room nights
occupied in the variable nights. We will only use the first 100 data points (why?).
motel<-read.table("motel.dat")
nights<-motel$V1[1:100]
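To see which calendar month a given observation corresponds to, the series can be given a monthly time index. The sketch below assumes, as stated above, that the data are monthly and start in Jan 1980. It uses the placeholder values 1, ..., 186 instead of the real file so that it runs on its own.

```r
# Placeholder values 1..186 stand in for motel$V1
# (Jan 1980 to Jun 1995 is 15 * 12 + 6 = 186 months).
idx <- ts(1:186, start = c(1980, 1), frequency = 12)

# Year and month (1 = Jan, ..., 12 = Dec) of the 100th observation.
year100  <- floor(time(idx)[100])
month100 <- cycle(idx)[100]
print(c(year = year100, month = month100))
```

With the real data, something like nights_ts <- ts(motel$V1, start = c(1980, 1), frequency = 12) would attach the same index (nights_ts is a name chosen here for illustration only).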
    1. Exploratory data analysis. Plot the data and comment on the seasonal and overall
       trend.
       (a) Let’s take a look at the differenced data. Differencing should remove much of
           the overall trend but not the seasonal pattern.
           plot.ts(diff(nights))
       (b) Apply a logarithmic transform to the data.
           lnights<-log(nights)
       (c) Next, we should look at both the differenced data and the seasonally differenced
           data (after the log transform). We can use the following commands to create the
           plots.
           par(mfrow=c(1,2))
           plot.ts(diff(lnights))
           plot.ts(diff(lnights,12))
       (d) We want to apply both non-seasonal and seasonal differencing, and examine the
           time series plot, ACF, and PACF of the data.
           plot.ts(diff(diff(lnights),12))
           acf(diff(diff(lnights),12),36, xlim=c(1,36))
           pacf(diff(diff(lnights),12),36)
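The combined differencing in part (d) can be written out by hand, which also shows how many observations it uses up. The sketch below uses a simulated series as a stand-in for lnights (the trend, seasonal amplitude, and noise level are arbitrary assumptions) and checks that diff(diff(x), 12) equals x_t - x_{t-1} - x_{t-12} + x_{t-13}.

```r
set.seed(1)
n <- 100
tt <- 1:n

# Hypothetical stand-in for lnights: linear trend + period-12 pattern + noise.
x <- 10 + 0.01 * tt + 0.2 * sin(2 * pi * tt / 12) + rnorm(n, sd = 0.05)

# Non-seasonal difference followed by a lag-12 seasonal difference.
dd <- diff(diff(x), 12)

# The same quantity written out term by term, for t = 14, ..., n.
manual <- x[14:n] - x[13:(n - 1)] - x[2:(n - 12)] + x[1:(n - 13)]

print(length(dd))            # 13 observations are lost in total
print(all.equal(dd, manual))
```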
2. Model fitting. Next, we want to fit appropriate ARIMA(p, d, q) × (P, D, Q)_s models
   with s = 12.
   (a) Since we difference once for the overall trend and once for the seasonal
       trend, d = 1 and D = 1.
   (b) To determine p, q, P, Q, we can check the ACF and PACF plots from part 1(d).
         • The ACF at the seasonal lags 12, 24, 36, ... suggests no seasonal moving
           average term, Q = 0; the PACF at lags 12, 24, 36, ... suggests a seasonal
           autoregressive term of order P = 1. Alternatively, both the ACF and the PACF
           may be tailing off at the seasonal lags, so perhaps both components, P = 1
           and Q = 1, are needed.
         • To determine the values of p and q, we check the ACF and PACF plots at the
           within-season lags 1, 2, ..., 11. From the ACF plot, we can consider an MA(1),
           and hence q = 1; from the PACF plot, we can consider an AR(1), and hence
           p = 1; alternatively, we can consider an ARMA(1,1), with p = q = 1.
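Reading the seasonal lags off a plot can be backed up by printing the values. The sketch below is only an illustration: it uses the built-in AirPassengers series as a stand-in for lnights, and the rough 95% bounds of ±1.96/sqrt(n) assume white noise.

```r
# Stand-in data: log(AirPassengers) replaces lnights so the block runs on its own.
x <- as.numeric(log(AirPassengers))
w <- diff(diff(x), 12)

a <- acf(w, lag.max = 36, plot = FALSE)
p <- pacf(w, lag.max = 36, plot = FALSE)

# a$acf starts at lag 0, so lag k sits at position k + 1; p$acf starts at lag 1.
seasonal_lags <- c(12, 24, 36)
tab <- data.frame(lag  = seasonal_lags,
                  acf  = a$acf[seasonal_lags + 1],
                  pacf = p$acf[seasonal_lags])

# Approximate 95% bounds under a white-noise assumption.
bound <- 1.96 / sqrt(length(w))
tab$acf_outside  <- abs(tab$acf)  > bound
tab$pacf_outside <- abs(tab$pacf) > bound
print(tab)
```

The same indexing with lags 1 to 11 gives the within-season values used to choose p and q.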
   (c) We have identified a few possible models for our data.
          i. ARIMA(1,1,0) × (1,1,0)_12
         ii. ARIMA(1,1,0) × (1,1,1)_12
        iii. ARIMA(0,1,1) × (1,1,0)_12
         iv. ARIMA(0,1,1) × (1,1,1)_12
          v. ARIMA(1,1,1) × (1,1,0)_12
         vi. ARIMA(1,1,1) × (1,1,1)_12
   (d) Recall we use the sarima() function to fit seasonal ARIMA models. Make sure
       you have “sarima.R” saved in your working directory, and type source("sarima.R")
       to load the function into your workspace. For example, to fit model 2(c)i, we type
       fit1<-sarima(lnights,1,1,0,1,1,0,12)
       ##arguments are the vector, p, d, q, P, D, Q, s
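If sarima.R is not at hand, base R's arima() can fit the same class of models. This is a different function from the course's sarima(), so the printed output differs and the estimates may not match exactly. The sketch below uses the built-in AirPassengers series as a stand-in for lnights.

```r
# Stand-in data: log(AirPassengers) replaces lnights so the block runs on its own.
y <- as.numeric(log(AirPassengers))

# ARIMA(1,1,0) x (1,1,0)_12, the same orders as model 2(c)i.
# order = c(p, d, q); seasonal$order = c(P, D, Q); period = s.
fit <- arima(y, order = c(1, 1, 0),
             seasonal = list(order = c(1, 1, 0), period = 12))
print(fit)
```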
   (e) From the diagnostic plots, it appears that model 2(c)i is not a good model. So
       we drop this model from consideration.
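A numerical companion to the diagnostic plots is the Ljung-Box test on the residuals. The sketch below again uses base R's arima() and the AirPassengers stand-in rather than sarima() and lnights, so the result says nothing about the motel data. The choice of 24 lags is an arbitrary assumption.

```r
# Stand-in data and model: same orders as model 2(c)i, fitted with arima().
y <- as.numeric(log(AirPassengers))
fit <- arima(y, order = c(1, 1, 0),
             seasonal = list(order = c(1, 1, 0), period = 12))
res <- residuals(fit)

# Ljung-Box test up to lag 24; fitdf is the number of estimated ARMA
# coefficients, which is subtracted from the degrees of freedom.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = length(coef(fit)))
print(lb)

# Residual autocorrelations, printed rather than plotted.
print(acf(res, lag.max = 36, plot = FALSE))
```

A small p-value suggests that the residuals are still correlated, which is the kind of evidence used above to drop a model.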
3. Model selection. We fit the other models using sarima() and only consider those that
   have adequate diagnostic plots.
   (a) If you go through the models listed in 2(c) one by one, models 2(c)iii to 2(c)vi
       have adequate diagnostic plots.
   (b) Finally, we choose model 2(c)vi, ARIMA(1,1,1) × (1,1,1)_12, which passes the
       diagnostic test and has the smallest AIC, AICc, and BIC.
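The comparison across candidate orders can be tabulated in a loop. The sketch below is an illustration under several assumptions: it uses base R's arima() instead of sarima(), the AirPassengers stand-in instead of lnights, and method = "ML". The AIC and BIC values may be scaled differently from those printed by sarima(), and the ranking on the stand-in data need not match the motel data.

```r
# Stand-in data: log(AirPassengers) replaces lnights.
y <- as.numeric(log(AirPassengers))

# The six candidates 2(c)i to 2(c)vi; d = D = 1 and s = 12 throughout.
cands <- data.frame(p = c(1, 1, 0, 0, 1, 1),
                    q = c(0, 0, 1, 1, 1, 1),
                    P = 1,
                    Q = c(0, 1, 0, 1, 0, 1))

# Fit one candidate; return NULL if the optimizer fails.
fit_one <- function(p, q, P, Q) {
  tryCatch(arima(y, order = c(p, 1, q),
                 seasonal = list(order = c(P, 1, Q), period = 12),
                 method = "ML"),
           error = function(e) NULL)
}

cands$AIC <- NA_real_
cands$BIC <- NA_real_
for (i in seq_len(nrow(cands))) {
  f <- fit_one(cands$p[i], cands$q[i], cands$P[i], cands$Q[i])
  if (!is.null(f)) {
    cands$AIC[i] <- AIC(f)
    cands$BIC[i] <- BIC(f)
  }
}

# Smallest AIC first.
print(cands[order(cands$AIC), ])
```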
   (c) For completeness, check that the estimated coefficients of model 2(c)vi are
       significant.
            Coefficients:
                     ar1      ma1    sar1     sma1
                  0.3927  -0.9999  0.3518  -0.9997
            s.e.  0.1037   0.2434  0.1453   0.3326
            What do these values tell us?
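One rough way to check significance is to divide each estimate by its standard error and compare the ratio with ±1.96. This assumes the estimators are approximately normal, which is an assumption of this sketch and not something the output above guarantees. The numbers are copied from the table above.

```r
# Estimates and standard errors copied from the coefficient table.
est <- c(ar1 = 0.3927, ma1 = -0.9999, sar1 = 0.3518, sma1 = -0.9997)
se  <- c(ar1 = 0.1037, ma1 = 0.2434,  sar1 = 0.1453, sma1 = 0.3326)

# Ratio of estimate to standard error and a two-sided normal p-value.
z <- est / se
p_value <- 2 * pnorm(-abs(z))

print(round(cbind(estimate = est, se = se, z = z, p.value = p_value), 4))
```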
    4. Prediction. We would like to use model 2(c)vi to forecast 12 months into the future.
       This can be done with the command
      sarima.for(lnights,12,1,1,1,1,1,1,12)
      The first 12 (the second argument) is the number of months to forecast; the final 12
      is the seasonal period s.
       (a) Recall that for this data set we only analyzed the first 100 observations. We now
           have forecasts for observations 101 to 112, and we may compare them to the actual
           observations. The following commands add the actual observations to the forecast plot.
            all_nights<-motel$V1
            lines(101:112, log(all_nights[101:112]), type="b")
            Is it surprising for these data that some of the observations lie outside the prediction
            intervals? Do you think we were justified in truncating the data at 100? (It turns
            out that the bicentenary of Australia took place in 1988, after time point 100.
            Take-home message: when data exhibit sudden changes, it does not make sense
            to forecast post-change values using the pre-change model, because the two
            segments of data are governed by different model dynamics.)
       (b) To better understand the quality of our forecasts, we can redo the modeling
           based on the first 88 data points and use the resulting model to predict the 12
           observations from 89 to 100:
            lnights3<-lnights[1:88]
            sarima.for(lnights3,12,1,1,1,1,1,1,12)
            lines(89:100, lnights[89:100], type="b")
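The hold-out comparison in part (b) can also be summarized numerically by counting how many held-out observations fall inside the forecast limits. The sketch below is an illustration only. It uses base R's arima() and predict() instead of sarima.for(), the AirPassengers stand-in instead of lnights, the orders of model 2(c)iii, and limits of ±1.96 standard errors, all of which are assumptions of this example.

```r
# Stand-in data: log(AirPassengers) replaces lnights.
y <- as.numeric(log(AirPassengers))
n <- length(y)
h <- 12

# Hold out the last 12 observations.
train <- y[1:(n - h)]
test  <- y[(n - h + 1):n]

# ARIMA(0,1,1) x (1,1,0)_12, the same orders as model 2(c)iii.
fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(1, 1, 0), period = 12))

# Point forecasts and approximate 95% prediction limits.
fc    <- predict(fit, n.ahead = h)
pred  <- as.numeric(fc$pred)
lower <- pred - 1.96 * as.numeric(fc$se)
upper <- pred + 1.96 * as.numeric(fc$se)

# Which held-out observations fall inside the limits?
inside <- test >= lower & test <= upper
print(data.frame(time = (n - h + 1):n, actual = test,
                 forecast = pred, inside = inside))
cat("Share inside the limits:", mean(inside), "\n")
```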
2     Assignment
    1. (no R needed) Consider the following stationary seasonal model
                                       x_t = Φ x_{t-4} + w_t - θ w_{t-1}.
      Derive the autocorrelation function ρ(h) for h = 0, 1, 2, 3.
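Although no R is needed for this question, a hand derivation can be sanity-checked numerically with ARMAacf() from base R. The values of Φ and θ below are arbitrary illustrations, not part of the assignment. Note that R writes the moving average part as w_t + (ma) w_{t-1}, so the coefficient passed in is -θ.

```r
# Illustrative parameter values only; any |Phi| < 1 gives a stationary model.
Phi   <- 0.5
theta <- 0.4

# AR polynomial 1 - Phi * B^4: only the lag-4 coefficient is nonzero.
# MA polynomial 1 - theta * B: R's sign convention needs ma = -theta.
rho <- ARMAacf(ar = c(0, 0, 0, Phi), ma = -theta, lag.max = 3)
print(round(rho, 4))
```

Plugging the same Φ and θ into the derived formulas should reproduce these four numbers.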
    2. The data set (labour.dat) that we are going to analyze is the number of persons in the
       civilian labor force in Australia each month from Feb 1978 - Aug 1995. You only need
       to fit a model for the first 12 years (the first 144 observations). (There was a rather
       intense recession in Australia in 1990-1991.) In addition to fitting an appropriate
       SARIMA model, you need to use your model to forecast 12 months into the future (i.e.,
       forecast times 145, 146, ..., 156). Comment on whether the true observations lie in
       the prediction intervals. If any observations do not lie in the intervals, give a plausible
       explanation. Make sure to outline the steps you used in analyzing the data. If there
       are two (or more) competing models, make sure you discuss why you chose your model
       in favor of the others.