Project 6 - Time Series PDF

- The document describes using an ARIMA model to forecast monthly gas production in Australia from 1956 to 1995.
- The time series is found to be non-stationary and differencing is used to make it stationary.
- Both manual and automatic ARIMA models are developed and their accuracy is evaluated on training and test data. The automatic ARIMA model is found to have better accuracy on the test set based on error metrics.

Uploaded by

Akshita Raut
Project 6 – Time Series Forecasting
Akshita Raut – PG BABI
Project Overview

• The given dataset is monthly gas production in Australia from 1956 to 1995.
• The forecast package, which contains this dataset, also provides methods and tools for displaying and analysing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling.

Project Objectives

• To read the data as a time series object in R
• To explore the components of the time series present in the dataset
• To check whether the time series is stationary
• To develop an ARIMA model to forecast the next 12 months
1. Overview of the dataset:
• The given data is already a time series object with monthly frequency, so there is no need to convert it into another format.
• From plot 1, we can observe an upward trend in gas production from 1970 onwards.
• Start = Jan 1956
• The season plot indicates that production increases from May, peaks in July and then declines towards the end of the year.

1. Examining the dataset:


• View(gas)
• summary(gas)
• head(gas)
• tsclean(gas)

• The above commands give a basic check of the data; tsclean() identifies and replaces outliers and imputes any missing values.
• To stabilize the variance, we will use the logarithm of the series.
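
The checks above can be sketched as a short script. This is a minimal sketch, assuming the forecast package is installed (it ships the `gas` series); the names `n_missing`, `gas_clean` and `log_gas` are illustrative and do not come from the slides. `View(gas)` is left out because it only works in an interactive session.

```r
# Sketch only: assumes the forecast package is installed.
library(forecast)

# Confirm the object is already a monthly ts and look at its span.
print(class(gas))       # expected to be "ts"
print(start(gas))       # first year and month of the series
print(end(gas))         # last year and month of the series
print(frequency(gas))   # 12 observations per year for monthly data

# Basic checks: summary statistics, first values, count of missing values.
print(summary(gas))
print(head(gas))
n_missing <- sum(is.na(gas))
print(n_missing)

# tsclean() returns a copy with outliers replaced and missing values interpolated.
gas_clean <- tsclean(gas)

# Log transform to stabilise the variance before modelling.
log_gas <- log(gas_clean)
```

Note that `tsclean()` does not modify `gas` in place, so its result has to be assigned if the cleaned series is to be used later.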

1.1. Decomposition

2. Decomposition – Components of Time Series
• Seasonal component – fluctuations that repeat over a fixed period, here within each year
• Trend component – the overall long-run pattern, increasing or decreasing
• Cyclic component – rises and falls that do not follow a fixed seasonal period
• The plot shows us that seasonality is constant.
• The trend is upwards (increasing) from 1970 to 1990, and after a short decline, the trend continues upwards.
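
A decomposition like the one described above can be sketched with `stl()` from base R. The slides do not show which function produced the plot, so the use of `stl()` on the log series and the setting `s.window = "periodic"` (which forces the same seasonal pattern every year) are assumptions; `decomp`, `components` and `max_err` are illustrative names.

```r
# Sketch only: forecast is loaded here just to get the `gas` data.
library(forecast)

log_gas <- log(gas)

# STL splits the series into seasonal, trend and remainder parts.
# s.window = "periodic" assumes an identical seasonal pattern in every year.
decomp <- stl(log_gas, s.window = "periodic")
plot(decomp)

# The three components are stored as columns of a multivariate ts.
components <- decomp$time.series
print(colnames(components))

# The decomposition is additive on the log scale:
# seasonal + trend + remainder reproduces the input series.
max_err <- max(abs(rowSums(components) - as.numeric(log_gas)))
print(max_err)
```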
2. Components of Time Series:

• The yearly plot shows us that the production spiked up from 1970, took a small decline in 1990 and is again trending upwards.
• The month plot and boxplot of the dataset show us the variation that exists within months.
• Let us apply log transformation to stabilize the variance.
3. Stationarity

• The ADF test is a formal test for stationarity.
• Visually, the given time series looks non-stationary.
• The hypotheses used to check whether the time series is stationary are as follows:
• H0 (Null Hypothesis) – the time series is not stationary
• H1 (Alternate Hypothesis) – the time series is stationary

Augmented Dickey-Fuller Test

data: gas
Dickey-Fuller = -2.7131, Lag order = 7, p-value = 0.2764
alternative hypothesis: stationary

• As the p-value (0.2764) is greater than 0.05, we cannot reject the null hypothesis, so the time series is treated as non-stationary.
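
The test and its decision rule can be sketched as follows. This assumes the tseries package is installed, since its `adf.test()` prints output in the format shown above; the 0.05 significance level and the names `adf_raw`, `alpha` and `decision` are assumptions made for illustration.

```r
# Sketch only: assumes the forecast and tseries packages are installed.
library(forecast)  # provides the `gas` data
library(tseries)   # provides adf.test()

# H0: the series is not stationary; H1: the series is stationary.
adf_raw <- adf.test(gas)
print(adf_raw)

# Reject H0 only when the p-value falls below the chosen level.
alpha <- 0.05  # conventional level, an assumption
decision <- if (adf_raw$p.value < alpha) {
  "reject H0: treat the series as stationary"
} else {
  "fail to reject H0: treat the series as non-stationary"
}
print(decision)
```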
4. ARIMA Model

1. Autocorrelation
• The correlation is declining from lag 1 to lag 5.
• The seasonal effect can be seen in the ACF plot.
• ACF plots are used to understand the correlation between a series and its lags.
2. Differencing:

• Since the time series is non-stationary, we can use differencing to make it stationary.
• Differencing the untransformed time series gives inconsistent results, hence we use the log values of the time series.
• Differencing the time series with a lag of 10 can help remove both trend and seasonality.
• The ADF test on the differenced data rejects the null hypothesis of non-stationarity (p-value = 0.01), so the differenced series can be treated as stationary.

Augmented Dickey-Fuller Test

data: gas.diff
Dickey-Fuller = -18.14, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

• Plotting the ACF and PACF of the differenced values gives us q = 0 and p = 2 when d is taken to be 1.
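
The differencing step and the follow-up checks can be sketched as below. The slides do not show how `gas.diff` was built, so first-order differencing of the log series (matching d = 1) is an assumption, and `log_gas`, `gas_diff` and `adf_diff` are illustrative names. The tseries package is assumed to be installed for `adf.test()`.

```r
# Sketch only: assumes the forecast and tseries packages are installed.
library(forecast)  # provides the `gas` data
library(tseries)   # provides adf.test()

# Assumption: one round of ordinary differencing on the log scale (d = 1).
log_gas <- log(gas)
gas_diff <- diff(log_gas, differences = 1)

# Re-run the ADF test on the differenced series.
# adf.test() may warn when the p-value is below the smallest tabulated value.
adf_diff <- adf.test(gas_diff)
print(adf_diff)

# ACF suggests the MA order q, PACF suggests the AR order p.
acf(gas_diff, lag.max = 36, main = "ACF of differenced log series")
pacf(gas_diff, lag.max = 36, main = "PACF of differenced log series")
```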
3. ARIMA Model Selection: (Manual / Auto ARIMA)

• Looking at the ACF & PACF charts, we can find the optimal p, q & d values.
• In this case, we can select 5,6,8

AutoArima <- auto.arima(deseason, seasonal = FALSE)

> print(AutoArima)
Series: deseason
ARIMA(1,1,5) with drift

Coefficients:
         ar1      ma1     ma2      ma3      ma4      ma5     drift
      0.4747  -0.5575  0.1028  -0.2108  -0.0746  -0.1242  107.6904
s.e.  0.0922   0.0939  0.0624   0.0683   0.0650   0.0495   24.1201

sigma^2 estimated as 3966907: log likelihood=-4279.32
AIC=8574.64 AICc=8574.95 BIC=8607.95
The ACF & PACF plots of the residuals show a repeated spike at lag 6, which suggests trying a different specification with p = 6 or q = 6.
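
The two routes to a model, automatic and manual, can be sketched side by side. The slides do not show how `deseason` was created, so building it with `seasadj()` on an STL fit is an assumption, and the order actually selected by `auto.arima()` may differ from the output printed above. The manual order (2, 1, 0) is taken from the p, d and q values read off the ACF and PACF plots; `auto_fit` and `manual_fit` are illustrative names.

```r
# Sketch only: assumes the forecast package is installed.
library(forecast)

# Assumption: the seasonally adjusted series comes from an STL decomposition.
deseason <- seasadj(stl(gas, s.window = "periodic"))

# Automatic selection of a non-seasonal ARIMA model.
auto_fit <- auto.arima(deseason, seasonal = FALSE)
print(auto_fit)

# Manual specification with p = 2, d = 1, q = 0.
manual_fit <- Arima(deseason, order = c(2, 1, 0))
print(manual_fit)

# Lower AICc indicates the better trade-off between fit and complexity.
print(c(auto = auto_fit$aicc, manual = manual_fit$aicc))
```

To try the alternative specification suggested by the residual plots, the `order` argument can be changed, for example to `c(6, 1, 0)` or `c(0, 1, 6)`.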
4. Ljung-Box test
H0: Residuals are independent
Ha: Residuals are not independent

> Box.test(gasAR1$residuals)

Box-Pierce test

data: gasAR1$residuals
X-squared = 0.072517, df = 1, p-value = 0.7877

> Box.test(gasARfit$residuals)

Box-Pierce test

data: gasARfit$residuals
X-squared = 2.55, df = 1, p-value = 0.1103

• Both p-values are greater than 0.05, so H0 is not rejected for either model: the residuals show no significant autocorrelation at lag 1.
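
The outputs above are headed "Box-Pierce test" because `Box.test()` uses `type = "Box-Pierce"` with `lag = 1` by default. A minimal sketch of requesting the Ljung-Box statistic explicitly is given below. The model, the choice of 24 lags and the names `fit`, `res`, `bp` and `lb` are assumptions for illustration and are not the `gasAR1` or `gasARfit` objects from the slides.

```r
# Sketch only: assumes the forecast package is installed.
library(forecast)

# Assumption: an illustrative ARIMA(2,1,0) fit on a seasonally adjusted series.
deseason <- seasadj(stl(gas, s.window = "periodic"))
fit <- Arima(deseason, order = c(2, 1, 0))
res <- residuals(fit)

# Default call: Box-Pierce statistic at lag 1.
bp <- Box.test(res)
print(bp$method)

# Ljung-Box statistic over 24 lags; fitdf = p + q removes the degrees of
# freedom used by the estimated ARMA coefficients (here 2 + 0).
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 2)
print(lb)
```

Testing more than one lag matters for monthly data, since autocorrelation left in the residuals at seasonal lags would not show up in a lag-1 test.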
5. Forecasting on Manual and Auto ARIMA Models for training data
• Forecasting on ARIMA Models with seasonality
5. Accuracy Calculations
> accuracy(acc, gasTest)
                     ME     RMSE      MAE         MPE      MAPE     MASE        ACF1 Theil's U
Training set   97.55989 3542.529 2660.401 -0.06479871  5.777713 0.809962 -0.03173606        NA
Test set     4026.99119 6762.854 5870.357  6.79731137 11.089278 1.787236  0.53466285  1.548875

> a1 <- forecast(gasARfit)
> accuracy(a1, gasTest)
                     ME     RMSE      MAE        MPE     MAPE      MASE      ACF1 Theil's U
Training set  -35.15758 2766.478 2171.046 -0.4554556 4.705954 0.6609773 0.1881929        NA
Test set     4699.81125 5586.601 5111.735  8.6922055 9.679309 1.5562732 0.1598068  1.328607
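
A hold-out evaluation of this kind can be sketched as follows. The slides do not state where the series was split, so the cut at the end of 1993 is an assumption, as is the ARIMA(2,1,0) order; `gas_train`, `gas_test`, `fc`, `acc_tab` and `rmse_test` are illustrative names rather than the objects used above.

```r
# Sketch only: assumes the forecast package is installed.
library(forecast)

# Assumption: train on data up to Dec 1993, test on the remainder.
gas_train <- window(gas, end = c(1993, 12))
gas_test <- window(gas, start = c(1994, 1))

# Fit on the training window only, then forecast over the test window.
fit <- Arima(gas_train, order = c(2, 1, 0))
fc <- forecast(fit, h = length(gas_test))

# accuracy() reports in-sample errors and, given the actuals, test-set errors.
acc_tab <- accuracy(fc, gas_test)
print(acc_tab)

# RMSE on the test set computed by hand from the point forecasts.
rmse_test <- sqrt(mean((gas_test - fc$mean)^2))
print(rmse_test)
```

When two models are compared this way, the test-set rows are the ones that indicate how each model performs on data it has not seen.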
