Project 6 – Time Series Forecasting
Akshita Raut – PG BABI
Project Overview
• The dataset contains monthly gas production in Australia from 1956 to 1995.
• The forecast package, which includes this dataset, also provides methods and tools for displaying and analysing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling.
Project Objectives
• To read the data as a time series object in R
• To explore the components of the time series present in the dataset
• To check if the time series is stationary
• To develop an ARIMA model to forecast the next 12 months
1. Overview of the dataset:
• The data is already a time series object with monthly frequency, so there is no need to convert it into any other format.
• Plot 1 shows an upward trend in gas production from 1970 onwards.
• Start = January 1956
• The season plot shows production rising from May to a peak in July and then declining towards the end of the year.
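The overview above can be reproduced with a few calls. This is a minimal sketch, assuming the `gas` series shipped with the forecast package is the dataset used in these slides; the plot titles and axis labels are illustrative.

```r
library(forecast)  # provides the `gas` dataset and seasonplot()

class(gas)       # "ts": already a time series object
start(gas)       # first observation as (year, month)
end(gas)         # last observation as (year, month)
frequency(gas)   # 12 for monthly data

# Plot of the full series, then one line per year to expose the seasonal shape
plot(gas, main = "Monthly gas production in Australia", ylab = "Production")
seasonplot(gas, year.labels = TRUE, main = "Seasonal plot: gas production")
```

`start()`, `end()` and `frequency()` confirm the time index before any modelling, which is a cheap way to catch a mis-specified series.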
1. Examining the dataset:
• View(gas)
• summary(gas)
• head(gas)
• tsclean(gas)
• The above commands give a basic check of the data and show whether any outliers or missing values need to be handled; tsclean() replaces outliers and imputes missing values.
• To stabilize the variance, we will use the logarithm of the series.
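The checks listed above could be scripted as follows. This is a sketch only: the names `gas_clean` and `gas_log` are introduced here for illustration, and `View()` is left out because it needs an interactive session.

```r
library(forecast)

summary(gas)                 # minimum, quartiles, mean and maximum
head(gas)                    # first few observations
sum(is.na(gas))              # number of missing values

gas_clean <- tsclean(gas)    # replaces outliers and interpolates missing values
sum(gas_clean != gas)        # how many observations tsclean() changed

gas_log <- log(gas_clean)    # log transform to stabilise the variance
plot(gas_log, main = "Log of monthly gas production")
```

Comparing `gas_clean` with `gas` shows how much `tsclean()` actually altered, which is worth knowing before relying on the cleaned series.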
2. Decomposition – Components of Time Series
• Seasonal component – fluctuations that repeat over a fixed period, such as every year
• Trend component – the overall long-term pattern, increasing or decreasing
• Cyclic component – fluctuations that do not have a fixed period, so they are not seasonal
• The plot shows that the seasonal pattern is constant.
• The trend is upwards (increasing) from 1970 to 1990 and, after a short decline, continues upwards.
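A decomposition of this kind can be produced with `stl()`. The sketch below assumes `s.window = "periodic"`, which forces the same seasonal pattern in every year; the slides do not state which setting was used, and the variable names are illustrative.

```r
library(forecast)

# s.window = "periodic" is an assumption: it fixes the seasonal shape across years
decomp <- stl(gas, s.window = "periodic")
plot(decomp)   # panels: data, seasonal, trend, remainder

# Extract the individual components
seasonal_part <- decomp$time.series[, "seasonal"]
trend_part    <- decomp$time.series[, "trend"]
remainder     <- decomp$time.series[, "remainder"]
```

The three components add back up to the original series, so the remainder panel shows what trend and seasonality leave unexplained.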
2. Components of Time Series:
• The yearly plot shows that production rose sharply from 1970, dipped slightly in 1990 and is trending upwards again.
• The month plot and boxplot of the dataset show us the variation that exists within months.
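The yearly plot, month plot and boxplot described above could be drawn as follows. This is a sketch: restricting the yearly aggregate to complete calendar years (through December 1994) is an assumption made here so that a partial final year does not distort the totals.

```r
library(forecast)

# Yearly totals over complete calendar years only (assumed cut-off: Dec 1994)
gas_full_years <- window(gas, end = c(1994, 12))
yearly <- aggregate(gas_full_years, nfrequency = 1, FUN = sum)
plot(yearly, main = "Yearly gas production", ylab = "Production")

# One sub-series per calendar month, with its mean drawn as a horizontal line
monthplot(gas, main = "Month plot", ylab = "Production")

# Distribution of production within each calendar month (1 = January)
boxplot(gas ~ cycle(gas), xlab = "Month", ylab = "Production",
        main = "Production by calendar month")
```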
• Let us apply a log transformation to stabilize the variance.
3. Stationarity
• The ADF test is a formal test for stationarity.
• Visually, the given time series looks non-stationary.
• The hypotheses to check if the time series is stationary are as follows:
• H0 (Null Hypothesis) – Time series is not stationary
• H1 (Alternate Hypothesis) – Time series is stationary
• Augmented Dickey-Fuller Test
data: gas
Dickey-Fuller = -2.7131, Lag order = 7, p-value = 0.2764
alternative hypothesis: stationary
• As the p-value is greater than 0.05, we cannot reject the null hypothesis, so the time series is treated as non-stationary.
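The test above can be run with `adf.test()` from the tseries package. This is a sketch; the 5% significance level in the decision rule is an assumption, since the slides do not name the level used.

```r
library(forecast)  # provides the `gas` dataset
library(tseries)   # provides adf.test()

adf_raw <- adf.test(gas, alternative = "stationary")
print(adf_raw)

# Decision rule at the (assumed) 5% level
if (adf_raw$p.value < 0.05) {
  message("Reject H0: the series looks stationary")
} else {
  message("Fail to reject H0: the series looks non-stationary")
}
```

When no lag order is supplied, `adf.test()` chooses it from the series length, which is where the reported lag order of 7 comes from.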
4. ARIMA Model
1. Autocorrelation
• The correlation declines from lag 1 to lag 5.
• The seasonal effect can be seen in the ACF plot.
• ACF plots are used to understand the correlation between a series and its lags.
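An ACF plot of this kind could be generated as below. This is a sketch; `lag.max = 48` is an illustrative choice that covers four years of monthly lags.

```r
library(forecast)  # provides the `gas` dataset

# With monthly data, lags up to 48 span four years of seasonal cycles
acf_raw <- acf(gas, lag.max = 48, main = "ACF of gas production")

# Autocorrelations at lags 1 to 5 (the first element of $acf is lag 0)
acf_raw$acf[2:6]
```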
2. Differencing:
• Since the time series is non-stationary, we can use differencing to make it stationary.
• Differencing the raw time series shows inconsistent variation, hence we use the log values of the time series.
• Differencing the time series with a lag of 10 can help remove both trend and seasonality.
• The ADF test on the differenced data rejects the null hypothesis of non-stationarity (p-value = 0.01).
Augmented Dickey-Fuller Test
data: gas.diff
Dickey-Fuller = -18.14, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary
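Differencing and re-testing could look like the sketch below. The slides do not show how `gas.diff` was built, so a first difference of the log series is assumed here, and the names `gas_log` and `gas_diff` are illustrative.

```r
library(forecast)  # provides the `gas` dataset
library(tseries)   # provides adf.test()

gas_log  <- log(gas)
gas_diff <- diff(gas_log, differences = 1)   # assumed: first difference of log series
plot(gas_diff, main = "First difference of log(gas)")

# tseries reports p-values below 0.01 as 0.01 and issues a warning in that case
adf_diff <- adf.test(gas_diff, alternative = "stationary")
print(adf_diff)
```

For monthly data, a seasonal difference can be taken in the same way with `diff(gas_log, lag = 12)`.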
• Plotting the ACF and PACF of the differenced values gives q = 0 and p = 2, with d = 1.
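The ACF and PACF of the differenced series, and the candidate model read from them, could be obtained as follows. This is a sketch: the differenced series is assumed to be the first difference of the log series, and `fit_210` is a name introduced here.

```r
library(forecast)

gas_diff <- diff(log(gas))   # assumed construction of the differenced series

# Side-by-side ACF and PACF
op <- par(mfrow = c(1, 2))
acf(gas_diff,  lag.max = 36, main = "ACF of differenced series")
pacf(gas_diff, lag.max = 36, main = "PACF of differenced series")
par(op)

# Candidate non-seasonal model from the text: p = 2, d = 1, q = 0
fit_210 <- Arima(log(gas), order = c(2, 1, 0))
summary(fit_210)
```

A sharp cut-off in the PACF suggests the AR order p, while a sharp cut-off in the ACF suggests the MA order q.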
3. ARIMA Model Selection: (Manual / Auto ARIMA)
• Looking at the ACF and PACF charts, we can find the optimal p, d and q values.
• In this case, we can select 5,6,8
> AutoArima <- auto.arima(deseason, seasonal = FALSE)
> print(AutoArima)
Series: deseason
ARIMA(1,1,5) with drift
Coefficients:
         ar1      ma1     ma2      ma3      ma4      ma5     drift
      0.4747  -0.5575  0.1028  -0.2108  -0.0746  -0.1242  107.6904
s.e.  0.0922   0.0939  0.0624   0.0683   0.0650   0.0495   24.1201
sigma^2 estimated as 3966907: log likelihood=-4279.32
AIC=8574.64 AICc=8574.95 BIC=8607.95
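The slides do not show how `deseason` was created. The sketch below assumes it is the seasonally adjusted series from an STL decomposition with `s.window = "periodic"`; the model selected may differ from the output above depending on that choice and on the package version.

```r
library(forecast)

# Assumed construction of `deseason`: remove the STL seasonal component
decomp   <- stl(gas, s.window = "periodic")
deseason <- seasadj(decomp)

# Non-seasonal search, as in the call shown in the slides
AutoArima <- auto.arima(deseason, seasonal = FALSE)
print(AutoArima)

# Residual plot with ACF and PACF; large spikes hint at missing AR or MA terms
tsdisplay(residuals(AutoArima), lag.max = 36, main = "auto.arima residuals")
```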
The ACF and PACF plots of the residuals show a repeated spike at lag 6, so we try a different specification with p = 6 or q = 6.
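A manual specification along these lines could be fitted with `Arima()`. The slides do not give the final orders or define `gasAR1` and `gasARfit`, so the order (1,1,6) with drift and the name `fit_q6` below are hypothetical, as is the construction of `deseason`.

```r
library(forecast)

# Assumed construction of `deseason` (not shown in the slides)
deseason <- seasadj(stl(gas, s.window = "periodic"))

# Hypothetical manual model: q raised to 6 to absorb the lag-6 spike
fit_q6 <- Arima(deseason, order = c(1, 1, 6), include.drift = TRUE)
print(fit_q6)

# Check whether the lag-6 spike is gone from the residuals
tsdisplay(residuals(fit_q6), lag.max = 36, main = "ARIMA(1,1,6) residuals")

fit_q6$aicc   # compare against the AICc of the auto.arima model
```

A lower AICc than the automatic model would support keeping the extra term.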
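4. Ljung-Box test
H0: Residuals are independent
Ha: Residuals are not independent
> Box.test(gasAR1$residuals)
Box-Pierce test
data: gasAR1$residuals
X-squared = 0.072517, df = 1, p-value = 0.7877
> Box.test(gasARfit$residuals)
Box-Pierce test
data: gasARfit$residuals
X-squared = 2.55, df = 1, p-value = 0.1103
• Box.test() runs the Box-Pierce variant at lag 1 by default, which is what the output above reports; type = "Ljung-Box" requests the Ljung-Box statistic.
• Both p-values are greater than 0.05, so we cannot reject H0: the residuals of both models can be treated as independent.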
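To obtain the Ljung-Box statistic itself, `Box.test()` needs `type = "Ljung-Box"`. The sketch below is illustrative: the model is refitted with `auto.arima()` on an assumed `deseason` series, and the choice of 24 lags is an assumption, not a value from the slides.

```r
library(forecast)

# Assumed series and model (the slides' gasAR1 / gasARfit are not defined)
deseason <- seasadj(stl(gas, s.window = "periodic"))
fit <- auto.arima(deseason, seasonal = FALSE)

# fitdf = number of estimated AR and MA coefficients,
# so the test uses lag - fitdf degrees of freedom
n_coef <- sum(arimaorder(fit)[c("p", "q")])
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = n_coef)
print(lb)
```

Testing several lags at once checks for leftover autocorrelation beyond lag 1, which a default `Box.test()` call does not examine.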
5. Forecasting with the Manual and Auto ARIMA Models on the training data
• Forecasting with ARIMA models with seasonality
5. Accuracy Calculations
> accuracy(acc, gasTest)
                     ME     RMSE      MAE         MPE      MAPE     MASE        ACF1 Theil's U
Training set   97.55989 3542.529 2660.401 -0.06479871  5.777713 0.809962 -0.03173606        NA
Test set     4026.99119 6762.854 5870.357  6.79731137 11.089278 1.787236  0.53466285  1.548875
> a1 <- forecast(gasARfit)
> accuracy(a1, gasTest)
                     ME     RMSE      MAE        MPE     MAPE      MASE      ACF1 Theil's U
Training set  -35.15758 2766.478 2171.046 -0.4554556 4.705954 0.6609773 0.1881929        NA
Test set     4699.81125 5586.601 5111.735  8.6922055 9.679309 1.5562732 0.1598068  1.328607
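The accuracy tables above compare forecasts against a held-out `gasTest` set. The slides do not show the split, so the sketch below assumes the final 12 months are held out; `gasTrain`, `fit` and `fc` are illustrative names, and the figures it produces will not necessarily match the tables.

```r
library(forecast)

# Assumed split: the last 12 observations form the test set
n        <- length(gas)
gasTrain <- subset(gas, end = n - 12)
gasTest  <- subset(gas, start = n - 11)

# Seasonal ARIMA selected automatically on the training data
fit <- auto.arima(gasTrain)
fc  <- forecast(fit, h = length(gasTest))

plot(fc)
lines(gasTest, col = "red")   # overlay the held-out observations

# Error measures on the training set and on the test set
acc_table <- accuracy(fc, gasTest)
print(acc_table)
```

Test-set RMSE and MAPE are the rows to compare across models, because training-set errors reward overfitting.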