MINI PROJECT BASED ON TIME SERIES FORECASTING METHODS
By Vipul Malpani
Data used:- Australian monthly gas production
Contents :-
1) Objective
2) Setting working directory and loading working data.
2.1) TS Plot of working data
2.2) Extracting Data for building model
2.3) Decomposing the data
2.4) De-seasonalizing data
3) Check for stationarity
3.1) ACF and PACF plots
3.2) Differencing the time series data
3.3) ACF and PACF for differenced time series
4) Splitting into training and test sets
5) Arima Model
5.1) Fitting with Auto ARIMA
6) Forecasting & Accuracy
6.1) Forecasting with the ARIMA model
6.2) Accuracy of the forecast
1) Objective:-
The main objective is to analyze historical Australian monthly gas production data and build a
model that forecasts gas production for the next 12 months.
2) Setting working directory and loading working data.
setwd("C:/Users/Vipul/Desktop/GL/Project/project 6")
library(forecast)
library(tseries)
#IMPORTING DATA
Working_data=forecast::gas
str(Working_data)
## Time-Series [1:476] from 1956 to 1996: 1709 1646 1794 1878 2173 ...
#Working_data
2.1) TS Plot of working data.
plot.ts(Working_data)
#FROM THE ABOVE PLOT WE CAN SAY THAT GAS PRODUCTION FROM 1956 TO 1970 IS ROUGHLY
#CONSTANT, BUT FROM 1970 ONWARD WE CAN SEE BOTH TREND AND SEASONALITY IN THE DATA.
2.2) Extracting Data for building model.
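#Caution: Working_data is already a ts object, so ts() keeps its first 308 values and only
#re-labels them as 1970-1995; window(Working_data, start = c(1970,1)) selects by date instead.
Gasts=ts(Working_data, start = c(1970,1), end = c(1995,8),frequency = 12)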
str(Gasts)
## Time-Series [1:308] from 1970 to 1996: 1709 1646 1794 1878 2173 ...
plot(Gasts)
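The str() output above starts with 1709 1646 1794, the same values that open the full series from 1956. That is how ts() behaves when it is given an existing ts object: it keeps the first observations and rewrites the time index, whereas window() selects observations by date. A minimal sketch of the difference on a made-up series (toy, relabelled and subsetted are hypothetical names, not part of the analysis):

```r
# Hypothetical toy series: 48 monthly values 1..48 starting January 2000,
# so each value identifies its own position in the series.
toy <- ts(1:48, start = c(2000, 1), frequency = 12)

# ts() on an existing ts object keeps the FIRST observations and only
# rewrites the time index.
relabelled <- ts(toy, start = c(2002, 1), end = c(2003, 12), frequency = 12)

# window() selects the observations that actually fall in the period.
subsetted <- window(toy, start = c(2002, 1), end = c(2003, 12))

head(as.numeric(relabelled))  # values that belonged to January 2000 onward
head(as.numeric(subsetted))   # values that belong to January 2002 onward
```

Both results carry the same 2002-2003 time index, which is why the difference is easy to miss in a plot or in str().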
2.3) Decomposing the data
Gas_decompose = stl(Gasts, s.window = "periodic")
plot(Gas_decompose)
2.4) De-seasonalizing data
#Since the data show seasonality, we de-seasonalize them
deseasonal_gas=seasadj(Gas_decompose)
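To make the decomposition step concrete: stl() splits the series into seasonal, trend and remainder components that add back up to the original, and the seasonally adjusted series is the original minus the seasonal component. The sketch below uses base R only and a simulated stand-in series, since it is meant to run on its own; x, parts and x_adj are hypothetical names.

```r
# Stand-in monthly series (hypothetical): linear trend + fixed seasonal
# pattern + noise, used only so the block runs on its own.
set.seed(1)
n <- 120
x <- ts(100 + 0.5 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n),
        start = c(2000, 1), frequency = 12)

fit <- stl(x, s.window = "periodic")
parts <- fit$time.series   # columns: seasonal, trend, remainder

# The three components add back up to the original series
# (the largest difference is floating-point noise).
max(abs(rowSums(parts) - as.numeric(x)))

# Seasonally adjusted series = original minus the seasonal component.
x_adj <- x - parts[, "seasonal"]
plot(x_adj)
```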
3) Check for stationarity
adf.test(Gasts)
##
## Augmented Dickey-Fuller Test
##
## data: Gasts
## Dickey-Fuller = 0.73972, Lag order = 6, p-value = 0.99
## alternative hypothesis: stationary
#The p-value (0.99) is greater than 0.05, so we cannot reject the null hypothesis of a
#unit root; the time series is treated as non-stationary.
3.1) ACF and PACF plots
acf(Gasts,lag.max=50)
pacf(Gasts,lag.max=50)
#The ACF plot shows significant autocorrelation up to lag 50, so observations far in the
#past still influence current or recent values.
#The PACF plot suggests that there could be monthly seasonality.
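Each bar in an ACF plot is a sample autocorrelation: the covariance between the series and a copy of itself shifted by k steps, divided by the variance. As a rough illustration, the sketch below computes the lag-1 value by hand and compares it with acf(). It uses a simulated random walk rather than the gas data, and x, xc and r_manual are hypothetical names.

```r
set.seed(2)
x <- cumsum(rnorm(200))   # simulated random walk: strongly autocorrelated

k <- 1                    # lag to check
n <- length(x)
xc <- x - mean(x)         # centre the series

# Sample autocorrelation at lag k: lagged cross-products over sum of squares.
r_manual <- sum(xc[(k + 1):n] * xc[1:(n - k)]) / sum(xc^2)

# acf() stores lag 0 in the first element, so lag k is element k + 1.
r_acf <- acf(x, lag.max = 50, plot = FALSE)$acf[k + 1]

c(manual = r_manual, acf = r_acf)
```

A slowly decaying ACF like the one this random walk produces is the same pattern that points to non-stationarity in the plot above.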
3.2) Differencing the time series data
Gas_data_diff = diff(Gasts, differences = 1)
plot(Gas_data_diff)
adf.test(Gas_data_diff, alternative = "stationary")
## Warning in adf.test(Gas_data_diff, alternative = "stationary"): p-value
## smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: Gas_data_diff
## Dickey-Fuller = -15.575, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
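First differencing replaces each value with its change from the previous observation, which removes a steady trend. A small sketch with made-up numbers shows what diff() computes, together with the seasonal variant (lag = 12) that subtracts the value from the same month one year earlier. The series x and y are hypothetical.

```r
# Made-up monthly values with a rising trend.
x <- ts(c(5, 7, 10, 14, 19, 25), start = c(2000, 1), frequency = 12)

d1 <- diff(x, differences = 1)            # x[t] - x[t-1]
manual <- as.numeric(x)[-1] - as.numeric(x)[-length(x)]

as.numeric(d1)   # 2 3 4 5 6
manual           # same values, computed by hand

# Seasonal differencing: x[t] - x[t-12]. Three years of data lose the first year.
y <- ts(1:36, start = c(2000, 1), frequency = 12)
d12 <- diff(y, lag = 12)
length(d12)      # 24
```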
3.3) ACF and PACF for differenced time series
Acf(Gas_data_diff, main='ACF for Differenced Series')
Pacf(Gas_data_diff, main='PACF for Differenced Series')
4) Splitting into training and test sets
Gas_train = window(Gasts, start=c(1985,1),end=c(1993,12))
Gas_Test= window(Gasts, start=c(1994,1))
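It can help to check how many observations each window holds before fitting anything. The sketch below repeats the two window() calls on a placeholder series that has the same monthly index as Gasts (January 1970 to August 1995) but contains only the numbers 1 to 308; placeholder, train and test are hypothetical names.

```r
# Placeholder series with the same monthly index as Gasts; the values are
# just 1..308 and are not gas data.
placeholder <- ts(seq_len(308), start = c(1970, 1), frequency = 12)

train <- window(placeholder, start = c(1985, 1), end = c(1993, 12))
test  <- window(placeholder, start = c(1994, 1))

length(train)   # 9 years x 12 months = 108
length(test)    # January 1994 to August 1995 = 20
```

The 20 test observations match the h=20 horizon used in section 6.1, and the observations before 1985 are not used for fitting.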
5) Arima Model
Gas.Arima=arima(Gas_train,order=c(0,1,1))
Gas.Arima
##
## Call:
## arima(x = Gas_train, order = c(0, 1, 1))
##
## Coefficients:
## ma1
## 0.3797
## s.e. 0.0806
##
## sigma^2 estimated as 1939217: log likelihood = -926.47, aic = 1856.93
tsdisplay(residuals(Gas.Arima),lag.max = 20,main = "Model_residuals")
gas.arima.fit=fitted(Gas.Arima)
ts.plot(Gas_train,gas.arima.fit,col=c("red","blue"))
#The fitted values track the actual values closely, so the model fits the training data well;
#this is an in-sample check and does not by itself show how well the model will forecast.
5.1) Fitting with Auto ARIMA
auto_arima<-auto.arima(Gas_train, seasonal = TRUE)
auto_arima
## Series: Gas_train
## ARIMA(1,0,0)(2,1,0)[12] with drift
##
## Coefficients:
## ar1 sar1 sar2 drift
## 0.6764 -0.4588 -0.3065 165.2297
## s.e. 0.0759 0.1121 0.1118 12.0501
##
## sigma^2 estimated as 604632: log likelihood=-775.44
## AIC=1560.89 AICc=1561.55 BIC=1573.71
tsdisplay(residuals(auto_arima), lag.max=45, main='Auto ARIMA Model Residuals')
gas.fit=fitted(auto_arima)
ts.plot(Gas_train,gas.fit,col=c("red","blue"))
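Auto ARIMA selects a different specification, ARIMA(1,0,0)(2,1,0)[12] with drift, instead of
the manual ARIMA(0,1,1), and has a much lower AIC (1560.89 against 1856.93). The two models
use different differencing, so the AIC values are only a rough guide when comparing them.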
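The AIC figures in the two printouts can be reproduced from the reported log likelihoods with AIC = -2 * logLik + 2 * k, where k counts the estimated coefficients plus the innovation variance. The sketch below only redoes that arithmetic on the numbers printed above; the variable names are hypothetical.

```r
# Values copied from the two model printouts above.
loglik_manual <- -926.47   # ARIMA(0,1,1)
loglik_auto   <- -775.44   # ARIMA(1,0,0)(2,1,0)[12] with drift

# Number of estimated parameters, counting the innovation variance sigma^2.
k_manual <- 1 + 1   # ma1 + sigma^2
k_auto   <- 4 + 1   # ar1, sar1, sar2, drift + sigma^2

aic_manual <- -2 * loglik_manual + 2 * k_manual   # printed value: 1856.93
aic_auto   <- -2 * loglik_auto   + 2 * k_auto     # printed value: 1560.89

c(manual = aic_manual, auto = aic_auto)
```

The results agree with the printed AIC values up to rounding of the log likelihoods.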
#Box-Pierce test (the default of Box.test; type = "Ljung-Box" gives the Ljung-Box version)
H0: Residuals are independent
Ha: Residuals are not independent
library(stats)
Box.test(Gas.Arima$residuals)
##
## Box-Pierce test
##
## data: Gas.Arima$residuals
## X-squared = 0.61408, df = 1, p-value = 0.4333
Since the p-value (0.4333) is greater than 0.05, we cannot reject H0; the residuals show no significant autocorrelation at lag 1.
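The call above uses the Box.test() defaults, which give the Box-Pierce statistic at a single lag. A possible variant, sketched here under assumptions, requests the Ljung-Box statistic over several lags and subtracts the number of fitted ARMA coefficients from the degrees of freedom with fitdf. The series is simulated and the choice of lag = 24 is illustrative, not a recommendation from the original analysis.

```r
set.seed(3)
# Simulated stand-in series (not the gas data): a random walk with monthly frequency.
x <- ts(cumsum(rnorm(120)), start = c(2000, 1), frequency = 12)

fit <- arima(x, order = c(0, 1, 1))   # same order as Gas.Arima, one MA coefficient

# Ljung-Box test on the residuals over 24 lags; fitdf = 1 accounts for ma1.
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)

lb$parameter   # degrees of freedom: lag - fitdf = 23
lb$p.value     # a large p-value means no evidence against independence
```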
6) Forecasting & Accuracy
6.1) Forecasting with the ARIMA model
fcast.arima <- forecast(Gas.Arima,h=20)
fcast.auto_arima<- forecast(auto_arima, h=20)
plot(fcast.arima)
plot(fcast.auto_arima)
6.2) Accuracy of the forecast
f.arima=forecast(Gas.Arima)
accuracy(f.arima, Gas_Test)
## ME RMSE MAE MPE MAPE MASE
## Training set 106.4061 1386.096 1092.937 0.6521214 6.686101 0.5294037
## Test set 9704.4827 11573.729 9739.039 28.0056123 28.166844 4.7174555
## ACF1 Theil's U
## Training set 0.07540524 NA
## Test set 0.75049455 3.326208
f.auto_arima=forecast(auto_arima)
accuracy(f.auto_arima,Gas_Test)
## ME RMSE MAE MPE MAPE
## Training set 28.00523 717.6744 528.3848 0.09295282 3.229727
## Test set 4350.84815 5196.2061 4350.8482 12.74873170 12.748732
## MASE ACF1 Theil's U
## Training set 0.2559422 -0.01981323 NA
## Test set 2.1074904 0.70679799 1.546872
#The auto ARIMA model performs better than the manual ARIMA model: its MAPE is 12.75% on the
#test set (against 28.17%) and 3.23% on the training set (against 6.69%).
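The main columns of the accuracy() tables are simple functions of the forecast errors (actual minus forecast). The sketch below computes ME, RMSE, MAE and MAPE by hand on four made-up numbers; actual and predicted are hypothetical vectors, not taken from the gas data.

```r
# Hypothetical numbers, not taken from the gas data.
actual    <- c(100, 110, 120, 130)
predicted <- c(90, 115, 120, 143)

err <- actual - predicted          # forecast errors: 10, -5, 0, -13

me   <- mean(err)                  # mean error (bias): -2
rmse <- sqrt(mean(err^2))          # root mean squared error
mae  <- mean(abs(err))             # mean absolute error: 7
mape <- mean(abs(err / actual)) * 100   # mean absolute percentage error

c(ME = me, RMSE = rmse, MAE = mae, MAPE = mape)
```

Because MAPE is scale-free, it is the easiest of these to compare between the training and test rows.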
#Now we will forecast the next 12 months using the auto ARIMA model.
fcast.auto_arima1<- forecast( auto_arima, h=12)
plot(fcast.auto_arima1)
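Because auto_arima was fitted on Gas_train, which ends in December 1993, h=12 here covers 1994, a period that is already inside the available data. To forecast the 12 months after the last observation, one option is to refit the model on the full series first. The sketch below shows the idea with base R's arima() and predict() on the built-in AirPassengers dataset, which stands in for the gas series; the model orders are illustrative and are not the ones selected above.

```r
# Built-in monthly series (January 1949 to December 1960), used as a stand-in.
x <- log(AirPassengers)

# Fit on the FULL series so that forecasts start after the last observation.
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

fc <- predict(fit, n.ahead = 12)   # list with point forecasts (pred) and standard errors (se)

start(fc$pred)   # first forecast month follows the end of the data
exp(fc$pred)     # back-transform from the log scale
```

With the forecast package, the equivalent step would be to refit on Gasts and then call forecast() with h=12.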
7) Model Conclusion and Accuracy.
Of the models we built, the auto ARIMA model, ARIMA(1,0,0)(2,1,0)[12] with drift, is the best:
it has the lower error on both the training and test sets (test MAPE 12.75% against 28.17% for ARIMA(0,1,1)).
So we can use this ARIMA model for forecasting.