SHIVAJI UNIVERSITY KOLHAPUR
DEPARTMENT OF STATISTICS
M. Sc-II Sem.-IV Statistics
Modeling and analysis of univariate time series
Name: Vaishnavi Vishwas Jadhav Practical No:
PRN: 2023000320 Date
Q.1
Aim: To analyze the time series using ARIMA/SARIMA techniques and also perform residual
analysis for model adequacy checking of the model.
Solution:
The time series plot of the monthly unemployment data, from July 1975 to September 1979 (51
observations), is as follows:
[Figure: Time Series Plot of UNEMPLYD. y-axis: UNEMPLYD (0 to 120000); x-axis: Index (1 to 50)]
The unemployment series shows a strong seasonal pattern with regular yearly peaks, but no clear
long-term increasing or decreasing trend. The regularity of the seasonal swings suggests a
predictable pattern, presumably driven by external seasonal factors, and no major outliers disturb it.
ACF -
[Figure: Autocorrelation Function for UNEMPLYD (with 5% significance limits for the
autocorrelations). y-axis: Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 13)]
The ACF plot shows significant autocorrelations at multiple lags. A strong positive
autocorrelation at lag 1 suggests that unemployment in one month depends heavily on the
previous month. The positive autocorrelation at lag 12 suggests a cycle that repeats every 12
months, that is, a yearly seasonal pattern.
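As a minimal sketch of how such autocorrelations are computed, the following Python snippet evaluates the sample ACF of a synthetic monthly series with a 12-month cycle. The series, the function name `sample_acf`, and the rough ±1.96/√n band are illustrative assumptions: the UNEMPLYD values are not reproduced here, and Minitab's plotted limits may be computed differently.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Synthetic stand-in (assumption): 51 monthly values with a pure 12-month cycle.
series = [50000 + 30000 * math.sin(2 * math.pi * t / 12) for t in range(51)]
acf = sample_acf(series, 13)
band = 1.96 / math.sqrt(len(series))  # rough 5% limits under white noise

for lag in (1, 6, 12):
    flag = "significant" if abs(acf[lag]) > band else "not significant"
    print(f"lag {lag:2d}: r = {acf[lag]: .3f} ({flag})")
```

For a series like this, the autocorrelation is positive at lags 1 and 12 and negative at lag 6, which is the signature of a yearly cycle.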
PACF -
[Figure: Partial Autocorrelation Function for UNEMPLYD (with 5% significance limits for the
partial autocorrelations). y-axis: Partial Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 13)]
The PACF plot shows a strong positive spike at lag 1 and a moderate negative spike at lag 2. A
positive lag 1 spike suggests that the current value is positively correlated with the previous
value. A negative lag 2 spike suggests a possible oscillatory pattern, meaning that the value at
time t is positively related to t−1 but negatively related to t−2.
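The pattern described here (a positive spike at lag 1, a negative spike at lag 2, then nothing) is what an AR(2) process produces. The sketch below illustrates this with the Durbin-Levinson recursion applied to the theoretical ACF of an AR(2) process. The coefficients 0.9 and -0.4 and the function name `pacf_from_acf` are hypothetical choices for illustration, not estimates from the UNEMPLYD data.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion.

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    """
    pacf, prev = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        # Update the order-k coefficients from the order-(k-1) ones.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of a stationary AR(2) with illustrative coefficients (assumption).
phi1, phi2 = 0.9, -0.4
rho = [1.0, phi1 / (1.0 - phi2)]
for k in range(2, 7):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

pacf = pacf_from_acf(rho, 6)
for lag, value in enumerate(pacf, start=1):
    print(f"lag {lag}: PACF = {value: .4f}")
```

The PACF at lag 2 equals the second AR coefficient and is zero (up to rounding) from lag 3 onward, which is the "cut-off after lag 2" used to choose p.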
The time series plot and the ACF clearly show that the data are seasonal with a 12-month
cycle. A SARIMA model is therefore more suitable than a non-seasonal ARIMA model.
Parameters for SARIMA model:
The SARIMA model is specified by six orders and the seasonal period s, written (p, d, q) × (P, D, Q)s.
p: The PACF cuts off after lag 2, indicating an autoregressive (AR) component of order 2.
d: First differencing (d = 1) is applied to remove any slow drift in the level and achieve
stationarity.
q: The ACF does not exhibit a sharp cutoff, suggesting a weak MA component, so q = 0 is tried
first.
P: There is no significant spike at lag 12 in the PACF, which suggests P = 0 (no seasonal AR term).
D: The ACF shows a strong seasonal pattern, indicating the need for first seasonal differencing (D = 1).
Q: The seasonal component in the ACF suggests a possible seasonal MA effect, but it is weak, so
Q = 0 is tried first.
s: The seasonal period is 12 months, matching the yearly seasonality observed in the ACF.
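For reference, the orders listed above enter the model as follows. This is the standard multiplicative SARIMA form; the symbols X_t (the series), B (the backshift operator), ε_t (white noise) and c (a constant) are notation assumed here rather than taken from the source.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D} X_t
  = c + \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t

% First candidate, (p,d,q) x (P,D,Q)_s = (2,1,0) x (0,1,0)_{12}:
(1 - \phi_1 B - \phi_2 B^{2})\,(1-B)\,(1-B^{12})\,X_t = c + \varepsilon_t
```

With q = 0, P = 0 and Q = 0, the polynomials θ, Φ and Θ reduce to 1, so only the two AR coefficients and the constant are estimated.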
Model I: SARIMA (2,1,0) × (0,1,0)[12]
[Figure: ACF of Residuals for UNEMPLYD (with 5% significance limits for the autocorrelations).
y-axis: Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
[Figure: PACF of Residuals for UNEMPLYD (with 5% significance limits for the partial
autocorrelations). y-axis: Partial Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
Final Estimates of Parameters
Type         Coef  SE Coef      T      P
AR 1      -0.2151   0.1565  -1.37  0.178
AR 2      -0.3729   0.1564  -2.38  0.023
Constant    855.8    926.2   0.92  0.362
AR(1) is not statistically significant at the 5% level (p = 0.178), so there is only weak
evidence for an AR(1) term. AR(2) is statistically significant at the 5% level (p = 0.023), so
the AR(2) component contributes meaningfully. The constant is not significant (p = 0.362).
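The T column in the table of estimates is the ratio of each coefficient to its standard error. A small Python check using the reported values (the dictionary layout and variable names are illustrative) reproduces the figures:

```python
# (coefficient, standard error, reported T) copied from the Model I estimates
estimates = {
    "AR 1": (-0.2151, 0.1565, -1.37),
    "AR 2": (-0.3729, 0.1564, -2.38),
    "Constant": (855.8, 926.2, 0.92),
}

# t ratio = coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se, _) in estimates.items()}

for name, (coef, se, reported) in estimates.items():
    print(f"{name:8s} t = {t_ratios[name]: .2f} (reported {reported: .2f})")
```

A ratio of roughly 2 or more in absolute value corresponds to significance at about the 5% level, which is why only AR 2 qualifies here.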
Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 51, after differencing 38
Residuals: SS = 1140814193 (back forecasts excluded)
MS = 32594691 DF = 35
The model applies first-order differencing to remove trend and seasonal differencing of order
12 to address seasonality.
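The drop from 51 to 38 observations follows directly from the two differencing steps: one value is lost to the regular difference and twelve to the seasonal difference. The sketch below shows this on a placeholder series, where only the length matches the data; the values and the helper name `difference` are assumptions.

```python
def difference(x, lag=1):
    """Return the lag-differenced series x[t] - x[t - lag]."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# Placeholder series (assumption): only its length, 51, matches the data.
series = list(range(51))

regular = difference(series, lag=1)       # regular difference: 51 -> 50 values
seasonal = difference(regular, lag=12)    # seasonal difference: 50 -> 38 values

n_params = 3                              # AR 1, AR 2 and the constant
residual_df = len(seasonal) - n_params    # 38 - 3 = 35, as reported for Model I

print(len(series), len(regular), len(seasonal), residual_df)
```

The residual degrees of freedom (35) are the 38 differenced observations minus the three estimated parameters.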
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag            12     24     36     48
Chi-Square   17.5   32.3   36.3      *
DF              9     21     33      *
P-Value     0.042  0.054  0.319      *
The Ljung-Box p-value is below 0.05 at lag 12 (0.042) and borderline at lag 24 (0.054), which
indicates mild residual autocorrelation. The model may therefore require additional refinement.
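The Ljung-Box p-values can be checked from the reported chi-square statistics. In the table, the degrees of freedom appear to equal the lag minus the number of estimated parameters (for example 12 - 3 = 9). The sketch below computes the chi-square upper-tail probability with the Python standard library only; the helper `chi2_sf` is an illustrative implementation, and because the statistics are rounded to one decimal, the results agree with the table only approximately.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # closed form for df = 1
    # Step df up by 2 using Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

n_params = 3  # AR 1, AR 2 and the constant
reported = {12: 17.5, 24: 32.3, 36: 36.3}  # lag -> chi-square for Model I

p_values = {}
for lag, stat in reported.items():
    df = lag - n_params
    p_values[lag] = chi2_sf(stat, df)
    print(f"lag {lag}: DF = {df}, p = {p_values[lag]:.3f}")
```

A p-value below 0.05 rejects the hypothesis that the residuals are white noise up to that lag, which is the case only at lag 12 for this model.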
Conclusion:
The SARIMA (2,1,0) × (0,1,0)[12] model captures part of the seasonal and trend structure, but
the residual analysis indicates mild autocorrelation at lag 12 (Ljung-Box p = 0.042). The AR(2)
term is significant, while AR(1) and the constant are not. The model therefore needs refinement,
and alternative SARIMA specifications are tried next.
Model II: SARIMA (2,1,0) × (0,1,1)[12]
[Figure: ACF of Residuals for UNEMPLYD (with 5% significance limits for the autocorrelations).
y-axis: Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
[Figure: PACF of Residuals for UNEMPLYD (with 5% significance limits for the partial
autocorrelations). y-axis: Partial Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
Final Estimates of Parameters
Type         Coef  SE Coef      T      P
AR 1      -0.0932   0.1635  -0.57  0.572
AR 2      -0.2529   0.1639  -1.54  0.132
SMA 12     0.7342   0.2398   3.06  0.004
Constant    613.9    242.4   2.53  0.016
Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 51, after differencing 38
Residuals: SS = 630089924 (back forecasts excluded)
MS = 18532057 DF = 34
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag            12     24     36     48
Chi-Square   12.5   30.7   34.3      *
DF              8     20     32      *
P-Value     0.130  0.059  0.360      *
The SARIMA (2,1,0) × (0,1,1)[12] model improves on Model I. The seasonal MA term (SMA 12) is
significant (p = 0.004), confirming a seasonal effect. However, neither AR term is significant
(p = 0.572 and 0.132), so the autoregressive component may not be necessary. The residual mean
square falls from 32594691 to 18532057, and all Ljung-Box p-values exceed 0.05 (although lag 24
is borderline at 0.059), so there is no strong evidence of residual autocorrelation. This model
is a reasonable choice, but further refinements can be explored.
Model III: SARIMA (2,1,0) × (1,1,0)[12]
[Figure: ACF of Residuals for UNEMPLYD (with 5% significance limits for the autocorrelations).
y-axis: Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
[Figure: PACF of Residuals for UNEMPLYD (with 5% significance limits for the partial
autocorrelations). y-axis: Partial Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
Final Estimates of Parameters
Type         Coef  SE Coef       T      P
AR 1       0.2123   0.1722    1.23  0.226
AR 2       0.3117   0.1653    1.89  0.068
SAR 12    -0.9970   0.0738  -13.51  0.000
Constant    405.8    465.8    0.87  0.390
Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 51, after differencing 38
Residuals: SS = 280222757 (back forecasts excluded)
MS = 8241846 DF = 34
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag            12     24     36     48
Chi-Square    7.9   12.4   12.4      *
DF              8     20     32      *
P-Value     0.443  0.902  0.999      *
The SARIMA (2,1,0) × (1,1,0)[12] model shows a marked improvement over the previous
models. The seasonal AR term (SAR 12) is highly significant (p = 0.000), indicating strong
seasonal dependence in the data. Its estimate (-0.9970) lies very close to the boundary of the
stationarity region, so it should be interpreted with some caution. The AR(2) term is close to
significance (p = 0.068), suggesting a possible short-term autoregressive effect, while AR(1) is
not significant (p = 0.226). The residual mean square (8241846) is the lowest among the models
tested, implying a better fit. All Ljung-Box p-values are well above 0.05, so there is no evidence
of residual autocorrelation, which supports the model's adequacy. This model is the best so
far, but minor refinements can still be explored.
Model IV: SARIMA (1,1,0) × (1,1,0)[12]
[Figure: ACF of Residuals for UNEMPLYD (with 5% significance limits for the autocorrelations).
y-axis: Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
[Figure: PACF of Residuals for UNEMPLYD (with 5% significance limits for the partial
autocorrelations). y-axis: Partial Autocorrelation (-1.0 to 1.0); x-axis: Lag (1 to 9)]
Final Estimates of Parameters
Type         Coef  SE Coef       T      P
AR 1       0.2847   0.1708    1.67  0.104
SAR 12    -0.9959   0.0841  -11.84  0.000
Constant    449.8    480.7    0.94  0.356
Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 51, after differencing 38
Residuals: SS = 306891510 (back forecasts excluded)
MS = 8768329 DF = 35
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag            12     24     36     48
Chi-Square   11.0   13.6   13.6      *
DF              9     21     33      *
P-Value     0.279  0.886  0.999      *
The SARIMA (1,1,0) × (1,1,0)[12] model performs well and captures the seasonal structure.
The seasonal AR term (SAR 12) is highly significant (p = 0.000), confirming strong seasonal
dependence. However, the AR(1) term is not significant (p = 0.104), suggesting that the
short-term autoregressive component may not be essential. The residual mean square (8768329) is
slightly higher than that of SARIMA (2,1,0) × (1,1,0)[12] (8241846), indicating a marginally
worse fit. All Ljung-Box p-values exceed 0.05, so there is no significant residual
autocorrelation, which supports the model's adequacy.
Conclusion:
To decide on the best model, we compare the AIC and BIC of the four models, computed in R, as
follows:
R Code-
It gives the following output:
Model                               AIC       BIC
SARIMA (2,1,0) × (0,1,0)[12]   769.5356  774.4483
SARIMA (2,1,0) × (0,1,1)[12]   758.4213  764.9717
SARIMA (2,1,0) × (1,1,0)[12]   747.5776  754.1279
SARIMA (1,1,0) × (1,1,0)[12]   745.9788  750.8916
SARIMA (1,1,0) × (1,1,0)[12] has the lowest AIC (745.9788) and the lowest BIC (750.8916), so
it is the best model on both criteria. Adding an extra AR term, as in SARIMA (2,1,0) × (1,1,0)[12],
raises both AIC (747.5776) and BIC (754.1279), so it does not improve the fit enough to justify
the additional parameter. Hence SARIMA (1,1,0) × (1,1,0)[12] is the best-fitting model for the
monthly numbers of unemployed workers in the building trade in Germany from July 1975 to
September 1979.
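As a consistency check on the reported criteria, the sketch below uses the definitions AIC = -2 logL + 2k and BIC = -2 logL + k ln(n), which imply BIC - AIC = k (ln(n) - 2). It assumes n = 38 (the observations left after differencing) and that k counts the ARMA coefficients plus the innovation variance. Both are assumptions about how the values were computed, but the reported figures agree with them to rounding. The same snippet picks out the model with the lowest AIC and BIC.

```python
import math

n = 38  # observations left after regular and seasonal differencing

# model -> (AIC, BIC, assumed parameter count k = ARMA coefficients + variance)
models = {
    "(2,1,0)x(0,1,0)[12]": (769.5356, 774.4483, 3),
    "(2,1,0)x(0,1,1)[12]": (758.4213, 764.9717, 4),
    "(2,1,0)x(1,1,0)[12]": (747.5776, 754.1279, 4),
    "(1,1,0)x(1,1,0)[12]": (745.9788, 750.8916, 3),
}

# Discrepancy between the reported BIC - AIC and k * (ln(n) - 2); should be near 0.
gaps = {name: (bic - aic) - k * (math.log(n) - 2.0)
        for name, (aic, bic, k) in models.items()}

best_aic = min(models, key=lambda m: models[m][0])
best_bic = min(models, key=lambda m: models[m][1])

for name, gap in gaps.items():
    print(f"{name}: discrepancy = {gap:+.5f}")
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

Because BIC penalises each extra parameter by ln(38) ≈ 3.64 rather than 2, it favours the smaller model more strongly than AIC does, and here the two criteria agree.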