Task 1
Explanation (What, How and Why) and example of:
b) ARIMA (Autoregressive integrated moving average)
What is ARIMA?
ARIMA is a popular time series forecasting method that combines autoregression,
differencing, and moving averages. It is widely used for predicting future points in a time
series by analyzing past data points.
Here:
Time series forecasting is a method used to predict future values based on past
observations in a time-ordered sequence. This type of forecasting applies to data points
that are collected or recorded over a period of time.
For example: Stock prices, temperature readings, sales figures, or any other time-stamped
data.
The name ARIMA is an acronym that stands for Autoregressive Integrated Moving
Average.
With:
o Autoregressive (AR):
- What: Autoregressive refers to the model's dependence on its own past values.
- How: The AR component involves predicting a future value in the time series based on its own
past values.
- Why: AR captures the idea that the next value in the series is a linear function of its previous
values.
o Integrated (I):
- What: Integrated indicates the differencing of raw observations to make the time series stationary.
+ This helps remove trends and ensures that statistical properties remain constant over time. When
a time series is not stationary, it may be challenging to determine models and perform forecasting due
to the presence of unpredictable trends or variations. Therefore, transforming a series into a stationary
form through differencing is a critical step in time series analysis.
- How: Differencing involves subtracting the previous observation from the current one. Stationarity
can also be checked with the Augmented Dickey-Fuller (ADF) test.
+ This is a statistical test used to determine whether a time series is stationary or not. The test
involves assessing the presence of a unit root in the autoregressive model. A unit root implies that the
time series is non-stationary.
And this is its formula:
Δy_t = α + β·t + γ·y_{t−1} + δ_1·Δy_{t−1} + … + δ_{p−1}·Δy_{t−p+1} + ε_t
With:
Δy_t = y_t − y_{t−1} the first difference, α a constant, β the coefficient on a deterministic
time trend, γ the coefficient on the lagged level y_{t−1} (the quantity tested for a unit root),
δ_i the coefficients on the lagged differences, and ε_t the error term.
The null hypothesis (H0) of the ADF test is that the time series has a unit root and is non-stationary;
the alternative hypothesis (H1) is that the time series is stationary. H0 corresponds to γ = 0 (the
presence of a unit root), and H1 corresponds to γ < 0 (the absence of a unit root, i.e., stationarity).
The test statistic is then compared to critical values to decide whether to reject the null hypothesis.
If the test statistic is more negative than the critical value, you can reject the null hypothesis and
conclude that the time series is stationary.
- Why: Stationarity is crucial for time series analysis, and differencing helps stabilize the mean and
variance over time.
o Moving Average (MA):
- What: Moving Average involves modeling the error term as a linear combination of previous
error terms.
- How: The MA component is used to smooth out short-term fluctuations and highlight longer-
term trends or cycles.
- Why: MA helps in capturing the effects of random shocks in the time series.
How to build an ARIMA model?
The general formula for an ARIMA(p, d, q) model is as follows:
y'_t = c + φ_1·y'_{t−1} + … + φ_p·y'_{t−p} + θ_1·ε_{t−1} + … + θ_q·ε_{t−q} + ε_t
Where:
y'_t is the series after d rounds of differencing, c is a constant, φ_1, …, φ_p are the
autoregressive coefficients, θ_1, …, θ_q are the moving average coefficients, and ε_t is the
white-noise error term at time t.
These are the steps to follow to build an ARIMA model:
o Data Collection:
Gather historical time series data. Ensure the data is in a time-ordered sequence.
o Data Exploration and Preprocessing:
Explore the data to understand its characteristics, trends, and patterns.
Check for missing values, outliers, and anomalies.
Ensure the data is stationary (constant mean and variance over time) or make it
stationary through differencing.
o Differencing for Stationarity:
If the data is not stationary, apply differencing (subtracting the previous observation from
the current one) until stationarity is achieved.
Use visual inspection and statistical tests (e.g., Augmented Dickey-Fuller test) to
confirm stationarity.
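Differencing itself is a one-liner. As a tiny sketch (assuming Python with pandas; the numbers are made up for illustration), a series with a growing trend becomes constant after two rounds of differencing:

```python
import pandas as pd

# Hypothetical series whose increments grow over time (non-stationary mean).
s = pd.Series([10, 12, 15, 19, 24, 30])

# First-order differencing: y'_t = y_t - y_{t-1}
diff1 = s.diff().dropna()
print(diff1.tolist())  # [2.0, 3.0, 4.0, 5.0, 6.0]

# Still trending, so difference once more (d = 2):
diff2 = diff1.diff().dropna()
print(diff2.tolist())  # [1.0, 1.0, 1.0, 1.0]
```

The number of rounds needed is exactly the d order of the ARIMA model.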
o Autocorrelation and Partial Autocorrelation Analysis:
Examine the Autocorrelation Function (ACF) and Partial Autocorrelation Function
(PACF) plots to identify potential values for the AR (autoregressive) and MA (moving average)
components.
The ACF plot helps determine the MA order (q), and the PACF plot helps determine the
AR order (p).
o Model Selection:
Decide on the appropriate ARIMA model order (p, d, q).
With:
p (AR order): The number of autoregressive terms.
d (Integration order): The number of differences needed to make the series stationary.
q (MA order): The number of moving average terms.
o Parameter Tuning:
Fine-tune the chosen model parameters based on performance evaluation metrics. This
could involve trying different combinations of (p, d, q) and selecting the one that minimizes the
metric.
o Model Fitting:
Fit the ARIMA model to the training data using the selected orders.
o Validation and Evaluation:
Reserve a portion of the data for validation.
Evaluate the model's performance using metrics such as Mean Absolute Error (MAE),
Mean Squared Error (MSE), or others, depending on the specific application.
o Forecasting:
Use the fitted ARIMA model to make predictions for future time points.
o Model Monitoring and Updating:
Regularly monitor the model's performance over time.
Update the model if the underlying patterns in the data change.
Why is ARIMA used?
o Modeling Temporal Patterns:
ARIMA is effective for modeling time series data where the current value is dependent on its past
values. It captures temporal patterns and trends in the data.
o Flexibility:
ARIMA can handle a wide range of time series data, including economic indicators, stock prices,
sales figures, and more. It is a versatile model that can be applied to various domains.
o Incorporating Stationarity:
ARIMA includes an integration component (the "I" in ARIMA), which involves differencing the
time series data to achieve stationarity. Stationarity is often a prerequisite for applying time series
models, and ARIMA can handle non-stationary data through differencing.
o Prediction Accuracy:
ARIMA models, when properly tuned and fitted, can provide accurate predictions for future values
in a time series. This is especially valuable for decision-making, resource planning, and risk
management.
o Simple and Interpretable:
ARIMA models are relatively simple and easy to interpret. The parameters of the model have clear
meanings (autoregressive order, differencing order, moving average order), making it accessible for
users without advanced statistical knowledge.
o Foundation for More Complex Models:
ARIMA serves as a foundational model for more complex time series forecasting techniques. For
example, the seasonal version of ARIMA, known as SARIMA, extends ARIMA to handle seasonal
patterns in the data.
o Forecasting in Different Contexts:
ARIMA is applied in various contexts, including finance (stock price prediction), economics
(economic indicators), meteorology (weather forecasting), and many other fields where understanding
and predicting trends over time is essential.
o Model Diagnostics:
ARIMA provides diagnostic tools such as residuals analysis, ACF, and PACF plots, which can be
helpful in assessing the adequacy of the model and identifying areas for improvement.
o Availability in Software Packages:
ARIMA models are implemented in various statistical and data analysis software packages,
making it accessible and easy to use for practitioners.
o Time Efficiency:
ARIMA models are computationally efficient, making them suitable for analyzing and forecasting
time series data with modest computational resources.
Example:
Suppose you have monthly sales data for a retail store and want to predict future sales based on
past performance. The same approach applies to a hypothetical dataset of monthly website traffic:
an ARIMA model fitted to past traffic can forecast future visits.