How to Forecast Time Series with ARIMA: A Step-by-Step Guide

10 min read · Feb 19, 2024

“The future is not something that happens to us, but something we create.” — Vivek

Time series forecasting is a common and important task in many fields, such as economics, finance, marketing, and meteorology. It involves analyzing past data to predict future values of a variable, such as sales, stock prices, temperature, or rainfall.

But how can we forecast time series accurately and reliably? What methods and tools can we use to handle the challenges and complexities of time series data, such as trends, seasonality, noise, and non-stationarity?

One of the most popular and widely used methods for time series forecasting is ARIMA, which stands for Autoregressive Integrated Moving Average. ARIMA models can capture trends (through differencing) and short-term autocorrelation in time series data. They tend to work best for short-term horizons, because forecast uncertainty grows as the horizon lengthens.

In this guide, you will learn everything you need to know about ARIMA models, and how to apply them to your own time series data. You will learn:

What are the main components and assumptions of ARIMA models
How to prepare your data for ARIMA modeling
How to select the optimal order of the ARIMA model
How to fit the ARIMA model to your data
How to evaluate the model performance
How to use the model to forecast future values of your time series

By the end of this guide, you will be able to forecast time series with ARIMA like a pro, and impress your boss, clients, or colleagues with your skills and insights. 😎

So, are you ready to dive into the world of ARIMA? Let’s get started!

Data Preparation

Before we can fit an ARIMA model to our time series data, we need to make sure that the data is ready for modeling. This involves checking the following points:

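Stationarity: A time series is said to be stationary if its mean, variance, and autocorrelation do not change over time. Stationarity is a key assumption for ARIMA models: the AR and MA terms are fitted to the series after differencing, and that differenced series must be stationary for the model parameters to be consistent and reliable. To check for stationarity, we can use visual methods (such as plotting the time series and its components) or statistical tests (such as the Augmented Dickey-Fuller test or the Kwiatkowski-Phillips-Schmidt-Shin test). Note that the two tests have opposite null hypotheses: the Augmented Dickey-Fuller test assumes a unit root (non-stationarity) under the null, while the Kwiatkowski-Phillips-Schmidt-Shin test assumes stationarity.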
Differencing: If the time series is not stationary, we can apply differencing to make it stationary. Differencing is the process of subtracting the previous value of the time series from the current value (or, for seasonal differencing, subtracting the value from one season earlier). Differencing can remove trends and seasonality from the time series. The order of differencing is the number of times we apply differencing to the time series. For example, if we apply differencing once, we get the first difference of the time series. If we difference that result again, we get the second difference, and so on.
Frequency and Time Period: We also need to specify the frequency and time period of our time series data. The frequency is the interval at which the data is recorded, such as daily, weekly, monthly, or quarterly. The time period is the span of time that we want to analyze and forecast, such as one year, five years, or ten years. The frequency and time period of the data affect the shape and behavior of the time series, and the accuracy and reliability of the forecasts.

By following these steps, we can prepare our data for ARIMA modeling, and ensure that the model can capture the underlying patterns and dynamics of the time series.

Model Selection

After we have prepared our data for ARIMA modeling, we need to select the optimal order of the ARIMA model. The order of the ARIMA model is denoted by three numbers: (p, d, q), where p is the order of the autoregressive (AR) term, d is the order of differencing, and q is the order of the moving average (MA) term.
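
For reference, the (p, d, q) notation can be written as a single equation. The symbols here are assumptions introduced for illustration, since the guide itself does not define them: B is the backshift operator (B y_t = y_{t-1}), the φ_i are the AR coefficients, the θ_j are the MA coefficients, c is a constant, and ε_t is white noise.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t

% Special case ARIMA(1,1,1), written out with w_t = y_t - y_{t-1}:
w_t = c + \phi_1 w_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
```

In words: after differencing d times, the current value depends on its own p previous values and on the q previous forecast errors.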

The order of differencing (d) is usually determined during data preparation, as described in the previous section. The orders of the AR term (p) and the MA term (q) are chosen using techniques and criteria such as:

Autocorrelation function (ACF): The ACF is a plot that shows the correlation of the time series with itself at different lags. The ACF can help us identify the potential MA term, as it tells us how quickly the autocorrelations decay. A sharp cut-off in the ACF after a certain lag indicates a possible MA term of that order. For example, if the ACF shows a sharp cut-off after lag 2, it suggests an MA(2) term.

Partial autocorrelation function (PACF): The PACF is a plot that shows the correlation of the time series with itself at different lags, after removing the effects of previous lags. The PACF can help us identify the potential AR term, as it tells us the direct impact of previous observations on the current one. A sharp cut-off in the PACF after a certain lag indicates a possible AR term of that order. For example, if the PACF shows a sharp cut-off after lag 3, it suggests an AR(3) term.
Akaike information criterion (AIC): The AIC is a measure of the goodness-of-fit of the model, adjusted for the number of parameters. The AIC penalizes models that are too complex and have too many parameters. The lower the AIC, the better the model. The AIC can help us compare and select the best model among a set of candidate models with different orders of p and q.
Bayesian information criterion (BIC): The BIC is another measure of the goodness-of-fit of the model, adjusted for the number of parameters. The BIC is similar to the AIC, but it penalizes models more harshly for having too many parameters. The lower the BIC, the better the model. The BIC can also help us compare and select the best model among a set of candidate models with different orders of p and q.

By using these techniques and criteria, we can select the optimal order of the ARIMA model that fits our data well and avoids overfitting or underfitting.

Model Fitting

Once we have selected the optimal order of the ARIMA model, we can fit the model to our time series data. To do this, we can use a programming language of our choice, such as Python or R. In this guide, we will use Python as an example.

To fit the ARIMA model in Python, we can use the ARIMA class from the statsmodels library. The ARIMA class takes three main arguments: the time series data, the order of the ARIMA model (p, d, q), passed as order, and an optional seasonal order, passed as seasonal_order. The seasonal order is a tuple of four values that specify the order of the seasonal component of the model, if any. The seasonal order is denoted by (P, D, Q, m), where P is the seasonal AR order, D is the seasonal differencing order, Q is the seasonal MA order, and m is the number of periods in each season. For example, if we have quarterly data with yearly seasonality, we can set m to 4.

The ARIMA class has a fit method that returns a fitted ARIMA model object. The fitted model object has various attributes and methods that we can use to inspect and analyze the model. For example, we can use the summary method to get a table of the model coefficients, the standard errors, the p-values, and other statistics. We can also use the plot_diagnostics method to get four residual plots: the standardized residuals over time, a histogram with an estimated density, a normal Q-Q plot, and a correlogram (the ACF of the residuals).

Here is an example of how to fit an ARIMA(1,1,1) model to a small sample time series in Python:

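# Import the ARIMA class (the legacy statsmodels.tsa.arima_model.ARIMA was
# deprecated in statsmodels 0.12 and no longer works in current releases)
from statsmodels.tsa.arima.model import ARIMA
# Load the sample data (a perfectly linear toy series, so the optimizer may emit warnings)
data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
# Define the order of the ARIMA model
order = (1, 1, 1)
# Create an ARIMA model object
model = ARIMA(data, order=order)
# Fit the model to the data
fitted_model = model.fit()
# Print the model summary
print(fitted_model.summary())
# Plot the model diagnostics (fewer lags than the default of 10,
# because the sample has only 10 points)
fitted_model.plot_diagnostics(lags=4)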

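The output of the model summary is shown below. It is in the format of the legacy statsmodels.tsa.arima_model API; current statsmodels versions print a differently formatted table, and the estimates may differ: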

ARIMA Model Results                              
==============================================================================
Dep. Variable:                    D.y   No. Observations:                    9
Model:                 ARIMA(1, 1, 1)   Log Likelihood                 -10.915
Method:                       css-mle   S.D. of innovations              0.548
Date:                Mon, 19 Feb 2024   AIC                             29.830
Time:                        17:34:08   BIC                             30.737
Sample:                             1   HQIC                            28.987
                                                                              
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            1.0000      0.019     52.347      0.000       0.963       1.037
ar.L1.D.y        0.6000      0.368      1.631      0.103      -0.121       1.321
ma.L1.D.y       -1.0000      0.333     -3.003      0.003      -1.653      -0.347
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.6667           +0.0000j            1.6667            0.0000
MA.1            1.0000           +0.0000j            1.0000            0.0000
-----------------------------------------------------------------------------

Model Forecasting

After we have fitted the ARIMA model to our time series data, we can use the model to forecast future values of the time series. In current statsmodels versions (statsmodels.tsa.arima.model), the forecast method of the fitted model takes the number of steps ahead and returns only the forecasted values. To also get the uncertainty of the forecasts, we can use the get_forecast method, whose result exposes the forecasted values (predicted_mean), the standard errors of the forecasts (se_mean), and the confidence intervals (conf_int). In the legacy statsmodels.tsa.arima_model API, forecast instead returned these three pieces as a tuple.

The forecasted values are the predicted values of the time series for the specified number of steps ahead. The standard error of the forecasts measures their uncertainty, based on the variance of the residuals of the fitted model, and it grows with the forecast horizon. The confidence interval of the forecasts is the range within which the future values are expected to fall at a specified level of confidence (usually 95%).
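
The relationship between these three quantities can be summarized in one formula. The notation is an assumption introduced here for illustration: T is the last observed time index, h is the number of steps ahead, \hat{\sigma}_h is the standard error of the h-step forecast, and the forecast errors are assumed to be approximately normal.

```latex
\hat{y}_{T+h} \;\pm\; z_{1-\alpha/2} \, \hat{\sigma}_h ,
\qquad z_{0.975} \approx 1.96 \ \text{for a 95\% interval}
```

For example, a forecast of 20 with a standard error of 0.5 gives an approximate 95% interval of 20 ± 0.98, that is, from 19.02 to 20.98.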

We can use the forecasted values to plot the predictions along with the original time series data, and compare how well the model captures the future behavior of the time series. We can also use the confidence interval to plot the upper and lower bounds of the forecasts, and see how wide or narrow the uncertainty of the forecasts is.

Here is an example of how to forecast the next 10 values of the sample time series data in Python, using the fitted ARIMA(1,1,1) model from the previous section:

# Import NumPy and the pyplot module
import numpy as np
import matplotlib.pyplot as plt
# Forecast the next 10 values
# (assumes fitted_model was created with statsmodels.tsa.arima.model.ARIMA)
steps = 10
forecast = fitted_model.get_forecast(steps=steps)
# Extract the forecasted values
forecast_values = np.asarray(forecast.predicted_mean)
# Extract the 95% confidence interval (column 0 = lower bound, column 1 = upper bound)
forecast_ci = np.asarray(forecast.conf_int(alpha=0.05))
# Plot the original data
plt.plot(data, label='Original')
# Plot the forecasted values on the time steps that follow the data
future_index = range(len(data), len(data) + steps)
plt.plot(future_index, forecast_values, label='Forecast')
# Plot the confidence interval as a shaded band
plt.fill_between(future_index, forecast_ci[:, 0], forecast_ci[:, 1], color='k', alpha=.15)
# Add a legend
plt.legend()
# Show the plot
plt.show()

Running this code produces a plot of the original data, the forecasted values, and the confidence interval of the forecasts as a shaded band.

The plot lets you check whether the forecasted values continue the upward trend of the original data and how wide the confidence interval is: a narrow band indicates low estimated uncertainty. Keep in mind that this toy series is perfectly linear, so the result says little about accuracy on real data. ARIMA models also have some limitations and challenges, such as:

The ARIMA model assumes that the time series is linear and becomes stationary after differencing, which may not be true for some real-world data. For example, if the time series has nonlinear patterns or forms of non-stationarity that differencing does not remove, such as exponential growth, structural breaks, or regime changes, the ARIMA model may not be able to capture them well, and the forecasts may be biased or inaccurate.
The ARIMA model relies on the historical data to forecast the future values, which may not account for the external factors or events that can affect the time series. For example, if the time series is influenced by economic, political, social, or environmental factors, such as shocks, crises, interventions, or innovations, the ARIMA model may not be able to incorporate them into the forecasts, and the forecasts may be unreliable or unrealistic.
The ARIMA model requires careful selection of its order (p, d, q). This choice can have a significant impact on the quality and validity of the forecasts, and it may not be easy to determine or justify. For example, if the order is too high, the model may overfit the data; if it is too low, the model may underfit. The confidence level of the forecasts is a reporting choice rather than a fitted parameter, but it also matters: a higher level (such as 99%) gives wider intervals, and a lower level (such as 80%) gives narrower ones, so the intervals should always be read together with their level.

These are some of the limitations and challenges of the ARIMA model that we should be aware of and address when using it for time series forecasting. In the next and final section, we will conclude our guide and provide some recommendations and suggestions for further reading or research.

Conclusion

You have reached the end of this guide on how to forecast time series with ARIMA models. Congratulations! 🎉

You have learned the essential steps and techniques to apply ARIMA models to your own time series data, such as:

Preparing your data for ARIMA modeling by checking stationarity, applying differencing where needed, and specifying the frequency and time period.
Selecting the optimal order of the ARIMA model by using ACF, PACF, AIC, and BIC.
Fitting the ARIMA model to your data by using Python and the statsmodels library.
Evaluating the model performance by using MAE, RMSE, MAPE, and other metrics.
Forecasting future values of your time series by using the fitted ARIMA model and plotting the predictions and the confidence interval.

We hope that this guide has helped you understand the basics and benefits of ARIMA models, and how to use them for your own time series forecasting projects. We also hope that you have enjoyed reading this guide as much as we have enjoyed writing it for you. 😊

If you want to learn more about ARIMA models or time series forecasting in general, we recommend that you check out physicsalert.com.

Thank you for reading this guide, and happy forecasting! 😊