ARIMA vs LSTM: A Comparative Study of Time Series Prediction Models

VIVEK KUMAR UPADHYAY
19 min read · Feb 19, 2024

“The future is not something that happens to us, but something we create.” — Vivek

Time series prediction is the task of forecasting future values of a sequence of data based on past observations. It has many applications in various domains, such as finance, economics, weather, health, and more. Time series prediction can help us make better decisions, optimize processes, and plan ahead.

However, time series prediction is not an easy task. It involves dealing with complex and dynamic patterns, such as trends, seasonality, cycles, noise, and outliers. Moreover, different time series may have different characteristics and require different models and methods.

In this article, we will compare two popular models for time series prediction: ARIMA and LSTM. ARIMA stands for AutoRegressive Integrated Moving Average; it is a traditional statistical model that captures linear dependencies and autocorrelations among the data points. LSTM stands for Long Short-Term Memory; it is a recurrent neural network architecture that can learn nonlinear and long-term dependencies in the data.

We will explore the advantages and disadvantages of each model, and how they perform on different types of time series. We will also show you how to implement and evaluate these models using Python and TensorFlow. By the end of this article, you will have a better understanding of the strengths and weaknesses of ARIMA and LSTM, and how to choose the best model for your time series prediction problem.

Background

Before we dive into the comparison of ARIMA and LSTM, let us first review some basic concepts and definitions related to time series and prediction models.

What is a time series?

A time series is a sequence of data points that are ordered in time. Each data point represents a measurement or observation of a variable at a specific time. For example, the daily closing price of a stock, the monthly sales of a product, or the hourly temperature of a city are all examples of time series.

A time series can be univariate or multivariate. A univariate time series has only one variable, while a multivariate time series has more than one variable. For example, the daily closing price of a stock is a univariate time series, while the daily closing price and volume of a stock are a multivariate time series.

A time series can also have different frequencies or granularities. The frequency or granularity of a time series refers to the time interval between consecutive data points. For example, a time series can have daily, weekly, monthly, quarterly, or yearly frequency. The frequency or granularity of a time series affects the patterns and behaviors of the data, and may require different models and methods.

What are the components of a time series?

A time series can be decomposed into four main components: trend, seasonality, cycle, and noise. These components describe the different patterns and behaviors of the data over time.

  • Trend is the long-term direction or movement of the data. It can be upward, downward, or constant. For example, the annual global temperature has an upward trend, indicating that the temperature is increasing over time.
  • Seasonality is the periodic or recurring pattern of the data that repeats over a fixed time span. It can be daily, weekly, monthly, quarterly, or yearly. For example, the monthly sales of a product may have a seasonal pattern, indicating that the sales are higher or lower in certain months of the year.
  • Cycle is the irregular or non-periodic fluctuation of the data that lasts for more than one season. It can be caused by external factors, such as business cycles, economic cycles, or political cycles. For example, the quarterly GDP of a country may have a cyclical pattern, indicating that the GDP rises and falls over several years.
  • Noise is the random or unpredictable variation of the data that does not follow any pattern or structure. It can be caused by measurement errors, outliers, or other unknown factors. For example, the daily stock price of a company may have a noise component, indicating that the price is affected by random events or news.
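To make the components concrete, here is a minimal sketch that builds an additive series from a trend, a seasonal term, and noise, then recovers the trend with a moving average that spans one full season. The slope, period, amplitude, and noise level are illustrative assumptions, and the cycle component is left out for brevity.

```python
import numpy as np

def make_series(n=365, slope=0.05, period=7, amplitude=2.0, noise_std=0.5, seed=42):
    """Build an additive series: trend + seasonality + noise (all values assumed)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    trend = slope * t                                      # long-term direction
    seasonality = amplitude * np.sin(2 * np.pi * t / period)  # repeats every `period` steps
    noise = rng.normal(0.0, noise_std, size=n)             # unpredictable variation
    return trend, seasonality, noise, trend + seasonality + noise

def moving_average_trend(y, period):
    """Estimate the trend by averaging over one full season, which cancels the seasonal term."""
    kernel = np.ones(period) / period
    return np.convolve(y, kernel, mode="valid")

trend, seasonality, noise, y = make_series()
trend_hat = moving_average_trend(y, period=7)  # aligned with trend[3:-3] for period 7
```

Because the window covers exactly one season, the seasonal term averages out and what remains is the trend plus a smoothed version of the noise.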

What are the challenges of time series prediction?

Time series prediction is the task of forecasting future values of a time series based on past observations. It is a challenging task for several reasons:

  • Time series are often non-stationary, meaning that their statistical properties, such as mean, variance, and autocorrelation, change over time. Non-stationary time series are difficult to model and require special techniques, such as differencing, transformation, or decomposition, to make them stationary.
  • Time series often have nonlinear and complex patterns, such as interactions, feedback loops, or chaos, that are hard to capture and explain by simple models. Nonlinear and complex time series require advanced models, such as neural networks, that can learn and approximate these patterns from the data.
  • Time series often have long-term dependencies, meaning that the current value of the time series depends on the past values that are far away in time. Long-term dependencies are challenging to model and require models that can store and access long-term memory, such as LSTM.
  • Time series prediction is often affected by uncertainty, meaning that the future values of the time series are not deterministic, but probabilistic. Uncertainty can arise from various sources, such as noise, outliers, missing values, or future events. Uncertainty requires models that can provide not only point forecasts, but also interval forecasts or probabilistic forecasts, that indicate the confidence or likelihood of the predictions.

What are the evaluation metrics for time series prediction?

To compare the performance of different models for time series prediction, we need some evaluation metrics that can measure the accuracy and quality of the predictions. There are many evaluation metrics for time series prediction, but some of the most common ones are:

  • Mean Absolute Error (MAE): This metric calculates the average of the absolute differences between the actual values and the predicted values. It measures the magnitude of the errors, regardless of their direction. A lower MAE indicates a better fit of the model to the data.
  • Mean Squared Error (MSE): This metric calculates the average of the squared differences between the actual values and the predicted values. It measures the magnitude of the errors, and penalizes larger errors more than smaller errors. A lower MSE indicates a better fit of the model to the data.
  • Root Mean Squared Error (RMSE): This metric calculates the square root of the MSE. It measures the magnitude of the errors, and penalizes larger errors more than smaller errors. It has the same unit as the original data, which makes it easier to interpret. A lower RMSE indicates a better fit of the model to the data.
  • Mean Absolute Percentage Error (MAPE): This metric calculates the average of the absolute percentage differences between the actual values and the predicted values. It measures the relative magnitude of the errors, as a percentage of the actual values. A lower MAPE indicates a better fit of the model to the data.
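  • Symmetric Mean Absolute Percentage Error (sMAPE): This metric is a modified version of the MAPE that divides each absolute error by the average of the absolute actual and predicted values, so it stays defined when the actual value is zero (unless the prediction is also zero). Unlike the MAPE, it is bounded, although despite its name it does not penalize over-forecasts and under-forecasts of the same size equally. A lower sMAPE indicates a better fit of the model to the data.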
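The five metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the experiments; the sMAPE here uses the common variant that divides by the mean of the absolute actual and predicted values (range 0% to 200%), and the sample numbers are made up.

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    return float(np.sqrt(mse(y_true, y_pred)))

def mape(y_true, y_pred):
    # Undefined when any actual value is zero.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))

def smape(y_true, y_pred):
    # Undefined only when an actual value and its prediction are both zero.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))

actual = np.array([100.0, 200.0, 300.0, 400.0])     # made-up values
predicted = np.array([110.0, 190.0, 330.0, 380.0])

print(mae(actual, predicted))    # 17.5
print(mse(actual, predicted))    # 375.0
print(rmse(actual, predicted))   # about 19.36
print(mape(actual, predicted))   # about 7.5
print(smape(actual, predicted))  # about 7.33
```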

Methodology

In this section, we will describe the data sources, the preprocessing steps, the model specifications, the evaluation metrics, and the experimental settings used for the comparison of ARIMA and LSTM for time series prediction.

Data sources

We will use two different types of time series for the comparison of ARIMA and LSTM: synthetic and real-world. Synthetic time series are artificially generated data that follow certain mathematical functions or rules. Real-world time series are actual data that are collected from various sources, such as websites, databases, or sensors.

Synthetic time series are useful for testing and benchmarking the models, as they have known properties and patterns, and can be easily controlled and manipulated. Real-world time series are useful for demonstrating and validating the models, as they reflect the real situations and challenges that the models may encounter in practice.

We will use the following synthetic and real-world time series for the comparison of ARIMA and LSTM:

  • Synthetic time series 1: This is a univariate time series that follows a simple linear trend with some noise. It has 1000 data points and a daily frequency. It is generated by the following formula:

The noise term is random and follows a normal distribution with mean 0 and standard deviation 0.1.

  • Synthetic time series 2: This is a univariate time series that follows a seasonal pattern with some noise. It has 1000 data points and a daily frequency. It is generated by the following formula:

The noise term is random and follows a normal distribution with mean 0 and standard deviation 0.5.

  • Synthetic time series 3: This is a univariate time series that follows a nonlinear and chaotic pattern with some noise. It has 1000 data points and a daily frequency. It is generated by the following formula:

The noise term is random and follows a normal distribution with mean 0 and standard deviation 0.2. This time series is based on the Lorenz system, which is a famous example of a chaotic dynamical system.

  • Real-world time series 1: This is a multivariate time series that contains the daily confirmed cases and deaths of COVID-19 in India from January 30, 2020 to February 18, 2024. It has 1117 data points and a daily frequency. It is obtained from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
  • Real-world time series 2: This is a univariate time series that contains the monthly international airline passengers from January 1949 to December 1960. It has 144 data points and a monthly frequency. It is obtained from the Time Series Data Library maintained by Rob Hyndman.
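As a rough sketch of how the three synthetic series could be generated: the length (1000) and the noise standard deviations (0.1, 0.5, 0.2) come from the descriptions above, but the slope, the seasonal period, and the Lorenz settings (standard parameters, simple Euler steps, x-component only) are assumptions made for illustration, not the formulas used in the experiments.

```python
import numpy as np

N = 1000
rng = np.random.default_rng(42)
t = np.arange(N)

# Series 1: linear trend + N(0, 0.1) noise. The slope 0.01 is an assumed placeholder.
ts1 = 0.01 * t + rng.normal(0.0, 0.1, N)

# Series 2: seasonal pattern + N(0, 0.5) noise. The period of 50 steps is an assumed placeholder.
ts2 = np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.5, N)

def lorenz_x(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """x-component of the Lorenz system, integrated with explicit Euler steps."""
    x, y, z = 1.0, 1.0, 1.0  # assumed initial state
    out = np.empty(n)
    for i in range(n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out[i] = x
    return out

# Series 3: chaotic signal + N(0, 0.2) noise.
ts3 = lorenz_x(N) + rng.normal(0.0, 0.2, N)
```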

Preprocessing steps

Before we can apply the models to the time series, we need to perform some preprocessing steps to prepare the data for modeling. These steps include:

  • Splitting the data into training and testing sets: We will use the first 80% of the data points as the training set, and the remaining 20% as the testing set. The training set will be used to fit the models, and the testing set will be used to evaluate the models. We will use a sliding window approach to generate the input and output sequences for the models. The sliding window approach takes a fixed-size window of consecutive data points as the input, and the next data point as the output. For example, if the window size is 10, then the first input sequence will be the first 10 data points, and the first output will be the 11th data point. The second input sequence will be the second to 11th data points, and the second output will be the 12th data point, and so on. We will use a window size of 10 for all the time series, except for the real-world time series 2, where we will use a window size of 12, to capture the yearly seasonality.
  • Scaling the data: We will scale the data to a range of [0, 1] using the min-max scaler. The min-max scaler subtracts the minimum value and divides by the range of the data. Scaling the data is important for the LSTM model, as it can improve the convergence and performance of the model. We will use the same scaler to transform the data back to the original scale after prediction.
  • Differencing the data: We will apply first-order differencing to the data to make it more stationary. First-order differencing subtracts the previous value from the current value. It matters for the ARIMA model because it removes a linear trend from the data; a seasonal pattern generally needs seasonal differencing (subtracting the value one season earlier) instead. We will use inverse differencing, a cumulative sum added back to the first known value, to restore the data to the original scale after prediction.
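The three preprocessing steps can be sketched with plain NumPy as follows. The helper names are hypothetical, and fitting the scaler on the training portion only is an assumption on my part (a common precaution against leaking test information), not something stated above.

```python
import numpy as np

def split_series(y, train_frac=0.8):
    """Chronological split: first 80% for training, the rest for testing."""
    split = int(len(y) * train_frac)
    return y[:split], y[split:]

def fit_minmax(train):
    """Learn the min and max from the training data only (assumed practice)."""
    return float(np.min(train)), float(np.max(train))

def scale(y, lo, hi):
    return (y - lo) / (hi - lo)

def unscale(y_scaled, lo, hi):
    return y_scaled * (hi - lo) + lo

def make_windows(y, window=10):
    """Each input is `window` consecutive points; the target is the next point."""
    X = np.array([y[i:i + window] for i in range(len(y) - window)])
    targets = y[window:]
    return X, targets

def difference(y):
    return np.diff(y)

def inverse_difference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    return np.concatenate([[first_value], first_value + np.cumsum(diffs)])

series = np.arange(100, dtype=float)        # toy data for illustration
train, test = split_series(series)          # 80 and 20 points
lo, hi = fit_minmax(train)
X, targets = make_windows(scale(train, lo, hi), window=10)  # X: (70, 10), targets: (70,)
restored = inverse_difference(series[0], difference(series))
```

With a window of 10, the first input holds points 1 to 10 and its target is point 11, matching the description above.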

Model specifications

We will use the following specifications for the ARIMA and LSTM models:

  • ARIMA model: We will use the auto_arima function from the pmdarima library to automatically select the parameters of the ARIMA model. The auto_arima function chooses the differencing order d with a unit-root test and then searches over p and q (stepwise by default, or exhaustively over a grid when stepwise search is disabled), keeping the candidate with the best information criterion. The parameters p, d, and q are the order of the autoregressive part, the degree of differencing, and the order of the moving average part, respectively. We will use the Akaike Information Criterion (AIC) as the selection criterion. The AIC is a measure of the trade-off between the goodness-of-fit and the complexity of the model. A lower AIC indicates a better model.
  • LSTM model: We will use the TensorFlow library to build and train the LSTM model. The LSTM model will have the following architecture: an input layer that takes the input sequence, a hidden layer that consists of 50 LSTM units, and an output layer that produces the output value. We will use the mean squared error (MSE) as the loss function, and the Adam optimizer as the optimization algorithm. We will train the model for 50 epochs, with a batch size of 32. We will also use an early stopping callback to stop the training when the validation loss stops improving. The early stopping callback will monitor the validation loss, and stop the training if the validation loss does not improve for 10 consecutive epochs. It will also restore the best weights of the model at the end of the training.

Evaluation metrics

We will use the following evaluation metrics to compare the performance of the ARIMA and LSTM models on the time series prediction task:

  • Root mean squared error (RMSE): This metric calculates the square root of the MSE. It measures the magnitude of the errors, and penalizes larger errors more than smaller errors. It has the same unit as the original data, which makes it easier to interpret. A lower RMSE indicates a better fit of the model to the data.
  • Symmetric mean absolute percentage error (sMAPE): This metric is a modified version of the MAPE, that avoids the problem of division by zero when the actual value is zero. It also treats positive and negative errors symmetrically, unlike the MAPE. A lower sMAPE indicates a better fit of the model to the data.

Experimental settings

We will use the following experimental settings to conduct the comparison of ARIMA and LSTM for time series prediction:

  • Hardware: We will use a laptop with an Intel Core i7-9750H CPU, 16 GB of RAM, and an NVIDIA GeForce GTX 1650 GPU.
  • Software: We will use Python 3.8.5 as the programming language, and Jupyter Notebook as the development environment. We will use the following libraries and their versions for the data analysis and modeling: numpy 1.19.2, pandas 1.1.3, matplotlib 3.3.2, seaborn 0.11.0, scikit-learn 0.23.2, pmdarima 1.8.0, tensorflow 2.3.1, and keras 2.4.3.
  • Reproducibility: We will set the random seeds for numpy, tensorflow, and pmdarima to ensure the reproducibility of the results. We will use the same seed value of 42 for all the libraries. We will also share the code and data files used for the comparison on GitHub, so that anyone can replicate the experiments and verify the results.
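A minimal sketch of the seeding step, assuming a small helper (the name `set_seeds` is hypothetical). Python's own `random` module is seeded as well as a precaution; for pmdarima, the seed would be passed per call through the `random_state` argument of `auto_arima` rather than set globally.

```python
import random

import numpy as np
import tensorflow as tf

SEED = 42

def set_seeds(seed=SEED):
    """Seed the random number generators used during the experiments."""
    random.seed(seed)         # Python standard library
    np.random.seed(seed)      # NumPy global generator
    tf.random.set_seed(seed)  # TensorFlow global seed

set_seeds()
```

Even with fixed seeds, GPU training can show small run-to-run differences, so results may match closely rather than exactly.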

Results and Discussion

In this section, we will present and analyze the results of the comparison of ARIMA and LSTM for time series prediction. We will use tables, charts, and graphs to visualize the results, and discuss the strengths and weaknesses of each model, the implications and applications of the findings, and the possible sources of error or bias in the results.

Synthetic time series 1

The synthetic time series 1 follows a simple linear trend with some noise. It has 1000 data points and a daily frequency. Here is a plot of the time series:

# Plot the synthetic time series 1
import matplotlib.pyplot as plt  # used by all plotting snippets in this section

plt.figure(figsize=(10, 6))
plt.plot(ts1, label='Synthetic time series 1')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Synthetic time series 1')
plt.legend()
plt.show()

We applied the ARIMA and LSTM models to the time series, and obtained the following RMSE and sMAPE values on the testing set:

We can see that both models achieved very low RMSE and sMAPE values, indicating that they fit the data very well. However, the LSTM model slightly outperformed the ARIMA model, as it had lower RMSE and sMAPE values. This suggests that the LSTM model was able to capture the linear trend and the noise of the time series better than the ARIMA model.

Here is a plot of the actual and predicted values of the time series by the ARIMA and LSTM models:

# Plot the actual and predicted values of the synthetic time series 1 by the ARIMA and LSTM models
plt.figure(figsize=(10, 6))
plt.plot(ts1_test, label='Actual')
plt.plot(ts1_pred_arima, label='ARIMA')
plt.plot(ts1_pred_lstm, label='LSTM')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Synthetic time series 1: Actual vs Predicted')
plt.legend()
plt.show()

We can see that both models were able to follow the linear trend of the time series, but the LSTM model was closer to the actual values than the ARIMA model. The ARIMA model seemed to underestimate the values at some points, while the LSTM model seemed to adjust to the noise better.

Synthetic time series 2

The synthetic time series 2 follows a seasonal pattern with some noise. It has 1000 data points and a daily frequency. Here is a plot of the time series:

# Plot the synthetic time series 2
plt.figure(figsize=(10, 6))
plt.plot(ts2, label='Synthetic time series 2')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Synthetic time series 2')
plt.legend()
plt.show()

We applied the ARIMA and LSTM models to the time series, and obtained the following RMSE and sMAPE values on the testing set:

We can see that both models achieved low RMSE and sMAPE values, indicating that they fit the data well. However, the LSTM model slightly outperformed the ARIMA model, as it had lower RMSE and sMAPE values. This suggests that the LSTM model was able to capture the seasonal pattern and the noise of the time series better than the ARIMA model.

Here is a plot of the actual and predicted values of the time series by the ARIMA and LSTM models:

# Plot the actual and predicted values of the synthetic time series 2 by the ARIMA and LSTM models
plt.figure(figsize=(10, 6))
plt.plot(ts2_test, label='Actual')
plt.plot(ts2_pred_arima, label='ARIMA')
plt.plot(ts2_pred_lstm, label='LSTM')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Synthetic time series 2: Actual vs Predicted')
plt.legend()
plt.show()

We can see that both models were able to follow the seasonal pattern of the time series, but the LSTM model was closer to the actual values than the ARIMA model. The ARIMA model seemed to overestimate the values at some points, while the LSTM model seemed to adjust to the noise better.

Synthetic time series 3

The synthetic time series 3 follows a nonlinear and chaotic pattern with some noise. It has 1000 data points and a daily frequency. Here is a plot of the time series:

# Plot the synthetic time series 3
plt.figure(figsize=(10, 6))
plt.plot(ts3, label='Synthetic time series 3')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Synthetic time series 3')
plt.legend()
plt.show()

We applied the ARIMA and LSTM models to the time series, and obtained the following RMSE and sMAPE values on the testing set:

We can see that both models achieved low RMSE and sMAPE values, indicating that they fit the data well. However, the LSTM model slightly outperformed the ARIMA model, as it had lower RMSE and sMAPE values. This suggests that the LSTM model was able to capture the nonlinear and chaotic pattern and the noise of the time series better than the ARIMA model.

Here is a plot of the actual and predicted values of the time series by the ARIMA and LSTM models:

# Plot the actual and predicted values of the synthetic time series 3 by the ARIMA and LSTM models
plt.figure(figsize=(10, 6))
plt.plot(ts3_test, label='Actual')
plt.plot(ts3_pred_arima, label='ARIMA')
plt.plot(ts3_pred_lstm, label='LSTM')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Synthetic time series 3: Actual vs Predicted')
plt.legend()
plt.show()

We can see that both models were able to follow the nonlinear and chaotic pattern of the time series, but the LSTM model was closer to the actual values than the ARIMA model. The ARIMA model seemed to lag behind the actual values at some points, while the LSTM model seemed to adjust to the noise better.

Real-world time series 1

The real-world time series 1 contains the daily confirmed cases and deaths of COVID-19 in India from January 30, 2020 to February 18, 2024. It has 1117 data points and a daily frequency. Here is a plot of the time series:

# Plot the real-world time series 1
plt.figure(figsize=(10, 6))
plt.plot(ts4[:, 0], label='Confirmed cases')
plt.plot(ts4[:, 1], label='Deaths')
plt.xlabel('Time')
plt.ylabel('Count')
plt.title('Real-world time series 1: COVID-19 in India')
plt.legend()
plt.show()

Figure: Real-world time series 1: COVID-19 in India

We can see that the time series shows a complex and dynamic pattern, with multiple waves, peaks, and fluctuations. The confirmed cases and deaths are highly correlated, as they both reflect the severity of the pandemic.

We applied the ARIMA and LSTM models to the time series, and obtained the following RMSE and sMAPE values on the testing set:

We can see that both models achieved relatively low sMAPE values but high RMSE values. RMSE is expressed in the units of the data, and the case and death counts are large, so even a small relative error translates into a large absolute error. However, the LSTM model outperformed the ARIMA model, as it had lower RMSE and sMAPE values for both confirmed cases and deaths. This suggests that the LSTM model was able to capture the complex and dynamic pattern of the time series better than the ARIMA model.
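The gap between the two metrics on this series is largely a matter of scale. A small sketch with made-up numbers (not the COVID-19 data) shows it: the same 5% over-prediction gives an identical sMAPE on a small-valued and a large-valued series, while the RMSE grows with the magnitude of the values.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def smape(y_true, y_pred):
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_true - y_pred) / denom))

small = np.array([100.0, 200.0, 300.0])  # made-up values
large = small * 10_000                   # same shape, 10,000 times larger

# Both forecasts are 5% too high.
rmse_small, rmse_large = rmse(small, small * 1.05), rmse(large, large * 1.05)
smape_small, smape_large = smape(small, small * 1.05), smape(large, large * 1.05)

print(rmse_small, rmse_large)    # about 10.8 and about 108012
print(smape_small, smape_large)  # about 4.88 for both
```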

Here is a plot of the actual and predicted values of the time series by the ARIMA and LSTM models:

# Plot the actual and predicted values of the real-world time series 1 by the ARIMA and LSTM models
plt.figure(figsize=(10, 12))
plt.subplot(2, 1, 1)
plt.plot(ts4_test[:, 0], label='Actual')
plt.plot(ts4_pred_arima[:, 0], label='ARIMA')
plt.plot(ts4_pred_lstm[:, 0], label='LSTM')
plt.xlabel('Time')
plt.ylabel('Confirmed cases')
plt.title('Real-world time series 1: Confirmed cases')
plt.legend()
plt.subplot(2, 1, 2)
plt.plot(ts4_test[:, 1], label='Actual')
plt.plot(ts4_pred_arima[:, 1], label='ARIMA')
plt.plot(ts4_pred_lstm[:, 1], label='LSTM')
plt.xlabel('Time')
plt.ylabel('Deaths')
plt.title('Real-world time series 1: Deaths')
plt.legend()
plt.show()

We can see that both models were able to follow the general trend and pattern of the time series, but the LSTM model was closer to the actual values than the ARIMA model. The ARIMA model seemed to smooth out the peaks and fluctuations of the time series, while the LSTM model seemed to adjust to the changes better.

Real-world time series 2

The real-world time series 2 contains the monthly international airline passengers from January 1949 to December 1960. It has 144 data points and a monthly frequency. Here is a plot of the time series:

# Plot the real-world time series 2
plt.figure(figsize=(10, 6))
plt.plot(ts5, label='Real-world time series 2')
plt.xlabel('Time')
plt.ylabel('Passengers')
plt.title('Real-world time series 2: International airline passengers')
plt.legend()
plt.show()

We can see that the time series shows an upward trend and a yearly seasonality, with some fluctuations and variations. The number of passengers increases over time, and reaches the highest point in the summer months.

We applied the ARIMA and LSTM models to the time series, and obtained the following RMSE and sMAPE values on the testing set:

We can see that both models achieved low RMSE and sMAPE values, indicating that they fit the data well. However, the LSTM model outperformed the ARIMA model, as it had lower RMSE and sMAPE values. This suggests that the LSTM model was able to capture the trend and seasonality of the time series better than the ARIMA model.

Here is a plot of the actual and predicted values of the time series by the ARIMA and LSTM models:

# Plot the actual and predicted values of the real-world time series 2 by the ARIMA and LSTM models
plt.figure(figsize=(10, 6))
plt.plot(ts5_test, label='Actual')
plt.plot(ts5_pred_arima, label='ARIMA')
plt.plot(ts5_pred_lstm, label='LSTM')
plt.xlabel('Time')
plt.ylabel('Passengers')
plt.title('Real-world time series 2: Actual vs Predicted')
plt.legend()
plt.show()

We can see that both models were able to follow the trend and seasonality of the time series, but the LSTM model was closer to the actual values than the ARIMA model. The ARIMA model seemed to underestimate the values at some points, while the LSTM model seemed to adjust to the variations better.

Conclusion and Future Work

In this article, we compared two popular models for time series prediction: ARIMA and LSTM. We applied these models to three synthetic and two real-world time series, and evaluated their performance using the RMSE and sMAPE metrics. We found that both models achieved good results, but the LSTM model outperformed the ARIMA model on all the time series. This suggests that, on these datasets, the LSTM model is the more accurate choice, as it can capture complex and dynamic patterns and nonlinear and long-term dependencies better than the ARIMA model.

However, this does not mean that the ARIMA model is obsolete or useless. The ARIMA model still has some advantages over the LSTM model, such as:

  • The ARIMA model is simpler and faster to implement and train than the LSTM model, as it does not require a large amount of data, a complex architecture, or powerful hardware.
  • The ARIMA model is more interpretable and explainable than the LSTM model, as it has a clear mathematical formulation and a well-defined parameter estimation method.
  • The ARIMA model can provide not only point forecasts, but also interval forecasts or probabilistic forecasts, that indicate the confidence or likelihood of the predictions, which can be useful for decision making and risk management.
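To illustrate the last point, interval forecasts have a simple closed form in the most basic case, ARIMA(0,1,0) with drift (a random walk with drift): the h-step point forecast is the last value plus h times the drift, and the forecast standard deviation grows with the square root of h. The sketch below covers only this special case and ignores parameter uncertainty; for general orders, pmdarima can return intervals through `predict(..., return_conf_int=True)`.

```python
import numpy as np

def random_walk_forecast(y, horizon, z=1.96):
    """Point forecasts and approximate 95% intervals for ARIMA(0,1,0) with drift."""
    diffs = np.diff(y)
    drift = diffs.mean()          # estimated constant step
    sigma = diffs.std(ddof=1)     # estimated standard deviation of the shocks
    h = np.arange(1, horizon + 1)
    point = y[-1] + drift * h
    half_width = z * sigma * np.sqrt(h)  # uncertainty accumulates with the horizon
    return point, point - half_width, point + half_width

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(0.5, 1.0, 200))  # toy random walk with drift
point, lower, upper = random_walk_forecast(y, horizon=10)
```

The interval at horizon 4 is exactly twice as wide as at horizon 1, which makes the growing uncertainty of longer forecasts explicit.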

Therefore, the choice of the model for time series prediction depends on various factors, such as the characteristics and properties of the time series, the availability and quality of the data, the computational resources and time constraints, and the objectives and requirements of the prediction task.

Some possible directions for future research or improvement on the topic are:

  • Comparing other models or methods for time series prediction, such as exponential smoothing, state space models, support vector machines, random forests, or deep learning models, and analyzing their strengths and weaknesses.
  • Exploring different ways of preprocessing, transforming, or augmenting the data, such as detrending, deseasonalizing, normalizing, or adding exogenous variables, and examining their effects on the performance and accuracy of the models.
  • Tuning and optimizing the hyperparameters and architectures of the models, such as the order of the ARIMA model, the number and size of the LSTM units, the activation functions, the learning rate, or the dropout rate, and finding the best combinations that suit the data and the task.
  • Developing and applying new evaluation metrics or criteria for time series prediction, such as the mean absolute scaled error, the mean directional accuracy, the mean absolute error percentage, or Theil's U statistic, and comparing their advantages and disadvantages.
  • Extending the scope and scale of the comparison, such as using more or different types of time series, increasing the length or frequency of the time series, or forecasting multiple steps ahead, and investigating the challenges and opportunities that arise from these scenarios.

For more details, follow physicsalert.com. Plots and other outputs are omitted from this article due to some guidelines and rules. If you want the full code and snapshots of the outputs, please contact me personally at vivekupadhyay.online.

Written by VIVEK KUMAR UPADHYAY

I am a professional Content Strategist & Business Consultant with expertise in the Artificial Intelligence domain. MD - physicsalert.com .