A time series is a sequence of data points listed in chronological order, usually with equal spacing between consecutive measurements. A time series can thus be thought of as discrete-time data covering a continuous time interval.
Forecasting data using time series analysis involves using a meaningful model to predict future findings based on known past results.
Traditionally, time series forecasting has been dominated by linear methods like ARIMA because they are well understood and effective on many problems. However, these traditional methods also suffer from limitations, such as assuming linear relationships and univariate data.
Recently, deep learning methods have become very promising for time series forecasting, especially because they can learn many time series properties that previously presented challenges, such as automatically learning temporal dependence and automatically handling temporal structures like trends and seasonality. These models can predict future values from previously observed values thanks to their ability to extract information from input data over long durations.
This article walks through the different deep learning models applied to time series forecasting. Here is an outline of the article:
- Introduction to Recurrent Neural Networks (RNNs).
- RNNs limitations in Time Series Forecasting
- Introduction to LSTM and GRU models
- LSTM and GRU for Time Series Forecasting
- Limitations of RNNs and LSTM/GRU
- A basic introduction to Transformers
- Transformers for Time Series Forecasting
In this part, we will cover recurrent neural networks as well as LSTM and GRU models and their applications to time series.
This article assumes prior knowledge of deep learning and time series.
1- Recurrent Neural Networks
A recurrent neural network, or RNN, is a neural network that contains recurrent layers.
The main idea of recurrent neural networks is to use not only the current input xn, but also information from the previous inputs (x0 .. xn-1), carried forward through a hidden state, to make the current prediction. Thus, we can build neural networks that transmit values over time.
This architecture adds state, or memory, to the network and allows it to learn and exploit the ordered nature of the observations in the input sequences.
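To make this concrete, here is a minimal NumPy sketch of a single recurrent step. The weight names (`W_xh`, `W_hh`, `b_h`) and the sizes are illustrative, not taken from any particular library:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """Compute the new hidden state from the current input and the previous state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5  # illustrative sizes

W_xh = rng.normal(size=(input_size, hidden_size)) * 0.1  # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

# Process a sequence step by step: each hidden state depends on the previous one,
# so information from earlier inputs is carried forward through h.
h = np.zeros(hidden_size)
for t in range(seq_len):
    x_t = rng.normal(size=input_size)
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)

print(h.shape)  # the hidden state summarizes the sequence so far
```

Note that the same weights are reused at every step; this weight sharing across time is what distinguishes a recurrent layer from a stack of independent dense layers.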
Recurrent neural networks (RNNs) have long been used in sequence modeling, with good results on various natural language processing tasks. They are widely applied in language modeling, text generation, machine translation, and speech recognition.
2- RNNs limitations in Time Series
In deep learning, the gradient calculation consists of a forward propagation pass through the unrolled graph, followed by a backward propagation pass. The execution time cannot be reduced by parallelization because the forward propagation graph is sequential in nature, i.e., each time step can only be calculated after the previous one. Backpropagation for a recurrent model is therefore called backpropagation through time (BPTT).
During backpropagation, recurrent neural networks suffer from the vanishing gradient problem, which occurs especially when dealing with long sequences. Gradients are the values used to update the weights of a neural network; if a gradient becomes extremely small, it contributes little to learning. The vanishing gradient problem arises when the gradient shrinks as it is propagated back through time.
This type of neural network also suffers from short-term memory: if a sequence is long enough, it will have trouble carrying information from the first time steps to the later ones. Layers that receive a small gradient update stop learning, and these are usually the ones corresponding to the earliest time steps.
So, because these layers do not learn, RNNs can forget what they saw early in long sequences, hence the short-term memory.
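A rough illustration of why this happens: during BPTT, the gradient that reaches the earliest time steps is a product of per-step Jacobians. The sketch below uses illustrative small recurrent weights and a representative tanh-derivative factor (0.5) standing in for the true, state-dependent derivative; it is not a full BPTT implementation, only a demonstration of the geometric decay:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size = 4
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.3  # small recurrent weights

grad = np.eye(hidden_size)
norms = []
for t in range(30):
    # tanh'(z) is at most 1; 0.5 is a representative value for illustration.
    jacobian = 0.5 * W_hh
    grad = grad @ jacobian  # accumulate one more step of the chain rule
    norms.append(np.linalg.norm(grad))

# The norm decays toward zero as more time steps are chained together,
# so almost no learning signal reaches the earliest steps.
print(norms[0], norms[-1])
```

The same product structure also explains the opposite failure mode, exploding gradients, when the recurrent weights are large enough for the per-step factor to exceed one.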
3- GRU and LSTM as a solution
GRU (Gated Recurrent Unit) and LSTM (Long Short-Term Memory) are very effective solutions to the vanishing gradient problem and allow our neural network to capture much longer-range dependencies.
Both GRU and LSTM have gating mechanisms that regulate the flow of information, such as storing context over multiple time steps. They keep track of which information from the past should be retained and which can be forgotten.
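As an illustration of this gating, here is a NumPy sketch of one GRU step using the standard update-gate/reset-gate formulation. The weight names are illustrative, and sign conventions for the update gate vary between references:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z = sigmoid(x_t @ W_z + h_prev @ U_z)             # update gate: how much new info to admit
    r = sigmoid(x_t @ W_r + h_prev @ U_r)             # reset gate: how much past state to use
    h_cand = np.tanh(x_t @ W_h + (r * h_prev) @ U_h)  # candidate state
    return (1 - z) * h_prev + z * h_cand              # gated blend of old state and candidate

rng = np.random.default_rng(2)
input_size, hidden_size = 3, 4  # illustrative sizes
params = [rng.normal(size=s) * 0.1
          for s in [(input_size, hidden_size), (hidden_size, hidden_size)] * 3]

h = np.zeros(hidden_size)
for t in range(5):
    h = gru_step(rng.normal(size=input_size), h, params)
print(h.shape)
```

When the update gate z is near zero, the previous state passes through almost unchanged; this near-identity path is what lets gradients survive over many time steps.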
- Long Short-Term Memory (LSTM):