
Analysis of Machine Learning Methods for Predicting Stock Prices

Oluwadurotimi Onibonoje

oluwadurotimi.onibonoje2@mail.dcu.ie 2

Kevin Djoussa

Mark Roantree

mark.roantree@dcu.ie 0

0 Insight Centre for Data Analytics, Dublin City University, Ireland
1 School of Computing, Dublin City University, Ireland
2 VistaMilk SFI Research Centre, Dublin City University, Ireland

In this research, we investigated the applicability of Long Short Term Memory (LSTM) and Convolutional Neural Networks (CNN) for forecasting the next day's closing price of four major stock indices, and explored de-noising techniques to improve the performance of these models. Our experiments show that using Kalman Filters with the LSTM model provides the best forecast accuracy, reducing forecast error by at least 30% in three of the four financial time series used in this study.

Keywords: Time Series Analysis, Neural Networks, Signal Processing

1 Introduction

Predicting the future price of stock indices has been shown to be an extremely challenging endeavor, largely due to the noisy and non-stationary characteristics of their time series [4]. Several approaches have been investigated for forecasting stock indices, e.g., [3], [13]. Statistical approaches such as the linear Autoregressive Integrated Moving Average (ARIMA) model were used to forecast the monthly stock price of the S&P 500 [2]. Deep learning architectures such as the LSTM implemented in [1] and Convolutional Neural Networks (CNN) as used in [11] are among the non-linear models that have shown promise in this domain. These research initiatives provide evidence that more sophisticated models can deliver better results.

Motivation. Stock market indices such as the Standard and Poor's 500 (S&P 500) have been shown, through the Granger-causality test, to have predictive power as a leading indicator of the economy [6]. Therefore, accurate forecasts of stock market indices can help economic policy makers reach more informed conclusions about the right economic policies for desired economic outcomes. Institutional investors will also benefit from accurate forecasts of a stock market index, as these predictions help inform the portfolio optimization process across different financial asset classes.

Contribution. This project attempts to forecast the univariate time series of four major stock indices, namely the S&P 500, the Dow Jones Industrial Average (DJIA), the Euro Stoxx 50 (Stoxx50E) and the National Association of Securities Dealers Automated Quotation (NASDAQ) exchange. We applied three deep learning algorithms, namely LSTM, CNN and CNN-LSTM, to the daily closing prices of each index. In addition to investigating the efficacy of neural network architectures in time series forecasting, our research examines the merits of applying de-noising techniques, such as the wavelet transform and Kalman filters, to these financial time series. We introduce novel ensemble approaches that combine these de-noising techniques with the neural network models in a bid to improve on the forecast accuracy of our baseline model.

Paper Structure. The remainder of this paper is structured as follows: in section 2, we provide an overview of related research; in section 3, we examine the theoretical concepts and models that underpin our implementation; in section 4, we present our methodology for identifying the best model configuration for day-ahead forecasts of stock indices; in section 5, we present our evaluation and discuss our findings; and finally, in section 6, we conclude the paper.

2 Related Research

In [1], the authors applied Deep Neural Networks to predict one-month-ahead returns of stocks in the MSCI Japan Index. Their approach used 25 fundamental analysis factors for each stock in the cross-section of the Japanese stock market. The experimental results, evaluated using Rank Correlation, Directional Accuracy and Mean Squared Error (MSE), showed that Deep Neural Networks outperformed other models, including Support Vector Regression (SVR), Random Forest (RF) and shallow neural network models, with a 30% average uplift in rank correlation and a 2.6% reduction in MSE.

The efficacy of Deep Belief Networks (DBN) in predicting the S&P 500 was investigated in [9], using Technical Analysis Indicators as features together with a two-dimensional Principal Component Analysis (PCA) model. Three models were formulated and evaluated using the RMSE metric to assess the usefulness of the Technical Analysis Indicators. The first model is composed of a Back Propagation Neural Network (BPNN) and the basic features in the raw dataset, while the second model is composed of a DBN, the basic features and extracted Technical Indicator features. The final model adds a two-dimensional PCA to the second model. The experimental results indicate that Technical Indicators coupled with PCA can help improve the predictive power of deep learning algorithms: the final model achieved a 43.5% reduction in RMSE compared with the first model and a 16.91% reduction compared with the second.

Wavelet Transforms have been studied in different applications involving non-stationary time series and signals [24]. Wavelet transforms decompose a signal into components at different time scales. Li and Tam used wavelets as a real-time de-noising technique for the financial time series of East Asian stock indices [17]. Their approach applies different mother wavelets, with grid-searched hyper-parameters such as the decomposition level and sliding window size, to these stock indices to obtain a smooth time series, which is then processed by an LSTM neural network. Their results show that there is merit in using wavelets as a de-noising technique for neural network models: de-noised data improved the directional accuracy of LSTM forecasts over the original data inputs by an average of 7.6%.

The Kalman Filter is a state-space model [15] that uses an optimal recursive algorithm and is typically found in signal processing research. In [8], the authors compared wavelet transforms and Kalman filters for de-noising signals; their results show that the Coiflet 2 wavelet transform outperforms Kalman Filters in signal de-noising. Ma and Teng predicted chaotic time series using a variant of the Kalman Filter known as the Unscented Kalman Filter (UKF) [19]. Lima and Neto used Kalman filters in conjunction with wavelets to pre-process the time series of the Brazilian IBOVESPA index [18]; the pre-processed time series is fed into a Recurrent Neural Network (RNN) to generate forecasts. Their results show a Mean Absolute Percentage Error (MAPE) of 0.72%, beating other models including ARIMA.

Summary. This review outlines recent approaches taken by researchers seeking better stock price predictions. These have included technical indicator features with neural networks [9], linear statistical models [2] and wavelet de-noising of the input time series [17]. While these approaches are richly varied in their methodologies, we have not found a study that focuses solely on the input financial series and compares the effect of the de-noising techniques we propose for improving the forecasting ability of neural network models.

3 Background Models

In this section, we provide a brief overview of the models used in our evaluation and the two de-noising techniques used in an attempt to improve model performance.

3.1 Long Short Term Memory (LSTM)

Recurrent Neural Networks (RNNs) maintain an internal loop that allows information to persist. The output of an RNN is used together with the current element of the input tensor to compute the next element of the output sequence [10]. In simpler RNN models, the memory unit, or state, of the RNN is often equivalent to the previous output, whereas more complex models maintain a state that differs from the previous element of the output sequence.

Equation 1 [10] gives the output of an RNN at a given time step $t$. In this equation, the activation function $\phi$ is the non-linear hyperbolic tangent, and $W_x$ and $W_y$ are matrices containing the connection weights for the input at the current time step and for the output at the previous time step, respectively. $h_{(t-1)}$ is the state from the previous time step, and the bias vector $b$ contains the bias term for each neuron.

$$Y_{(t)} = \phi\left(X_{(t)} W_x + h_{(t-1)} W_y + b\right) \quad (1)$$
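Equation 1 can be illustrated with a minimal NumPy sketch of a simple RNN in which the state is the previous output, unrolled over a 100-step univariate window. The weight shapes, the random initialisation and the 4-neuron layer size are illustrative assumptions, not the configuration used in our experiments; the function name `rnn_step` is hypothetical.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_y, b):
    """One recurrent step following equation (1): tanh(X_t W_x + h_{t-1} W_y + b).

    Shapes (row-vector convention, as in the equation):
      x_t: (batch, n_inputs), h_prev: (batch, n_neurons)
      W_x: (n_inputs, n_neurons), W_y: (n_neurons, n_neurons), b: (n_neurons,)
    """
    return np.tanh(x_t @ W_x + h_prev @ W_y + b)

rng = np.random.default_rng(0)
n_inputs, n_neurons, timesteps = 1, 4, 100
W_x = rng.normal(scale=0.1, size=(n_inputs, n_neurons))
W_y = rng.normal(scale=0.1, size=(n_neurons, n_neurons))
b = np.zeros(n_neurons)

window = rng.random((1, timesteps, n_inputs))   # [samples, timesteps, features]
h = np.zeros((1, n_neurons))                    # initial state
for t in range(timesteps):
    # In this simple RNN the new output is also the state passed to the next step.
    h = rnn_step(window[:, t, :], h, W_x, W_y, b)
print(h.shape)  # (1, 4)
```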

LSTMs [14] address the long-term dependency problem of RNNs by introducing three gates, namely the forget gate $F$, the input gate $I$ and the output gate $O$, shown in Fig. 1. The forget gate function in equation 2 takes as input the previous state $h_{(t-1)}$ and the current input vector $x_t$ and passes them through a sigmoid function $\sigma$, which returns a value between 0 and 1 representing the amount of information allowed to flow through the gate.

$$F_t = \sigma\left(W_F [h_{(t-1)}, x_t] + b_F\right) \quad (2)$$

The input gate function $I_t$ in equation 3, like the forget gate, takes as input the previous state $h_{(t-1)}$ and the current input vector $x_t$ and passes them through a sigmoid function. The input gate determines which values are to be updated.

$$I_t = \sigma\left(W_I [h_{(t-1)}, x_t] + b_I\right) \quad (3)$$

The output gate function $O_t$ in equation 4 determines which parts of the long-term state $C_t$ are passed as output to $h_t$ for the current time step.

$$O_t = \sigma\left(W_O [h_{(t-1)}, x_t] + b_O\right) \quad (4)$$

The matrices $W_F$, $W_I$, $W_O$ contain the connection weights for the gate layers, while $b_F$, $b_I$, $b_O$ are the bias terms for these layers.
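The three gates can be sketched in NumPy for a single LSTM step. Equations 2 to 4 give only the gates; the candidate values and the updates of the long-term state $C_t$ and output $h_t$ below follow the standard LSTM formulation [14] and are not written out in this section. The sketch uses the row-vector convention, so $W_F [h_{(t-1)}, x_t]$ becomes `z @ W["F"]`; the 50-neuron size mirrors the LSTM layers described in section 4, while the random weights and the names `lstm_step` and `W["C"]` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W[g] maps the concatenation [h_{t-1}, x_t] to gate g.

    W["F"], W["I"], W["O"], W["C"]: (n_neurons + n_inputs, n_neurons)
    b["F"], b["I"], b["O"], b["C"]: (n_neurons,)
    """
    z = np.concatenate([h_prev, x_t], axis=-1)   # [h_{t-1}, x_t]
    f_t = sigmoid(z @ W["F"] + b["F"])           # forget gate, eq. (2)
    i_t = sigmoid(z @ W["I"] + b["I"])           # input gate, eq. (3)
    o_t = sigmoid(z @ W["O"] + b["O"])           # output gate, eq. (4)
    # Standard LSTM candidate values and state updates (not written out above).
    c_tilde = np.tanh(z @ W["C"] + b["C"])
    c_t = f_t * c_prev + i_t * c_tilde           # long-term state C_t
    h_t = o_t * np.tanh(c_t)                     # output h_t
    return h_t, c_t

rng = np.random.default_rng(0)
n_inputs, n_neurons = 1, 50
W = {g: rng.normal(scale=0.1, size=(n_neurons + n_inputs, n_neurons)) for g in "FIOC"}
b = {g: np.zeros(n_neurons) for g in "FIOC"}

window = rng.random((1, 100, n_inputs))          # [samples, timesteps, features]
h = np.zeros((1, n_neurons))
c = np.zeros((1, n_neurons))
for t in range(window.shape[1]):
    h, c = lstm_step(window[:, t, :], h, c, W, b)
print(h.shape, c.shape)  # (1, 50) (1, 50)
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the long-term state is simply halved at each step; this is a quick way to check the gate wiring.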

3.2 Convolutional Neural Network

Convolutional Neural Networks (CNNs) are a class of deep learning algorithms shown to be highly effective in tasks relating to visual perception [16]. The architecture of a CNN incorporates convolutional and pooling layers, and these layers are largely why CNNs outperform, and are more efficient than, traditional neural network architectures in computer vision tasks. In this research, we applied a 1-D CNN to our financial time series.

The convolutional layer allows the neural network to capture spatial and temporal dependencies in the input feature map by applying a convolution kernel to this input tensor. The kernel slides across the input feature map, processing each position in turn, to produce a convolved output feature map; because the same kernel is applied at every position, a pattern is detected wherever it occurs in the input. The size of the output feature map is influenced by the size of the convolution kernel and by the padding added to the input feature map to avoid losing its edge elements.
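The effect of kernel size and padding on the output length can be seen in a NumPy sketch of a single-filter 1-D convolution. It omits the bias, activation and multiple input channels of a real convolutional layer; the kernel values and the name `conv1d` are illustrative assumptions.

```python
import numpy as np

def conv1d(x, kernel, padding=0, stride=1):
    """Single-channel, single-filter 1-D convolution as computed by deep-learning
    layers (a cross-correlation: the kernel is not flipped)."""
    x = np.pad(np.asarray(x, dtype=float), padding)   # zero-pad both edges
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1               # = (L + 2*padding - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel) for i in range(out_len)])

series = np.arange(1.0, 11.0)          # 10 values
kernel = np.array([1.0, 0.0, -1.0])    # responds to local change: x_i - x_{i+2}
print(len(conv1d(series, kernel)))             # 8: edge positions are lost without padding
print(len(conv1d(series, kernel, padding=1)))  # 10: same length as the input
```

A padding of $(k-1)/2$ on each side keeps the output the same length as the input for an odd kernel size $k$ and stride 1.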

The pooling layer down-samples the convolved feature map by applying an operation over an $n \times n$ sliding window. There are two principal types of pooling: maximum pooling, which computes the highest value in the current $n \times n$ window of the convolved feature map, and average pooling, which computes the mean. The pooling layer allows CNNs to better model spatial hierarchies present in the input feature map. By shrinking the feature map, it reduces the number of parameters in subsequent layers, lowering the risk of overfitting and giving CNNs generalization power.
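For a 1-D series the window is simply $n$ consecutive values. The NumPy sketch below shows non-overlapping max and average pooling with a window of 2; the function name `pool1d` and the example feature map are illustrative assumptions.

```python
import numpy as np

def pool1d(x, size=2, mode="max"):
    """Non-overlapping 1-D pooling with window = stride = size.
    Trailing values that do not fill a complete window are dropped."""
    x = np.asarray(x, dtype=float)
    n = len(x) // size
    windows = x[:n * size].reshape(n, size)
    return windows.max(axis=1) if mode == "max" else windows.mean(axis=1)

feature_map = np.array([1.0, 3.0, 2.0, 8.0, 5.0, 4.0, 7.0])
print(pool1d(feature_map, 2, "max"))   # pair maxima: 3, 8, 5 (the trailing 7 is dropped)
print(pool1d(feature_map, 2, "mean"))  # pair means: 2, 5, 4.5
```

The pooled map is half the length of its input, so any dense layer that follows needs proportionally fewer weights.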

3.3 Wavelet Transformation

Wavelet Transformation is a signal processing technique that has been used in many applications, such as image compression, time-frequency analysis and data de-noising. The Continuous Wavelet Transform (CWT) provides a time-frequency representation and overcomes the resolution problems of the Short Time Fourier Transform (STFT): whereas the STFT uses a fixed-size window, the CWT offers a variable-size window for the spectral components. The wavelet functions derived from the mother wavelet $\psi$ and the father wavelet $\phi$ [4] are expressed in equations 5 and 6, where $a$ is the scale factor and $b$ is the translation factor.

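$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \quad (5)$$

$$\phi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\phi\!\left(\frac{t-b}{a}\right) \quad (6)$$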

The formula for the CWT and the Inverse CWT [ 4 ] of a function is shown in equations 7 and 8.

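$$W_f(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(x)\,\psi^{*}\!\left(\frac{x-b}{a}\right) dx \quad (7)$$

$$f(x) = \frac{1}{C_\psi} \int_{0}^{\infty} \int_{-\infty}^{\infty} W_f(a,b)\,\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{x-b}{a}\right) db\,\frac{da}{a^{2}} \quad (8)$$

where $\psi^{*}$ is the complex conjugate of $\psi$ and $C_\psi$ is the admissibility constant of the mother wavelet.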

Here, we applied the Haar wavelet to de-noise our input financial time series. The Haar wavelet is computationally more efficient than other mother wavelets and has been shown to improve results in this domain [4].

3.4 Kalman Filter

Kalman Filters estimate the state of a system from measurements that contain expected errors. Their efficiency in making time series forecasts makes them widely applied in time series analysis and real-time applications [21]. A linear Gaussian model for the state and observation of a measured process is shown in equations 12 and 13, where $x_t$ is the real value of the measured system at time $t$ and $y_t$ is the measured value at $t$.

$$x_t = F x_{t-1} + B u_t + w_t \quad (12)$$

$$y_t = A x_t + v_t \quad (13)$$

To determine the real state of the system at a given time $t$, there are three functional components. The first component, $F x_{t-1}$, captures the functional relationship between the previous state $x_{t-1}$ and the current state $x_t$. The second component, $B u_t$, is an external force term [21]. The third component, $w_t$, is a stochastic term that captures dynamics not present in the previous state. The measured value, $y_t$, is obtained by applying a function to the real value of the current state, $A x_t$, and adding white Gaussian noise $v_t$. The Kalman filter forecasts the future value using equation 14, where $K_t$ is the Kalman gain.

$$\hat{x}_t = K_t y_t + (1 - K_t)\,\hat{x}_{t-1} \quad (14)$$
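A short worked example of equation 14 in the form $\hat{x}_t = K_t y_t + (1 - K_t)\,\hat{x}_{t-1}$, using purely illustrative numbers rather than values from our experiments ($K_t = 0.2$, a new measurement $y_t = 100$ and a previous estimate $\hat{x}_{t-1} = 95$), shows how the gain blends the measurement with the previous estimate:

$$\hat{x}_t = 0.2 \times 100 + (1 - 0.2) \times 95 = 20 + 76 = 96$$

A gain close to 1 makes the estimate follow the measurement almost exactly, while a gain close to 0 leaves it near the previous estimate.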

The Kalman Filter recursively iterates between the prediction and filtering phases [8]. The prediction phase is described by equations 15 and 16, and the filtering phase by equations 17 to 19, where $P_t$ is the estimate of the state covariance, $R$ is the measurement error variance, and $Q$ (the process noise variance) is a tunable hyper-parameter for improving the performance of the model. A superscript minus denotes a predicted (prior) quantity, before the measurement $y_t$ is incorporated.

$$\hat{x}_t^{-} = F \hat{x}_{t-1} + B u_t \quad (15)$$

$$P_t^{-} = F P_{t-1} F^{T} + Q \quad (16)$$

$$\hat{x}_t = \hat{x}_t^{-} + K_t\left(y_t - A \hat{x}_t^{-}\right) \quad (17)$$

$$P_t = \left(I - K_t A\right) P_t^{-} \quad (18)$$

The Kalman gain, $K_t$, weighs the uncertainty of the predicted estimate against the uncertainty of the measurement, determining how strongly the new measurement corrects the prediction. Its computation [21] is shown in equation 19.

$$K_t = P_t^{-} A^{T} \left(A P_t^{-} A^{T} + R\right)^{-1} \quad (19)$$
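Equations 15 to 19 reduce to a few lines in the scalar case. The NumPy sketch below assumes a random-walk state model ($F = A = 1$, no control input, so $B u_t = 0$), initialises the estimate at the first observation, and uses illustrative values for $Q$ and $R$; these are not the values used in our experiments, and the function name `kalman_denoise` is hypothetical.

```python
import numpy as np

def kalman_denoise(y, Q=1e-5, R=1e-2, F=1.0, A=1.0, P0=1.0):
    """Scalar Kalman filter with no control input (B u_t = 0).

    y : 1-D array of noisy observations y_t
    Q : process-noise variance (the tunable hyper-parameter)
    R : measurement-error variance
    Returns the filtered estimates x_hat_t for every t.
    """
    y = np.asarray(y, dtype=float)
    x_hat, P = y[0], P0                 # start from the first observation
    filtered = np.empty_like(y)
    for t, y_t in enumerate(y):
        # Prediction phase, eqs. (15)-(16)
        x_prior = F * x_hat
        P_prior = F * P * F + Q
        # Kalman gain, eq. (19); in the scalar case the inverse is a division
        K = P_prior * A / (A * P_prior * A + R)
        # Filtering phase, eqs. (17)-(18)
        x_hat = x_prior + K * (y_t - A * x_prior)
        P = (1.0 - K * A) * P_prior
        filtered[t] = x_hat
    return filtered

rng = np.random.default_rng(42)
true_level = np.full(500, 5.0)
noisy = true_level + rng.normal(0.0, 0.1, size=500)   # measurement noise variance 0.01
smoothed = kalman_denoise(noisy, Q=1e-6, R=0.01)
```

The ratio $Q/R$ controls the smoothing: a small ratio trusts the state model and produces a smooth but slower-reacting series, while a large ratio makes the estimate track the observations closely.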

4 Methodology

In this research, with the exception of our baseline ARIMA model, we relax the stationarity condition for the financial time series used by our neural network models, as in [15, 24]. The implementation logic for our neural network models was adapted from the approach presented in [5].

Data Preprocessing. The dataset used in this research was obtained from Yahoo! Finance for the period 01-01-2004 to 31-12-2019 for the following stock indices: NASDAQ 100 (NDX), S&P 500, Euro Stoxx 50 and the Dow Jones Industrial Average. The daily closing price series for each index was transformed into 1-D tensors, each composed of 100 successive daily values, mapping the independent variable $\hat{x} = (x_{t+1}, x_{t+2}, \ldots, x_n)$, where $n = t + 100$, to the corresponding dependent variable $\hat{y} = x_{n+1}$. The values of the independent variable were scaled to the range [0, 1] using Min-Max normalization.
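The windowing and scaling described above can be sketched in NumPy as follows. The 250-point synthetic price series and the helper names `min_max_scale` and `make_windows` are illustrative assumptions.

```python
import numpy as np

def min_max_scale(series, lo=None, hi=None):
    """Scale values into [0, 1]; lo/hi default to the series' own min and max."""
    series = np.asarray(series, dtype=float)
    lo = series.min() if lo is None else lo
    hi = series.max() if hi is None else hi
    return (series - lo) / (hi - lo), lo, hi

def make_windows(series, n_steps=100):
    """Map each run of n_steps consecutive values to the value that follows it."""
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = np.asarray(series[n_steps:])
    return X, y

prices = np.linspace(1000.0, 3000.0, 250)   # stand-in for a daily closing-price series
scaled, lo, hi = min_max_scale(prices)
X, y = make_windows(scaled, n_steps=100)
print(X.shape, y.shape)  # (150, 100) (150,)
```

Here the scaling range is taken from the whole series for brevity. Computing `lo` and `hi` on the training portion only, and reusing them for the test portion, avoids leaking information about the test period into the inputs; predictions can be mapped back to prices with `scaled * (hi - lo) + lo`.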

4.1 Evaluation Models

Baseline Model. ARIMA was selected as the baseline model; its results serve as a benchmark against which the improvement in forecast accuracy of the more sophisticated models is measured. The steps are as follows:
- Step 1: The dataset is split into a training set and a test set using a 70:30 ratio.
- Step 2: The hyper-parameters $p$, $d$ and $q$ are grid searched on the training set to produce the optimal model with the lowest Mean Squared Error (MSE).
- Step 3: Predictions from the ARIMA model are compared with the values in the test set and evaluated using the MSE.
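Steps 1 to 3 can be sketched in NumPy as follows. This is a simplified stand-in, not the ARIMA implementation used for our baseline: it fits an ARI(p, d) model (ARIMA with q = 0) by ordinary least squares, selects (p, d) by one-step-ahead MSE on the last 30% of the training data, and then reports the test-set MSE of walk-forward one-step forecasts. The candidate orders, the validation split inside the training set and the helper names (`fit_ar`, `forecast_next`, `evaluate`, `grid_search`) are assumptions for illustration; a full ARIMA estimator, such as the one in statsmodels, would also estimate the moving-average order q.

```python
import itertools
import numpy as np

def fit_ar(z, p):
    """Least-squares AR(p) with intercept: z_t ~ c + a_1 z_{t-1} + ... + a_p z_{t-p}."""
    lags = [z[p - k:len(z) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(z) - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]

def forecast_next(history, p, d, coef):
    """One-step forecast: difference d times, apply the AR(p) model, undo the differencing."""
    levels = [np.asarray(history, dtype=float)]
    for _ in range(d):
        levels.append(np.diff(levels[-1]))
    recent = levels[-1][::-1][:p]                 # most recent lag first
    pred = coef[0] + np.dot(coef[1:], recent)
    for k in range(d, 0, -1):                     # integrate back to the original scale
        pred = levels[k - 1][-1] + pred
    return float(pred)

def evaluate(series, p, d, split=0.7):
    """Fit on the first 70% (Step 1), forecast each remaining value one step ahead, return MSE."""
    series = np.asarray(series, dtype=float)
    n_train = int(len(series) * split)
    coef = fit_ar(np.diff(series[:n_train], n=d), p)
    preds = [forecast_next(series[:n_train + i], p, d, coef)
             for i in range(len(series) - n_train)]
    return float(np.mean((np.array(preds) - series[n_train:]) ** 2))

def grid_search(series, ps=(1, 2, 3), ds=(0, 1, 2)):
    """Step 2: choose (p, d) using the training set only; Step 3: report the test-set MSE."""
    series = np.asarray(series, dtype=float)
    train = series[:int(len(series) * 0.7)]
    best = min(itertools.product(ps, ds), key=lambda order: evaluate(train, *order))
    return best, evaluate(series, *best)
```

Selecting the orders on a validation slice of the training data keeps the test set untouched until Step 3.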

Wavelet Transform - Convolutional Neural Network (WT-CNN).

The specification of this sequential CNN architecture is as follows. Layer 1 is a 1-D convolutional layer with 3 filters and a kernel size of 3, using the Rectified Linear Unit (ReLU) activation function. Layer 2 is also a 1-D convolutional layer with hyper-parameters identical to the preceding layer. Layer 3 is a 1-D maximum pooling layer, followed by a fourth layer that flattens the two-dimensional tensor into a vector, which is fed into a fully connected layer, not unlike the data model approach taken in [12]. The model is trained with the Adam optimizer to minimize the Mean Squared Error (MSE) loss. The following steps were followed to implement the WT-CNN model:
- Step 1: The time series is de-noised using the Haar wavelet with soft thresholding.
- Step 2: The de-noised time series is scaled to the range [0, 1] to allow the CNN to converge faster.
- Step 3: The dataset is divided into a training set and a test set using a 70:30 split.
- Step 4: Both the training and test sets are converted into a supervised learning problem, with the 100 previous values forming the independent variable used to predict the next value in the sequence.
- Step 5: The training and test input tensors are reshaped to the dimensions [samples, timesteps, features].
- Step 6: The input tensors are fed into the CNN, and the network is trained for 100 epochs.
- Step 7: The predictions generated by the CNN are transformed back to their original scale and compared with the test set to compute the evaluation metrics.
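Step 1 does not state the decomposition level or how the threshold is chosen, so the NumPy sketch below assumes a multi-level orthonormal Haar decomposition, soft thresholding of the detail coefficients, and the universal threshold $\sigma\sqrt{2\ln n}$, with the noise level $\sigma$ estimated from the median absolute finest-level detail coefficient. These choices and the function names are illustrative, not a description of our exact configuration; in practice a wavelet library such as PyWavelets would normally be used, and the transform is written out here only to keep the example self-contained.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(approx, detail):
    """Invert one level of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def soft_threshold(c, thr):
    """Shrink coefficients towards zero by thr; those smaller than thr become zero."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

def haar_denoise(series, level=2):
    x = np.asarray(series, dtype=float)
    n = len(x)
    pad = (-n) % (2 ** level)
    x = np.concatenate([x, np.full(pad, x[-1])])   # repeat last value so length divides 2**level
    approx, details = x, []
    for _ in range(level):
        approx, d = haar_dwt(approx)
        details.append(d)                          # details[0] is the finest level
    # Universal threshold; noise level estimated from the finest details (assumption).
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    for d in reversed(details):                    # reconstruct from the coarsest level up
        approx = haar_idwt(approx, soft_threshold(d, thr))
    return approx[:n]

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 512)
clean = np.sin(t)
noisy = clean + rng.normal(0.0, 0.3, t.size)
denoised = haar_denoise(noisy, level=3)
```

Only the detail coefficients are thresholded; the coarsest approximation is kept intact, so the overall level and trend of the series survive de-noising.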

WT-CNN-Long Short Term Memory (WT-CNN-LSTM). The specification of the sequential CNN-LSTM architecture is as follows. Layer 1 is a 1-D convolutional layer with 64 filters, using the ReLU activation function. Layer 2 is a 1-D maximum pooling layer with a pooling size of 2, followed by a layer that flattens the two-dimensional tensor into a vector, which is fed into an LSTM layer with 50 neurons and a ReLU activation function. The LSTM layer outputs a tensor to a fully connected layer. Once again, the MSE loss is minimized using the Adam optimizer. The steps followed to implement the WT-CNN-LSTM model mirror those of the WT-CNN model, the major difference being the change in the dimensions of the input tensors from [samples, timesteps] to [samples, subsequences, timesteps, features].
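The difference in input shapes between the WT-CNN and WT-CNN-LSTM models can be sketched with NumPy reshapes. The number of subsequences is not reported, so the split of each 100-value window into 4 subsequences of 25 days, and the 8 dummy samples, are hypothetical.

```python
import numpy as np

n_samples, n_steps = 8, 100      # 100 past values per sample, as in Step 4
n_subseq = 4                     # hypothetical: the number of subsequences is not reported
X = np.arange(n_samples * n_steps, dtype=float).reshape(n_samples, n_steps)

# WT-CNN input: [samples, timesteps, features]
X_cnn = X.reshape(n_samples, n_steps, 1)
# WT-CNN-LSTM input: [samples, subsequences, timesteps, features]
X_cnn_lstm = X.reshape(n_samples, n_subseq, n_steps // n_subseq, 1)
print(X_cnn.shape, X_cnn_lstm.shape)  # (8, 100, 1) (8, 4, 25, 1)
```

Each subsequence holds consecutive days, so the convolutional layers can encode every block of 25 days (in Keras this is typically done by wrapping them in a `TimeDistributed` layer) and the LSTM then reads the resulting sequence of 4 encodings in order.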

WT-LSTM. The specification of the sequential LSTM architecture is as follows: the first three layers are LSTM layers with 50 neurons, and the final layer is a fully connected layer. The implementation of the WT-LSTM is similar to that of the previous models, the major difference being the dimensions of the input tensors: [samples, timesteps].

Kalman Filter-LSTM (KF-LSTM). In this model, the input time series is first passed into the Kalman filter as a 1-D tensor. The de-noising process returns a 2-D tensor, which is reshaped into 1-D before it is fed into an LSTM network.

5 Evaluation

In this section, we present the results of our model implementations, evaluated for each stock index using the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) metrics. These metrics measure the distance between the algorithm's predictions and the actual values in the test set.
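Both metrics, and the relative error reductions quoted later in this section, can be computed as in the NumPy sketch below. The reduction formula, (error before minus error after) divided by error before, is an assumption, since the text does not state how percentage reductions were calculated; the toy values are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def error_reduction(err_before, err_after):
    """Relative reduction in forecast error, in percent (assumed definition)."""
    return 100.0 * (err_before - err_after) / err_before

y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 101.0, 103.0, 104.0])
print(rmse(y_true, y_pred))        # sqrt((1 + 1 + 4 + 1) / 4) = sqrt(1.75), about 1.3229
print(mae(y_true, y_pred))         # (1 + 1 + 2 + 1) / 4 = 1.25
print(error_reduction(2.0, 0.5))   # 75.0
```

Because RMSE squares the errors before averaging, it penalises occasional large misses more heavily than MAE does, which is why the two metrics can rank models differently.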

Our experiments show that LSTMs outperform the other neural network architectures in forecasting the univariate time series of stock indices. With the exception of the NDX index, LSTMs improve on the forecast performance of the one-dimensional CNN by over 50% for each stock index. Compared with the CNN-LSTM architecture, LSTMs marginally outperform in forecasting the time series of the S&P 500 and the Euro Stoxx 50 (STOXX50E) index; the gap between these two architectures is wider, however, for the Dow Jones and NDX indices.

From Table 2, we can see that the LSTM model configuration achieved better RMSE scores than the CNN and CNN-LSTM for both the S&P 500 and the STOXX50E index. The CNN had the worst performance on all of the stock indices except the NDX. Interestingly, only the LSTM model for the STOXX50E outperformed the baseline ARIMA model.

Table 3 shows patterns similar to those in Table 2, with LSTMs outperforming the other models on stock indices such as the S&P 500 and DJIA. For the best-performing neural network models on the STOXX50E and S&P 500, the wavelet transform delivered results similar to those of the ARIMA model. The worst performance came from the CNN-LSTM model, except on the STOXX50E dataset. In many time series analysis tasks, CNNs generally outperform LSTMs, but this was not the case for the forecasts presented in Table 4. We hypothesize that this is due to the two-dimensional architecture of the network, which allows it to capture both the spatial and temporal information inherent in the time series.

We found the results of using wavelets to remove noise from our univariate time series to be inconclusive. On the one hand, predictions for the Dow Jones index showed a 20% average reduction in forecast error when the series was de-noised using wavelet transforms. On the other hand, predictions for the NDX and Stoxx50E indices showed no substantial reduction in forecast error. This indicates that Haar wavelet decomposition may not be suitable for all types of financial time series, as noted in [17]; indeed, each financial time series may require its own mother wavelet for decomposition. With the exception of the models that used Kalman filtering, no model consistently beat the baseline ARIMA model in forecasting each index. These results show that increasing model complexity does not guarantee improved performance, and many studies show that statistical forecasting techniques such as ARIMA can often outperform neural network architectures [5]. Kalman filters reduced the forecast error of the LSTM on the S&P 500 and Stoxx50E by 68%. The forecast error on the Dow Jones index fell by 53%, and on the NDX by 34%.

In summary, we found that Kalman Filters outperformed wavelets as a de-noising technique for the financial time series in our evaluation. Our neural network models used the previous N = 100 samples to predict the next price in the sequence; the choice of N was arbitrary.

6 Conclusions

In this paper, we applied three deep learning algorithms, LSTM, CNN and CNN-LSTM, to forecast the univariate time series of four stock indices: the S&P 500, the Dow Jones Industrial Average, the Euro Stoxx 50 and the Nasdaq Exchange. We first forecast the future prices of the stock indices using neural networks; we then investigated the efficacy of Discrete Wavelet Transforms, particularly the Haar mother wavelet, for de-noising the input financial time series; and finally, we investigated the use of Kalman Filters, which performed better than the wavelet transform approach. Results were evaluated using the RMSE and MAE metrics. While our evaluation provides support for ARIMA in time series forecasting, we believe our results with some de-noising techniques suggest that other approaches may outperform ARIMA, given appropriate experimental configurations.

Our current work focuses on exploring other input features to enhance the performance of the neural network models: configuring the type of mother wavelet applied, the decomposition level and the window size may well deliver improved performance. We are also investigating the optimal configurations of the DWT for more frequent observations of these time series.

Onibonoje et al.

1. Abe and Nakayama, "Deep Learning for Forecasting Stock Returns in the Cross-Section", ArXiv, 2018. [Accessed 14 August 2020].

2. Ariyo, Adewumi and Ayo, "Stock Price Prediction Using the ARIMA Model", 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 2014. Available: 10.1109/uksim.2014.67 [Accessed 1 August 2020].

3. Ken Bailey, Mark Roantree, Martin Crane and Andrew McCarren, "Data Mining in Agri Warehouses Using MODWT Wavelet Analysis", 23rd Intl. Conf. on Information and Software Technologies, pp. 241-253, Springer, 2017.

4. Bao, Yue and Rao, "A deep learning framework for financial time series using stacked autoencoders and long-short term memory", PLOS ONE, vol. 12, no. 7, p. e0180944, 2017.

5. Brownlee, Deep Learning for Time Series Forecasting. 1st edn. 2020.

6. Comincioli, "The Stock Market As A Leading Indicator: An Application Of Granger Causality", University Avenue Undergraduate Journal of Economics, vol. 1, no. 1, 1996. Available: https://digitalcommons.iwu.edu/uauje/vol1/iss1/1.

7. Dghais and Ismail, "A study of stationarity in time series by using wavelet transform", 2014.

8. Erkan and Bolat, "Comparison of Kalman Filter and Wavelet Filter for Denoising", 2005 International Conference on Neural Networks and Brain, 2005.

9. Gao, Li, Chai and Tang, "Deep learning with stock indicators and two-dimensional principal component analysis for closing price prediction system", 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), 2016.

10. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. 2nd edn. O'Reilly Media, Inc., Sebastopol (2019).

11. H. Gunduz, Y. Yaslan and Z. Cataltepe, "Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations", Knowledge-Based Systems, vol. 137, pp. 138-148, 2017. Available: 10.1016/j.knosys.2017.09.023.

12. Piotr Habela, Mark Roantree and Kazimierz Subieta, "Flattening the metamodel for object databases", Proceedings of ADBIS, Lecture Notes in Computer Science vol. 2435, pp. 263-276, Springer, 2002.

13. Henrique, Sobreiro and Kimura, "Literature review: Machine learning techniques applied to financial market prediction", Expert Systems with Applications, vol. 124, pp. 226-251, 2019. Available: 10.1016/j.eswa.2019.01.012.

14. Hochreiter and Schmidhuber, "Long Short-Term Memory", Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. Available: 10.1162/neco.1997.9.8.1735.

15. Kalman, "A New Approach to Linear Filtering and Prediction Problems", Journal of Basic Engineering, vol. 82, no. 1, pp. 35-45, 1960.

16. Krizhevsky, I. Sutskever and Hinton, "ImageNet classification with deep convolutional neural networks", Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.

17. Li and Tam, "Combining the real-time wavelet denoising and long-short-term-memory neural network for predicting stock indexes", 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017.

18. Lima and Neto, "Combining Wavelet and Kalman Filters for Financial Time Series Forecasting", Asian Economic and Financial Review, 2014.

19. J. Ma and Teng, "Predict chaotic time-series using unscented Kalman filter", Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826), 2004.

20. Moghar and Hamiche, "Stock Market Prediction Using LSTM Recurrent Neural Network", Procedia Computer Science, vol. 170, pp. 1168-1173, 2020. Available: 10.1016/j.procs.2020.03.049.

21. Nielsen, Practical Time Series Analysis. 1st edn. O'Reilly Media, Inc., Sebastopol (2019).

22. Polikar, Wavelet Tutorial - Part 2, http://users.rowan.edu/~polikar/WTpart2.html. Last accessed 01 Aug 2020.

23. J. Rankin, "Kalman filtering approach to market price forecasting."

24. M. Rhif, A. Ben Abbes, I. Farah, B. Martínez and Sang, "Wavelet Transform Application for/in Non-Stationary Time-Series Analysis: A Review", Applied Sciences, vol. 9, no. 7, p. 1345, 2019.