<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine Learning Methods for Equity Time Series Forecasting: A Compendium</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Matuozzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul D. Yoo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Provetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria H. Kim</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Birkbeck College, University of London</institution>
          ,
          <addr-line>Malet St, London, WC1E 7HX</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Wollongong</institution>
          ,
          <addr-line>Wollongong, NSW 2522</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Machine learning is a method of building predictive models using vast amounts of data from different sources, capturing non-linear relationships between different variables. As a result, financial markets in general, and stock markets in particular, offer a promising ground for the application of such methods. This survey examines machine learning methods for equity market forecasting, identifying gaps in current knowledge and suggesting potential avenues for further research. Computer science-centred quantitative studies have focused mainly on algorithms, testing results mostly on US data over short time frames; feature engineering, and the testing of findings on different markets and different time horizons, appear to be under-explored. This study thus introduces the financial context for non-experts and moves on to review different models and tools in the realm of statistical learning and deep learning. We believe that this approach will prove effective in financial practice for an interested reader without much prior knowledge of the finance literature. We survey the end-to-end deployment of machine learning to help readers from industry and academia understand the peculiarities of applying these methods to equity market forecasting.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Time Series Forecasting</kwd>
        <kwd>Equity market forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Devising equity markets forecasting is relevant not only for the parties directly involved in price formation (e.g., companies and investors), but also for policymakers and regulators. Central banks have shown a growing interest in modelling equities to decide macro-prudential policy, assess investors’ attitude towards risk, and in some cases deploy capital in the market directly. For example, according to the 30th of June 2022 13F SEC filings, the Swiss National Bank owns about $11.11 billion of Apple and $7.49 billion of Microsoft.</p>
      <p>This study is driven by the need to review machine learning techniques applied to equity market time series forecasting problems, with the objective of providing an overview for practitioners and highlighting under-explored areas warranting further research.</p>
      <p>This study explores research from finance and computer science, from industry and academia. As such, the selection of papers for this literature review is not purely bibliometric. We aim to cover a selection of high-impact publications and original contributions from the field for the critical steps of the machine learning deployment process, from pre-processing to algorithm selection. We cite papers that are now part of the history of financial theory for the benefit of non-experts. Research from other domains is included where it is reasonable to assume that the related techniques could be ported to financial time series forecasting.</p>
      <p>The contributions of this study are as follows:</p>
      <p>• We review, with the caveat that the field is evolving at a significant speed, the main machine learning solutions for equity market forecasting deployed in terms of features and algorithms. We observe that most findings have been corroborated only on specific markets or specific periods. Data sampling and forecasting horizons are mostly daily.</p>
      <p>• We highlight potential directions for future research, emphasising the adoption of a more data-centric approach. Ensemble methods should be further researched as architectures to leverage the peculiarities of financial data.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Framing the problem</title>
      <p>The non-stationarity and non-linearity of financial variables are the primary attributes to account for when estimating a forecasting model for equity markets.</p>
      <sec id="sec-2-2">
        <title>2.1. The “Efficient Markets” Hypothesis</title>
        <p>Under the Efficient Markets Hypothesis (EMH) developed by Fama [1], the sequence of price changes must be unpredictable if prices fully incorporate the information and expectations of market participants.</p>
        <p>
          Yet, in the real world, analysing past returns and processing publicly available information are instrumental to building a forecasting model. Jegadeesh and Titman [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] showed that it is possible to generate statistically significant abnormal returns by buying the stocks that exhibited top-decile returns in the past (i.e., winners) and selling stocks in the bottom decile (losers). Malkiel [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] documented several market anomalies that increase predictability; however, these inefficiencies do not persist.
        </p>
        <p>
          If market participants firmly believed in the EMH, it would be irrational to trade; hence there would not be a financial market. Grossman and Stiglitz [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] offer the following solution: the information derived by skilled investors is not entirely reflected in the market. As a result, investment research is compensated. Bauer et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] showed skill persistence amongst retail option traders: top-decile performers over one year outperform individuals in the bottom decile the following year.
        </p>
        <p>The emergence of big data gives rise to a paradigm shift in the field of financial engineering, where state-of-the-art techniques and data processing capacity are of utmost importance. Not only are newsflow, economic and stock market data widely available in copious size; unstructured data (e.g., geo-location) can also be accessed at any time and processed in different ways. As a result, investors equipped with the latest technological advancements can make use of an extensive set of features. Machine learning algorithms do not require data transformations to make a time series stationary, being able to learn complex patterns in high dimensions.</p>
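<p>As a point of reference, the stationarity transformation that classical models typically require, and which the machine learning models discussed above can dispense with, is a simple differencing of log prices. A minimal sketch, with illustrative prices rather than data from any study:</p>

```python
import math

def log_returns(prices):
    """Transform a price series into (approximately stationary) log returns."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 102.0, 101.0, 104.0]
rets = log_returns(prices)
# Each element is log(p_t / p_{t-1}); the level trend is removed.
```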
        <sec id="sec-2-2a">
          <title>2.2. The “Adaptive Markets” Hypothesis</title>
          <p>
            Traditional financial theories assume that market participants are rational and share the same utility function. When these assumptions are violated, behavioural biases such as dissimilar risk appetites and inconsistent probability beliefs play a significant role in accounting for abnormal profit opportunities. Grinblatt et al. [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] showed that individuals’ personal traits and sentiment tend to affect trading behavior. Bernstein [7] points out how financial markets undergo regime shifts through an evolutionary process, which explains the market dynamics.
          </p>
          <p>The Adaptive Market Hypothesis (AMH) [8] offers an alternative framework: different agents interact and adapt in response to an ever-changing market environment, competing to capture return opportunities.</p>
          <p>Machine learning, being adaptive in nature, has the set of tools to conduct research in the new paradigm: models are non-parametric in nature and have the capability to approximate complex non-linear, non-continuous functions.</p>
          <p>Quian [9] showed the outperformance of several machine learning algorithms (logistic regression, Neural Networks (NN), Support Vector Machines (SVM), denoising Autoencoders) compared to ARIMA. Gu et al. [10] compared ordinary least squares regression to tree-based methods and NNs on the task of predicting equity excess returns for US stocks. The authors ascribe the superior performance of machine learning techniques to their ability to accommodate a large number of predictors and to learn non-linear relationships between covariates.</p>
        </sec>
        <sec id="sec-3">
          <title>3. Main Algorithms Deployed and Features Utilised</title>
          <p>In this section, different machine learning algorithms and their applications to stock market time series are presented. Figure 1 depicts the ML algorithms and their related references. Here the literature is reviewed focusing on the main steps required for an end-to-end machine learning application, rather than looking, almost exclusively, at the different algorithms deployed.</p>
        </sec>
        <sec id="sec-2-2-1">
          <title>3.1. Peculiarities in Financial Time Series</title>
          <p>Framing time series forecasting problems as regression might present an issue in the financial domain: the prediction might be very close to the actual value and yet induce the wrong action. For example, given a value of the S&amp;P 500 of 4200, assume the 1-day forward forecasts generated by two different models to be 4210 and 4150 respectively, with the actual future level being 4190: while the first model is more accurate, it will suggest entering a long position, causing a loss. The second model, despite being further away from the observed value, would trigger the correct (profitable) action, establishing a short position.</p>
          <p>In several time series classification studies, the labelling is conditional on a gating value g: an instance is labelled +1 if the response variable exceeds g, -1 otherwise. For example, Ballings et al. [11] compared the performance of ensemble methods against single classifiers on the task of grouping European shares based on whether they will be up at least 25% in a year. Dixon et al. [12] framed securities time series problems as multiclass, attributing +1 to a positive movement, -1 to a downward direction and 0 to a flat market, setting a value to determine ‘no action’ in order to balance the classes. Labelling examples using an arbitrary fixed threshold does not take into account that returns are realised in a specific market regime. Defining, for example, a threshold of ±5% to determine the bounds of the 0 class might be appropriate for a quiet market environment; however, it could be too small in a high-volatility phase. Research should be conducted to scale raw data in ways that would result in predictors that carry information content related to the respective market regime.</p>
          <p>Accuracy in time series classification problems might not reflect profitability if a model fails to capture the correct direction for large moves. For example, consider the sequence of returns +0.8%, +1%, +0.5%, -3%, and two binary classifiers (+1, -1): the first predicts four upward movements and is therefore 75% accurate. The second classifier outputs -1, -1, +1, -1: it is 50% accurate. If the outputs were to be fed into a trading system, the first classifier would generate a 0.7% loss while the second would yield a profit of 1.7%. An approach to mitigate this issue, suggested by industry practitioners [13], is to train a model to predict returns and then evaluate it based on direction accuracy (also called the hit ratio).</p>
        </sec>
        <sec id="sec-3-2">
          <title>3.2. Features</title>
          <p>Despite the potential of machine learning algorithms to extract knowledge in many dimensions, several studies do not go beyond using lags of the target series as predictors. For example, Fischer and Krauss [14] used only percentage changes of adjusted prices of the S&amp;P 500 index constituents.</p>
          <p>An interesting feature engineering technique that tries to capture both past observations and market regime can be seen in [15]. The authors implement a relative change transformation of the predictors in each data window: each element in a sample is subtracted and divided by the first observation of the sequence. As a result, the network will be learning from a price change that is put in the context of the short-term market environment in which it was generated. Possible avenues of research would be to devise ad hoc methods to rescale data dynamically.</p>
          <p>The overwhelming majority of the papers surveyed in this study employ technical indicators [16] as additional features. Patel et al. [17] ran experiments on Indian equities using different representations of technical indicators: in their original form and in discrete form. The latter, called “trend-deterministic,” is obtained by discretising continuous values of oscillators (e.g., the Relative Strength Index, RSI) into two categories: +1, -1. Their study shows that the trend-deterministic representation improves results irrespective of the algorithm.</p>
          <p>Sentiment analysis is performed to take the pulse of market participants’ emotional state. Extracting sentiment from social media, particularly for short-time-frame applications, has gained paramount importance given the rise of retail traders, whose decisions combined can have a major impact on price formation. Sentiment indicators can be the direct result of investor polls like, for example, the American Association of Individual Investors (AAII) survey. The Investopedia portal (www.investopedia.com) computes the Investopedia Anxiety Index (IAI) based on their readers’ interest in topics such as macroeconomics, negative markets and credit disruptions.</p>
          <p>Schumaker and Chen [18] proposed a model to predict the intraday share price of companies after a piece of news was released. They used as predictors the price forecast from a linear regression model and features derived from text analysis of news articles. Kazemian et al. [19] deployed an SVM to classify the sentiment extracted from financial news on US stocks. Bollen et al. [20] showed that the emotional state, particularly the ‘Calm’ dimension, of the broad population has predictive power on the daily changes in the DJIA. Othan et al. [21] proposed a model to predict the direction of Turkish stocks based exclusively on sentiment extracted from Twitter posts.</p>
          <p>It is noteworthy that these sentiment-centred techniques aim to output short-term predictions. It would be interesting to extend this research over longer time horizons, using texts from commentators with domain knowledge, for example analysing blog posts from the likes of “Seeking Alpha”.</p>
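<p>The accuracy-versus-profitability example above can be checked numerically; the return sequence and the two classifiers’ outputs are those given in the text, while the helper functions are an illustrative sketch of ours:</p>

```python
def accuracy(preds, returns):
    """Fraction of predictions matching the sign of the realised return."""
    hits = sum(1 for p, r in zip(preds, returns) if (r > 0) == (p == 1))
    return hits / len(preds)

def pnl(preds, returns):
    """Profit of trading each prediction: long on +1, short on -1."""
    return sum(p * r for p, r in zip(preds, returns))

returns = [0.8, 1.0, 0.5, -3.0]   # realised returns, in percent
clf1 = [1, 1, 1, 1]               # 75% accurate, yet loses 0.7%
clf2 = [-1, -1, 1, -1]            # 50% accurate, yet gains 1.7%
```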
        </sec>
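<p>The relative-change window transformation of [15] discussed above can be sketched as follows; the function name and the sample window are ours:</p>

```python
def relative_change_window(window):
    """Rescale a data window by its first observation: (x_t - x_0) / x_0.

    The network then learns from changes expressed relative to the
    short-term market environment in which they occurred."""
    x0 = window[0]
    return [(x - x0) / x0 for x in window]

# A window of prices becomes regime-relative percentage moves.
relative_change_window([200.0, 210.0, 190.0])  # [0.0, 0.05, -0.05]
```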
      </sec>
      <sec id="sec-2-3">
        <p>Cho et al. [22] simulated an investment strategy based on the text characteristics of equity research reports published by brokerage houses. The authors deployed Part-of-Speech tagging to engineer features such as the number of nouns, adjectives and sentences. Classifiers, using the text-extracted features, were trained on the task of recognising successful buy recommendations on Korean equities.</p>
        <p>Fundamental analysis does much more than assess the intrinsic value of an asset or company, given that it includes both macro-econometric and micro-econometric approaches. As such, it focuses on financial statements and macroeconomic data. Specific attention is warranted to avoid look-ahead bias: new data releases and revisions should enter the time series only when the updated information is disclosed. For example, quarterly GDP data, after the initial release, are later revised twice to get to the final figure.</p>
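<p>Avoiding the look-ahead bias described above amounts to an "as-of" lookup: a release or revision may be used only once its publication date has passed. A minimal sketch, with invented dates and figures rather than actual GDP data:</p>

```python
import datetime as dt

# (release_date, value) vintages for one GDP quarter:
# initial print followed by two revisions.
vintages = [
    (dt.date(2022, 4, 28), 1.1),   # advance estimate
    (dt.date(2022, 5, 26), 1.3),   # second estimate
    (dt.date(2022, 6, 29), 1.2),   # final figure
]

def as_of(vintages, when):
    """Latest value whose release date is on or before `when`; None if unreleased."""
    known = [v for d, v in vintages if d <= when]
    return known[-1] if known else None
```

<p>A backtest sampling the series through this lookup only ever sees the figure that was actually published at the time.</p>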
        <p>Only a limited number of studies in the field adopt fundamental data; this is probably due, on one hand, to the aforementioned issues and, on the other, to their low-frequency nature. Olson and Mossman [23] studied predictive modelling of Canadian stock excess returns and direction using as inputs 61 accounting ratios. Tsai et al. [24] studied the application of ensemble learning methods to quarterly direction forecasts for the Taiwanese stock market. Predictors included company financial ratios and macroeconomic indicators.</p>
        <p>Fundamental reported data are backward-looking. Alberg et al. [25] and Chauhan et al. [26], while studying neural network applications to factor investing [27] in US stocks, built a predictive model of companies’ fundamental metrics. Stocks were then ranked and picked to form portfolios based on the forecasted fundamental values. These studies show that investing based on predicted company data outperforms stock selection based on reported information. An alternative solution could be using, as predictors, consensus forecast figures from the financial analyst community. This approach would also allow sampling data at a frequency higher than the quarterly earnings cycle.</p>
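<p>The portfolio-formation step described above, ranking stocks on forecasted fundamentals and picking the top names, reduces to a sort; the tickers and predicted values below are invented for illustration:</p>

```python
def top_k_by_prediction(predicted, k):
    """Rank stocks by a predicted fundamental metric and keep the best k."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:k]

# Hypothetical predicted earnings growth per ticker.
forecasts = {"AAA": 0.12, "BBB": 0.30, "CCC": -0.05, "DDD": 0.21}
top_k_by_prediction(forecasts, 2)  # ["BBB", "DDD"]
```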
        <p>As summarised in Table 1, technical features can be easily retrieved from data vendors or calculated from price series. They are mostly used by practitioners to obtain short-term timing indications, albeit with mixed results. Studying sentiment allows one to read the emotional state of market participants. Sentiment data are mostly unstructured and therefore require specific Natural Language Processing (NLP) tasks. Fundamental data are low frequency and potentially subject to the dangers of look-ahead bias. However, this kind of data reveals the state of the economy. A predictive model aiming to generate a medium- or long-term forecast should take fundamental variables into consideration.</p>
        <p>The vast majority of research consulted for this study used only one or two types of features, as summarised in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Feature types deployed in equity market forecasting studies.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Feature type</th>
                <th>Examples</th>
                <th>Pros</th>
                <th>Cons</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Technical</td>
                <td>RSI, Momentum</td>
                <td>Easy to compute</td>
                <td>Mixed results</td>
              </tr>
              <tr>
                <td>Sentiment</td>
                <td>AAII, Twitter data</td>
                <td>Captures emotional state</td>
                <td>Requires NLP</td>
              </tr>
              <tr>
                <td>Fundamental</td>
                <td>GDP, P/E</td>
                <td>Medium- or long-term</td>
                <td>Look-ahead bias</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-2-3-1">
          <title>3.3. Ensemble Learning</title>
          <p>Ensemble methods prescribe combining multiple algorithms to achieve better performance than inducing only the best member. Ensemble architectures are determined by how a population of diverse models is deployed and by the choice of the aggregation mechanism used to derive the final prediction. The Random Forest (RF) algorithm [28] represents a prototypical example.</p>
          <p>In Krauss et al. [29], a statistical arbitrage strategy is simulated after an ensemble architecture with majority voting (composed of a deep neural network, a gradient boosted tree and a RF) has classified S&amp;P 500 constituents according to the probability of outperforming the index. Yuan et al. [30] deployed NN, SVM and RF to classify Chinese stocks based on belonging to the top 30% of companies in generating excess returns over one month.</p>
          <p>The authors highlight an issue arising when adopting standard k-folds cross validation for time series problems: future data might be used to predict past observations, overstating the performance. They evaluate algorithms using a sliding-window train and test set instead. Evaluating a model on periods prior to the training set might be desirable in certain cases: for example, for stress-testing purposes.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>3.4. Support Vector Machines</title>
          <p>The Support Vector Machine (SVM) [31] is a very effective algorithm able to solve non-linearly separable problems by enlarging the feature space thanks to the “kernel trick”. Using kernel functions allows solving the problem in the original dimension without having to transform the data into a more complex space.</p>
          <p>SVMs were compared in [32] to linear and quadratic discriminant analysis, and to an Elman network, to forecast the weekly direction of the Nikkei 225 index. Zbikowski [33] proposed to consider volume data by plugging this information directly into the SVM formulation, multiplying the hyper-parameter representing the cost of margin violation, C, by a coefficient v based on the transaction volume in a given security over the input window. Patel et al. [34] deployed a 2-stage model with the task of predicting forward closing prices. Their architecture comprises a Support Vector Regressor (SVR) to predict the features’ values; this output is then fed into a different model to produce the estimated index close price.</p>
          <p>SVM incurs high computational costs with a large dataset. The adoption of deep learning has grown in popularity thanks to its capability to overcome SVM scalability problems without sacrificing performance.</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>3.5. Deep Learning</title>
        <p>Figure 4: Markets used for experiments. Global encompasses research using securities from multiple geographic areas. Others refers to securities of different asset classes, e.g., futures on indices and commodities.</p>
        <p>Feed-forward neural networks extend linear models by composing different functions that might carry an element of non-linearity. As a result, this is equivalent to expanding the linear problem y = Wx + b to y = W ϕ(x) + b, where ϕ is a non-linear transformation of the inputs. An example of activation function is the well-known Rectified Linear Unit (ReLU). In this study the terms Deep Learning and Deep Neural Network (DNN) are used for architectures encompassing more than 3 layers.</p>
        <p>Chong et al. [35] ran experiments on stocks listed on the Korean market to make intraday (5-minute) return predictions. The authors considered a network with just three hidden layers to be deep. Over the subsequent few years, significantly more complex structures would be developed. The impact of network depth on the performance of a financial time series classification task was studied in [36].</p>
        <sec id="sec-3-5-1">
          <title>3.5.1. Recurrent Neural Networks</title>
          <p>A Recurrent Neural Network (RNN) processes at any given stage as inputs the information from that time step and a hidden state derived from the previous one. Dixon et al. [37] show that a RNN with linear activation can be assimilated to an autoregressive time series model. The output of a given time step can be written as: h(t) = f(W x(t) + U h(t-1) + b).</p>
          <p>The Long Short-Term Memory (LSTM) cell [38] overcomes the RNN’s limitations (exploding or vanishing gradients). A LSTM neuron adds to the recurrent unit hidden state (short-term memory) a long-term cell state controlled by different gates. LSTM networks can therefore handle long sequences.</p>
          <p>A comparison of LSTM to RF and NNs can be seen in [39]: the task in question was the prediction of the direction of 5 components of the BOVESPA index in a high-frequency setting. Fischer and Krauss [14] used as features only the return series of the past 240 observations to classify 1-day forward returns, showing the ability of LSTM cells to learn long sequences. Alonso et al. [13] tackled predicting the 1-day-ahead return, and the related direction, of 50 constituents of the S&amp;P 500. This study showed that increasing the number of time steps used to train the model from 1 to 10 improves the performance of the LSTM.</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>3.5.2. Convolutional Neural Networks</title>
          <p>Convolutional Neural Networks (CNNs) have been particularly effective in computer vision. Convolution layers exhibit equivariance to translation; in time series data this property allows the algorithm to detect patterns within a sample and recognise them notwithstanding the point in time at which they appear. Convolution filters over 1 dimension (1D) can be deployed to slide across time, deriving hidden patterns within samples. 1-dimensional convolution layers can be deployed as feature extractors before LSTMs, forming the CNN-LSTM architecture proposed in [15].</p>
          <p>The dilation rate d of a convolution layer indicates the spacing between the elements of the input to which the convolution is applied. Researchers from DeepMind conceived an architecture called WaveNet [40], able to achieve state-of-the-art performance in several text-to-speech tasks without the use of RNNs. It is obtained by stacking several 1D convolution layers, doubling the dilation rate at every layer. Thus, the receptive field is larger, albeit without additional parameters. The initial layers learn short-term patterns, while as data progresses towards the output, longer-term sequences are extracted.</p>
          <p>Borovykh et al. [41] adapted the WaveNet architecture, using ReLU as activation function, for multivariate financial time series forecasting: 1D dilated causal convolutions run independently for each input time series, to be combined before making the prediction. Experiments were run on several financial instruments, on 1-day forward return prediction tasks. WaveNet and LSTM performed equally in terms of forecasting the direction; however, WaveNet outputs more accurate point estimates. Borjesson et al. [42] proposed a WaveNet-inspired model, using as activation function the Scaled Exponential Linear Unit (SELU) for the convolution layers. They ran experiments with the goal of predicting the next-day price and trend of the S&amp;P 500.</p>
          <p>Convolutions over two dimensions (2D) could be used to extract the most salient features and the most significant sub-sequence in a sample. This approach, recently emerging in the literature, offers the advantage of jointly learning the patterns within each predictor time series and the relationships between features. This could be particularly advantageous with economic time series, where the correlation between variables changes over time, albeit in a recurrent fashion.</p>
          <p>Gudelek et al. [43] used 2D convolutions, simulating trading strategies on several Exchange Traded Funds (ETFs). Prices were differenced once to mitigate the issues of non-stationarity and transformed to be in a range between +1 and –1. The authors interpret the predictions in this range as confidence values. It would be interesting to conduct further research applying to financial time series tasks architectures centred on 2D convolutions that have proven to be successful in other domains.</p>
          <p>A recent stream of research leverages the success achieved in computer vision directly: time series data is transformed in order to assume the characteristics of pixel values and intensity. Cohen et al. [44] convert a time series classification problem (spotting technical patterns) into an image recognition task; Zeng et al. [45] ported a multivariate time series forecasting problem in US equities to the video prediction domain.</p>
        </sec>
        <sec id="sec-3-5-3">
          <title>3.5.3. Autoencoders</title>
          <p>Autoencoders (AE) are models conceived to replicate their inputs without supervision. The design usually prescribes an input layer, an encoding layer with a smaller number of units and a decoding layer of the same size as the input, providing a reconstructed representation. Autoencoders can be applied to financial time series prediction problems to reduce dimensions or noise. Troiano et al. [46] focused on the impact of the feature reduction stage (starting from 40 technical indicators). Denoising autoencoders were used in [9] in order to create a latent feature representation, adding a noise component that randomly alters the raw data, forcing the algorithm to reconstruct robust inputs. Bao et al. [47] stacked 5 autoencoders, connecting the final output to a LSTM network to predict the one-step-ahead level of several stock market indices.</p>
        </sec>
        <sec id="sec-3-5-4">
          <title>3.5.4. Generative Adversarial Networks</title>
          <p>Generative Adversarial Networks (GANs) [48] are composed of 2 competing models: the generator aims to approximate the data distribution and outputs synthetic samples; the discriminator takes either actual or synthetic data and estimates the probability that the sample in question is genuine. The generator aims to maximise the probability that the discriminator will make a mistake.</p>
          <p>Zhou et al. [49] applied GANs to make one-step-ahead predictions on 42 Chinese stocks within a high-frequency framework. The generator is an LSTM network, while a 1D-CNN plays the role of the discriminator.</p>
          <p>Back testing is a common approach to measure the profitability of a model against past trends in the precise order in which they occurred. GANs could be employed to provide synthetic test sets, as if engineers were forward testing algorithms devised by researchers. A recent architecture fit for this purpose is the Time-series Generative Adversarial Network (TimeGAN) [50]. This framework was conceived to capture the time-dependent conditional distribution of data. In addition to the unsupervised adversarial loss, the model prescribes the use of a supervised loss based on the original data.</p>
        </sec>
        <sec id="sec-3-5-5">
          <title>3.5.5. Attention</title>
          <p>Attention mechanisms, pioneered by Bahdanau et al. [51], allow a decoder to focus at each step of a sequence on the most relevant (encoded) input. The decoder computes a weighted sum of the output of the encoder; the weights are learned by an attention layer, using as inputs the encoder output concatenated with the decoder’s previous hidden state. These techniques, developed for neural machine translation problems, have been applied to time series forecasting: an attention mechanism weights differently each time step of a sequence, which is then fed to a forecasting model to derive a prediction. Zhang et al. [52] proposed the AT-LSTM architecture: an attention mechanism, using an LSTM as encoder, assigns different weights to input features (time steps). Attention-weighted sequences are then used as inputs into an LSTM network to output the prediction. A more complex version of this architecture can be seen in [53]: here the input series is first encoded with an LSTM and an attention layer assigns weights to the features at each time step, obtaining an attention-weighted feature matrix. In the next stage another LSTM-based attention mechanism weights the different hidden states across time steps. The final stage prescribes an LSTM block to make a prediction using as inputs the output of the previous stage and the target time series. This Dual-Stage Attention RNN (DA-RNN) aims to capture the most important features while learning time dependencies.</p>
        </sec>
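<p>The WaveNet-style stacking described in the convolutional networks section above, 1D causal convolutions with the dilation rate doubling at every layer, can be sketched in plain Python. With a kernel of size 2 and dilations 1, 2, 4 and 8, the receptive field grows to 16 time steps; the weights below are placeholders rather than a trained model:</p>

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], using only past (causal) inputs."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = t - k * dilation
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out

def wavenet_stack(x, n_layers, kernel=(0.5, 0.5)):
    """Stack causal convolutions, doubling the dilation rate at every layer."""
    for layer in range(n_layers):
        x = causal_dilated_conv(x, kernel, dilation=2 ** layer)
    return x

# Receptive field after n layers with kernel size 2: 2**n time steps.
```

<p>An impulse fed into four such layers spreads over exactly 16 output steps, illustrating the exponential receptive-field growth the text describes while each layer keeps only two weights.</p>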
        <sec id="sec-2-4-1">
          <title>3.6. Underexplored Areas</title>
          <p>So far we have discussed researchers tackling the problems of the non-stationarity of financial data, different market regimes and a low signal-to-noise ratio by deploying more and more complex algorithms. Further study should be conducted focusing on feature engineering: rescaling and labelling examples in consideration of the related market environment, therefore avoiding a fixed arbitrary threshold to define classes, would put the data in relation to the context in which they were generated.</p>
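<p>The regime-aware labelling advocated above can be sketched by letting the class threshold scale with recent volatility rather than fixing it; the multiplier and the sample windows below are illustrative choices of ours:</p>

```python
import statistics

def regime_label(ret, recent_returns, k=1.0):
    """Label +1/-1 only when the move is large relative to recent volatility."""
    vol = statistics.pstdev(recent_returns)
    threshold = k * vol
    if ret > threshold:
        return 1
    if ret < -threshold:
        return -1
    return 0  # flat, given the current regime

calm = [0.1, -0.2, 0.15, -0.1]   # low-volatility environment
wild = [2.0, -3.0, 2.5, -1.5]    # high-volatility environment
regime_label(0.5, calm)   # 1: a large move for a quiet market
regime_label(0.5, wild)   # 0: noise in a volatile market
```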
          <p>Framing the prediction problem in terms of classification is vulnerable to missing large movements. On the other hand, developing a model to predict future securities prices has the pitfall that a fairly accurate point estimate, albeit in the wrong direction, could result in a loss-making course of action.</p>
          <p>Research could be pursued in developing alternative
approaches in terms of training, for example,
conceiving models learning jointly regression and classification
tasks.</p>
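<p>A minimal sketch of such a joint objective, assuming a model that outputs both a point forecast and a direction probability (the weighting alpha and the specific loss terms are hypothetical choices):</p>

```python
import numpy as np

def joint_loss(pred_return, true_return, p_up, alpha=0.5, eps=1e-12):
    """Combined objective for a model trained jointly on regression
    (predicting the return) and classification (predicting direction).
    alpha balances the two terms; all choices here are illustrative.

    pred_return, true_return: arrays of predicted / realised returns
    p_up: predicted probability that the return is positive
    """
    pred_return = np.asarray(pred_return, float)
    true_return = np.asarray(true_return, float)
    p_up = np.asarray(p_up, float)
    mse = np.mean((pred_return - true_return) ** 2)      # regression term
    y = (true_return > 0).astype(float)                  # direction labels
    bce = -np.mean(y * np.log(p_up + eps)
                   + (1 - y) * np.log(1 - p_up + eps))   # classification term
    return alpha * mse + (1 - alpha) * bce
```

<p>Minimising both terms at once penalises forecasts that are numerically close but directionally wrong, addressing the pitfall noted above.</p>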
          <p>In contrast with many of the studies reviewed that
use as input the target series and technical indicators,
further research could be conducted combining features
from diferent domains. A diverse data pool could point
towards unexplored avenues in economics research.</p>
          <p>The literature reviewed in this study adopts input time
series exclusively in tabular form, however, financial
professional place considerable value in extracting
knowledge from the relationship and interaction amongst
different variables. Graph Neural Networks (GNNs) [54] are
able to extract knowledge from the interplay between
diferent nodes in a graph, therefore could represent a
novel approach, worthy of further research in our field.
An exploratory study related to Japanese equities can be
seen in [55]. The authors motivated the use of GNNs to
leverage inter-market and inter-company relationships.</p>
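<p>As a sketch of the underlying mechanism only, one graph-convolution step can be written as neighbourhood averaging followed by a learned transformation; the adjacency matrix (encoding, say, assumed inter-company relationships) is taken as given, and the single layer below is far from a full GNN:</p>

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution step: each node (e.g. a company) updates its
    representation by averaging over its neighbours (e.g. related firms),
    then applying a linear map (learned in practice) and a non-linearity."""
    a = adj + np.eye(adj.shape[0])   # add self-loops so a node keeps its own signal
    deg = a.sum(axis=1, keepdims=True)
    a_norm = a / deg                 # row-normalised neighbourhood averaging
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU activation

# Two mutually connected nodes with 3 features each, mapped to 2 outputs.
out = gcn_layer(np.array([[0., 1.], [1., 0.]]),
                np.array([[1., 0., 2.], [3., 1., 0.]]),
                np.ones((3, 2)))
```

<p>Stacking several such layers lets information propagate along multi-hop relationships, which is the property that motivates their use for inter-company structure.</p>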
          <p>The main paradigm adopted by both industry and
academia when devising a machine learning solution has
been to consider a dataset as fixed, focusing on algorithm
development. This survey shows that there is potential
for progress, keeping the algorithm fixed, placing the
data at the centre of the research process instead.</p>
          <p>Ensemble learning approaches combining neural
networks have been the winning architecture in image
recognition competitions. Similar ideas could be explored,
applying diversified network ensembles to financial data.
Diferent models could be trained on diferent market
regimes. One way to deal with a changing environment,
is to constantly discard old data and retrain the model
with more recent examples. Nevertheless, often, market
dynamics observed in the past occur in a similar way
at a later stage. Conceiving a way to use past, however,
relevant information is an interesting avenue to pursue
further study.</p>
          <p>While the value of machine learning methods to equity
time series forecasting has been shown in the short term,
it would be interesting to test further these techniques
sampling data with lower frequency. Extending findings
to monthly data would graduate machine learning
methods to applications beyond the realm of trading.</p>
          <p>This survey shows how the playground for machine
learning experiment in equities is the S&amp;P 500. Few
studies try to corroborate findings extending the research
to diferent countries. European Equities in particular,
emerge as an area which remains rather under-explored.</p>
          <p>Moreover, given that diferent algorithms are
simulated on diferent data, it is dificult to assess what could
be considered the state of the art. It would be very fruitful
for the field if, perhaps as a cooperative project between
industry and academia, diferent financial multivariate
datasets ( e.g., one for US equities and one for European
shares) would be engineered as standard, in order to
provide a more objective common ground to conduct
research.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Concluding Remarks</title>
      <p>confidence, and trading activity, The Journal of
Finance 64 (2009) 549–578. URL: http://www.jstor.</p>
      <p>Researchers tackled the problem of equity market fore- org/stable/20487979.
casting, initially deploying statistical time-series forecast- [7] P. L. Bernstein, Why the eficient market ofers
ing techniques and then experimenting with complex hope to active management, Journal of Applied
deep learning architectures. Corporate Finance 12 (1999) 129–136.</p>
      <p>In particular, LSTM has been proven to be useful in [8] A. W. Lo, Adaptive Markets: Financial Evolution at
solving the vanishing/exploding gradient problem while the Speed of Thought, Princeton University Press,
ofering the advantage of modelling non-linear time se- 2017. URL: http://www.jstor.org/stable/j.ctvc77k3n.
ries data. Dilated convolutions, on their own, or as fea- [9] X.-Y. Qian, S. Gao, Financial series prediction:
Comture extractors, constitute an efective technique when parison between precision of time series models
dealing with long sequences. C2D centred architec- and machine learning methods, 2017.
tures are an emerging method capable of extracting local [10] S. Gu, B. T. Kelly, D. Xiu, Empirical asset
pricknowledge about the interaction of diferent features. ing via machine learning, 2018. URL: https://doi.</p>
      <p>With this review, we advocate the pursuit of research org/10.2139%2Fssrn.3281018. doi:10.2139/ssrn.
on every component of the machine learning value chain, 3281018.
rather than focusing exclusively on the algorithmic core. [11] M. Ballings, D. V. den Poel, N. Hespeels, R. Gryp,
Most studies show results only on a specific market or Evaluating multiple classifiers for stock price
direcrelated to a specific period, thus, the general robustness tion prediction, Expert Syst. Appl. 42 (2015) 7046–
of findings could be improved. Applying machine learn- 7056. URL: https://doi.org/10.1016/j.eswa.2015.05.
ing to financial time series is a challenging, however, 013. doi:10.1016/j.eswa.2015.05.013.
rewarding endeavour. Given the importance of the de- [12] M. Dixon, D. Klabjan, J. H. Bang,
Classificationcisions based on equity market forecasts, even a small based ifnancial markets prediction using
improvement in model performance can have a major deep neural networks, CoRR abs/1603.08604
impact. (2016). URL: http://arxiv.org/abs/1603.08604.
arXiv:1603.08604.</p>
      <p>References [13] T. Guida, Big Data and Machine Learning in
Quantitative Investment, Wiley, 2018.
[14] T. Fischer, C. Krauss, Deep learning with long
short-term memory networks for financial
market predictions, Eur. J. Oper. Res. 270 (2018) 654–
669. URL: https://doi.org/10.1016/j.ejor.2017.11.054.</p>
      <p>doi:10.1016/j.ejor.2017.11.054.
[15] J. Eapen, D. Bein, A. Verma, Novel deep learning
model with CNN and bi-directional LSTM for
improved stock market index prediction, in: IEEE
9th Annual Computing and Communication
Workshop and Conference, CCWC 2019, Las Vegas, NV,
USA, January 7-9, 2019, IEEE, 2019, pp. 264–270.</p>
      <p>URL: https://doi.org/10.1109/CCWC.2019.8666592.
doi:10.1109/CCWC.2019.8666592.</p>
      <p>1462204. 031. doi:10.1016/j.ejor.2016.10.031.
[19] S. Kazemian, S. Zhao, G. Penn, Evaluating senti- [30] X. Yuan, J. Yuan, T. Jiang, Q. U. Ain, Integrated
ment analysis in the context of securities trading, in: long-term stock selection models based on feature
Proceedings of the 54th Annual Meeting of the As- selection and machine learning algorithms for china
sociation for Computational Linguistics, ACL 2016, stock market, IEEE Access 8 (2020) 22672–22685.
August 7-12, 2016, Berlin, Germany, Volume 1: Long URL: https://doi.org/10.1109/ACCESS.2020.2969293.
Papers, The Association for Computer Linguistics, doi:10.1109/ACCESS.2020.2969293.
2016. URL: https://doi.org/10.18653/v1/p16-1197. [31] C. Cortes, V. Vapnik, Support-vector
netdoi:10.18653/v1/p16-1197. works, Mach. Learn. 20 (1995) 273–297. URL:
[20] J. Bollen, H. Mao, X. Zeng, Twitter mood pre- https://doi.org/10.1007/BF00994018. doi:10.1007/
dicts the stock market, J. Comput. Sci. 2 (2011) BF00994018.
1–8. URL: https://doi.org/10.1016/j.jocs.2010.12.007. [32] W. Huang, Y. Nakamori, S.-Y. Wang, Forecasting
doi:10.1016/j.jocs.2010.12.007. stock market movement direction with support
vec[21] D. Othan, Z. H. Kilimci, M. Uysal, Financial sen- tor machine, Computers &amp; operations research 32
timent analysis for predicting direction of stocks (2005) 2513–2522.
using bidirectional encoder representations from [33] K. Zbikowski, Using volume weighted support
vectransformers (bert) and deep learning models, in: tor machines with walk forward testing and feature
Proc. Int. Conf. Innov. Intell. Technol., volume 2019, selection for the purpose of creating stock
trad2019, pp. 30–35. ing strategy, Expert Syst. Appl. 42 (2015) 1797–
[22] P. Cho, J. H. Park, J. W. Song, Equity research report- 1805. URL: https://doi.org/10.1016/j.eswa.2014.10.
driven investment strategy in korea using binary 001. doi:10.1016/j.eswa.2014.10.001.
classification on stock price direction, IEEE Access [34] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting
9 (2021) 46364–46373. stock market index using fusion of machine
learn[23] D. Olson, C. Mossman, Neural network forecasts ing techniques, Expert Syst. Appl. 42 (2015) 2162–
of canadian stock returns using accounting ratios, 2172. URL: https://doi.org/10.1016/j.eswa.2014.10.
International Journal of Forecasting 19 (2003) 453– 031. doi:10.1016/j.eswa.2014.10.031.
465. [35] E. Chong, C. Han, F. C. Park, Deep learning
net[24] C. Tsai, Y. Lin, D. C. Yen, Y. Chen, Predicting works for stock market analysis and prediction:
stock returns by classifier ensembles, Appl. Soft Methodology, data representations, and case
studComput. 11 (2011) 2452–2459. URL: https://doi.org/ ies, Expert Systems with Applications 83 (2017)
10.1016/j.asoc.2010.10.001. doi:10.1016/j.asoc. 187–205.</p>
      <p>2010.10.001. [36] X. Zhong, D. Enke, Predicting the daily return
[25] J. Alberg, Z. C. Lipton, Improving factor-based direction of the stock market using hybrid machine
quantitative investing by forecasting company fun- learning algorithms, Financial Innovation 5 (2019)
damentals, CoRR abs/1711.04837 (2017). URL: http: 1–20.</p>
      <p>//arxiv.org/abs/1711.04837. arXiv:1711.04837. [37] M. F. Dixon, I. Halperin, P. Bilokon, Machine
learn[26] L. Chauhan, J. Alberg, Z. C. Lipton, Uncertainty- ing in Finance, volume 1406, Springer, 2020.
aware lookahead factor models for quantitative in- [38] S. Hochreiter, J. Schmidhuber, Long short-term
vesting, in: Proceedings of the 37th International memory, Neural Comput. 9 (1997) 1735–1780. URL:
Conference on Machine Learning, ICML 2020, 13- https://doi.org/10.1162/neco.1997.9.8.1735. doi:10.
18 July 2020, Virtual Event, volume 119 of Proceed- 1162/neco.1997.9.8.1735.
ings of Machine Learning Research, PMLR, 2020, pp. [39] D. M. Nelson, A. C. Pereira, R. A. De Oliveira, Stock
1489–1499. URL: http://proceedings.mlr.press/v119/ market’s price movement prediction with lstm
neuchauhan20a.html. ral networks, in: 2017 International joint
confer[27] E. F. Fama, K. R. French, A five-factor asset pricing ence on neural networks (IJCNN), Ieee, 2017, pp.
model, Journal of financial economics 116 (2015) 1419–1426.</p>
      <p>1–22. [40] A. van den Oord, S. Dieleman, H. Zen, K.
Si[28] L. Breiman, Random forests, Mach. monyan, O. Vinyals, A. Graves, N.
KalchbrenLearn. 45 (2001) 5–32. URL: https://doi.org/ ner, A. W. Senior, K. Kavukcuoglu, Wavenet: A
10.1023/A:1010933404324. doi:10.1023/A: generative model for raw audio, in: The 9th
1010933404324. ISCA Speech Synthesis Workshop, Sunnyvale, CA,
[29] C. Krauss, X. A. Do, N. Huck, Deep neural networks, USA, 13-15 September 2016, ISCA, 2016, p. 125.
gradient-boosted trees, random forests: Statistical URL: http://www.isca-speech.org/archive/SSW_
arbitrage on the s&amp;p 500, Eur. J. Oper. Res. 259 (2017) 2016/abstracts/ssw9_DS-4_van_den_Oord.html.
689–702. URL: https://doi.org/10.1016/j.ejor.2016.10. [41] A. Borovykh, S. M. Bohté, C. W. Oosterlee,
Conditional time series forecasting with convolutional Information Processing Systems 32: Annual
neural networks, 2017. Conference on Neural Information Processing
[42] L. Börjesson, M. Singull, Forecasting financial time Systems 2019, NeurIPS 2019, December 8-14, 2019,
series through causal and dilated convolutional Vancouver, BC, Canada, 2019, pp. 5509–5519. URL:
neural networks, Entropy 22 (2020) 1094. URL: https://proceedings.neurips.cc/paper/2019/hash/
https://doi.org/10.3390/e22101094. doi:10.3390/ c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.
e22101094. html.
[43] M. U. Gudelek, S. A. Boluk, A. M. Ozbayoglu, A deep [51] D. Bahdanau, K. Cho, Y. Bengio, Neural machine
learning based stock trading model with 2-d CNN translation by jointly learning to align and
transtrend detection, in: 2017 IEEE Symposium Series on late, in: Y. Bengio, Y. LeCun (Eds.), 3rd
InternaComputational Intelligence, SSCI 2017, Honolulu, tional Conference on Learning Representations,
HI, USA, November 27 - Dec. 1, 2017, IEEE, 2017, ICLR 2015, San Diego, CA, USA, May 7-9, 2015,
pp. 1–8. URL: https://doi.org/10.1109/SSCI.2017. Conference Track Proceedings, 2015. URL: http:
8285188. doi:10.1109/SSCI.2017.8285188. //arxiv.org/abs/1409.0473.
[44] N. Cohen, T. Balch, M. Veloso, Trading via image [52] X. Zhang, X. Liang, A. Zhiyuli, S. Zhang, R. Xu,
classification, in: T. Balch (Ed.), ICAIF ’20: The B. Wu, At-lstm: An attention-based lstm model for
First ACM International Conference on AI in Fi- ifnancial time series prediction, in: IOP Conference
nance, New York, NY, USA, October 15-16, 2020, Series: Materials Science and Engineering, volume
ACM, 2020, pp. 53:1–53:6. URL: https://doi.org/ 569, IOP Publishing, 2019, p. 052037.
10.1145/3383455.3422544. doi:10.1145/3383455. [53] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang,
3422544. G. W. Cottrell, A dual-stage attention-based
re[45] Z. Zeng, T. Balch, M. Veloso, Deep video pre- current neural network for time series prediction,
diction for time series forecasting, in: A. Cali- in: C. Sierra (Ed.), Proceedings of the
Twentynescu, L. Szpruch (Eds.), ICAIF’21: 2nd ACM In- Sixth International Joint Conference on Artificial
ternational Conference on AI in Finance, Virtual Intelligence, IJCAI 2017, Melbourne, Australia,
AuEvent, November 3 - 5, 2021, ACM, 2021, pp. 39:1– gust 19-25, 2017, ijcai.org, 2017, pp. 2627–2633.
39:7. URL: https://doi.org/10.1145/3490354.3494404. URL: https://doi.org/10.24963/ijcai.2017/366. doi:10.
doi:10.1145/3490354.3494404. 24963/ijcai.2017/366.
[46] L. Troiano, E. Mejuto, P. Kriplani, On feature re- [54] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu,
duction using deep learning for trend prediction L. Wang, C. Li, M. Sun, Graph neural networks:
in finance, CoRR abs/1704.03205 (2017). URL: http: A review of methods and applications, AI Open 1
//arxiv.org/abs/1704.03205. arXiv:1704.03205. (2020) 57–81.
[47] W. Bao, J. Yue, Y. Rao, A deep learning framework [55] D. Matsunaga, T. Suzumura, T. Takahashi,
Explorfor financial time series using stacked autoencoders ing graph neural networks for stock market
predicand long-short term memory, PloS one 12 (2017) tions with rolling window analysis, arXiv preprint
e0180944. arXiv:1909.10660 (2019).
[48] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza,</p>
      <p>B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville,
Y. Bengio, Generative adversarial nets, in:
Z. Ghahramani, M. Welling, C. Cortes, N. D.</p>
      <p>Lawrence, K. Q. Weinberger (Eds.), Advances in
Neural Information Processing Systems 27: Annual
Conference on Neural Information Processing
Systems 2014, December 8-13 2014, Montreal,
Quebec, Canada, 2014, pp. 2672–2680. URL:
https://proceedings.neurips.cc/paper/2014/hash/
5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.
[49] X. Zhou, Z. Pan, G. Hu, S. Tang, C. Zhao, Stock
market prediction on high-frequency data using
generative adversarial nets., Mathematical
Problems in Engineering (2018).
[50] J. Yoon, D. Jarrett, M. van der Schaar, Time-series
generative adversarial networks, in: H. M. Wallach,
H. Larochelle, A. Beygelzimer, F. d’Alché-Buc,
E. B. Fox, R. Garnett (Eds.), Advances in Neural</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Fama</surname>
          </string-name>
          ,
<article-title>Efficient capital markets: A review of theory and empirical work</article-title>
          ,
          <source>The Journal of Finance</source>
          <volume>25</volume>
          (
          <year>1970</year>
          )
          <fpage>383</fpage>
          -
          <lpage>417</lpage>
. URL: http://www.jstor.org/stable/2325486.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jegadeesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Titman</surname>
          </string-name>
          ,
<article-title>Returns to buying winners and selling losers: Implications for stock market efficiency</article-title>
          ,
          <source>The Journal of Finance</source>
          <volume>48</volume>
          (
          <year>1993</year>
          )
          <fpage>65</fpage>
          -
          <lpage>91</lpage>
. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1993.tb04702.x. doi:https://doi.org/10.1111/j.1540-6261.1993.tb04702.x.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>B. G. Malkiel,</surname>
          </string-name>
          <article-title>The eficient market hypothesis and its critics</article-title>
          ,
          <source>The Journal of Economic Perspectives</source>
          <volume>17</volume>
          (
          <year>2003</year>
          )
          <fpage>59</fpage>
          -
          <lpage>82</lpage>
          . URL: http://www.jstor.org/stable/ [16]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>Technical Analysis of the Futures Mar3216840. kets: A Comprehensive Guide to Trading Methods</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Grossman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Stiglitz</surname>
          </string-name>
          ,
          <article-title>On the impossibil-</article-title>
          and
          <string-name>
            <surname>Applications</surname>
          </string-name>
          , Prentice Hall,
          <year>1986</year>
          .
          <article-title>ity of informationally eficient markets</article-title>
          ,
          <source>American</source>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          , K. Kotecha,
          <source>PredictEconomic Review</source>
          <volume>72</volume>
          (
          <year>1982</year>
          )
          <fpage>393</fpage>
          -
          <lpage>408</lpage>
          . doi:https:
          <article-title>ing stock and stock price index movement using //doi</article-title>
          .org/10.7916/D8765R99.
          <article-title>trend deterministic data preparation and machine</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cosemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eichholtz</surname>
          </string-name>
          , Op- learning techniques,
          <source>Expert Syst. Appl</source>
          .
          <volume>42</volume>
          (
          <year>2015</year>
          )
          <article-title>tion trading and individual investor performance</article-title>
          ,
          <fpage>259</fpage>
          -
          <lpage>268</lpage>
          . URL: https://doi.org/10.1016/j.eswa.
          <source>2014. Journal of Banking &amp; Finance</source>
          <volume>33</volume>
          (
          <year>2009</year>
          )
          <fpage>731</fpage>
          -
          <lpage>07</lpage>
          .040. doi:
          <volume>10</volume>
          .1016/j.eswa.
          <year>2014</year>
          .
          <volume>07</volume>
          .040. 746. URL: https://www.sciencedirect.com/science/ [18]
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Schumaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Textual analysis of article/pii/S0378426608002720</article-title>
          . doi:https://doi. stock
          <source>market prediction using breaking financial org/10</source>
          .1016/j.jbankfin.
          <year>2008</year>
          .
          <volume>11</volume>
          .005.
          <article-title>news: The azfin text system</article-title>
          ,
          <source>ACM Trans. Inf.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grinblatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Keloharju</surname>
          </string-name>
          , Sensation seeking, over- Syst.
          <volume>27</volume>
          (
          <year>2009</year>
          )
          <volume>12</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          :
          <fpage>19</fpage>
          . URL: https://doi.org/ 10.1145/1462198.1462204. doi:
          <volume>10</volume>
          .1145/1462198.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>