Machine Learning Methods for Equity Time Series Forecasting: A Compendium

Alberto Matuozzo1, Paul D. Yoo1, Alessandro Provetti1 and Maria H. Kim2
1 Birkbeck College, University of London, Malet St, London, WC1E 7HX, United Kingdom
2 University of Wollongong, Wollongong, NSW 2522, Australia

Abstract
Machine learning builds predictive models from vast amounts of data drawn from different sources, capturing non-linear relationships between variables. As a result, financial markets in general, and stock markets in particular, offer a promising ground for the application of such methods. This survey examines machine learning methods for equity market forecasting, identifying gaps in current knowledge and suggesting potential avenues for further research. Computer science-centred quantitative studies have focused mainly on algorithms, testing results mostly on US data over short time frames; feature engineering, and the validation of findings on different markets and different time horizons, remain under-explored. This study therefore introduces the financial context for non-experts and then reviews models and tools from statistical learning and deep learning. We believe this approach will prove effective in financial practice for interested readers without much prior knowledge of the finance literature. We survey the end-to-end deployment of machine learning to help readers from industry and academia understand the peculiarities of applying these methods to equity market forecasting.

Keywords
Machine Learning, Deep Learning, Time Series Forecasting, Equity market forecasting

CIKM'22: Workshop on Applied Machine Learning Methods for Time Series Forecasting (AMLTS), October 21, 2022, Atlanta, GA, USA

1. Introduction
Forecasting equity markets is relevant not only for the parties directly involved in price formation (e.g., companies and investors), but also for policymakers and regulators. Central banks have shown a growing interest in modelling equities to decide macro-prudential policy, assess investors' attitude towards risk, and, in some cases, deploy capital in the market directly. For example, according to the 30 June 2022 13F SEC filings1, the Swiss National Bank owns about $11.11 billion of Apple and $7.49 billion of Microsoft.
This study is driven by the need to review machine learning techniques applied to equity market time series forecasting problems, with the objective of providing an overview for practitioners and highlighting areas that are under-explored and warrant further research.
This study draws on research from finance and computer science, from industry and academia. As such, the selection of papers for this literature review is not purely bibliometric. We aim to cover a selection of high-impact publications and original contributions from the field for the critical steps of the machine learning deployment process, from pre-processing to algorithm selection. We cite papers that are now part of the history of financial theory for the benefit of non-experts. Research from other domains is included where it is reasonable to assume that the related techniques could be ported to financial time series forecasting.
The contributions of this study are as follows:

• We review, with the caveat that the field is evolving at significant speed, the main machine learning solutions deployed for equity market forecasting, in terms of features and algorithms. We observe that most findings have been corroborated only on specific markets or specific periods. Data sampling and forecasting horizons are mostly daily.
• We highlight potential directions for future research, emphasising the adoption of a more data-centric approach. Ensemble methods should be further researched as architectures to leverage the peculiarities of financial data.
* Corresponding author.
$ amatuo01@mail.bbk.ac.uk (A. Matuozzo); p.yoo@bbk.ac.uk (P. D. Yoo); a.provetti@bbk.ac.uk (A. Provetti); mhykim@uow.edu.au (M. H. Kim)
0000-0002-5614-3129 (A. Matuozzo); 0000-0001-7665-8616 (P. D. Yoo); 0000-0001-9542-4110 (A. Provetti); 0000-0002-6279-5836 (M. H. Kim)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
1 Please see https://www.sec.gov/edgar.shtml

The rest of the paper is organised as follows: Section 2 outlines financial background knowledge for non-experts. Section 3 examines features, the main algorithms deployed and related financial applications, followed by a discussion of gaps in the current knowledge. The final parts of this study are dedicated to areas of future work and how the gaps could be addressed.

2. Framing the problem
The non-stationarity and non-linearity of financial variables are the primary attributes to consider when estimating a forecasting model for equity markets.

2.1. The "Efficient Markets" Hypothesis
Under the Efficient Markets Hypothesis (EMH) developed by Fama [1], the sequence of price changes must be unpredictable if prices fully incorporate the information and expectations of market participants.
Yet, in the real world, analysing past returns and processing publicly available information are instrumental in building a forecasting model. Jegadeesh and Titman [2] showed that it is possible to generate statistically significant abnormal returns by buying the stocks that exhibited top-decile returns in the past (i.e., winners) and selling stocks in the bottom decile (losers). Malkiel [3] documented several market anomalies that increase predictability; however, these inefficiencies do not persist.
If market participants firmly believed in the EMH, it would be irrational to trade, hence there would not be a financial market. Grossman and Stiglitz [4] offer the following solution: the information derived by skilled investors is not entirely reflected in the market. As a result, investment research is compensated. Bauer et al. [5] showed skill persistence amongst retail option traders: top-decile performers over one year outperform individuals in the bottom decile the following year.

Figure 1: Machine learning methods for equity market forecasting; algorithms and reference papers.

2.2. The "Adaptive Markets" Hypothesis
Traditional financial theories assume that market participants are rational and share the same utility function. When these assumptions are violated, behavioural biases such as dissimilar risk appetites and inconsistent probability beliefs play a significant role in accounting for abnormal profit opportunities. Grinblatt et al. [6] showed that individuals' personal traits and sentiment tend to affect trading behaviour. Bernstein [7] points out how financial markets undergo regime shifts through an evolutionary process, which explains market dynamics.
The Adaptive Markets Hypothesis (AMH) [8] offers an alternative framework: different agents interact and adapt in response to an ever-changing market environment, competing to capture return opportunities.
Machine learning, being adaptive in nature, has the set of tools to conduct research in this new paradigm: models are non-parametric in nature and have the capability to approximate complex non-linear, non-continuous functions.

2.3. Machine Learning in Finance
The emergence of big data gives rise to a paradigm shift in the field of financial engineering, where state-of-the-art techniques and data processing capacity are of utmost importance. Not only are newsflow, economic and stock market data widely available in copious size, but unstructured data (e.g., geo-location) can be accessed at any time and processed in different ways. As a result, investors equipped with the latest technological advancements can make use of an extensive set of features. Machine learning algorithms do not require data transformations to make a time series stationary, being able to learn complex patterns in high dimensions.
Qian [9] showed the outperformance of several machine learning algorithms (logistic regression, Neural Networks (NN), Support Vector Machines (SVM), denoising Autoencoders) compared to ARIMA. Gu et al. [10] compared ordinary least squares regression to tree-based methods and NNs on the task of predicting equity excess returns for US stocks.
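As a toy illustration of why non-linear learners can dominate a linear fit (this is not the setup of [10]; the data and the one-split "stump" learner below are invented for this sketch), one can compare closed-form least squares with a minimal tree-based learner on a target that has no linear component:

```python
# Toy illustration (not the setup of [10]): ordinary least squares vs. a
# one-split regression "stump" on a target with no linear component.

def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

def stump_fit(xs, ys):
    """Pick the single split threshold minimising squared error."""
    best = None
    for t in xs:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # purely non-linear relationship

slope, intercept = ols_fit(xs, ys)
linear_sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
stump = stump_fit(xs, ys)
stump_sse = sum((stump(x) - y) ** 2 for x, y in zip(xs, ys))
```

On this symmetric sample the least-squares slope is exactly zero, so the linear model collapses to predicting the mean, while even a single split lowers the squared error.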
Gu et al. [10] ascribe the superior performance of machine learning techniques to their ability to accommodate a large number of predictors and to learn non-linear relationships between covariates.

3. Main Algorithms Deployed and Features Utilised
In this section, different machine learning algorithms and their applications to stock market time series are presented. Figure 1 depicts the ML algorithms and their related references. Here the literature is reviewed focusing on the main steps required for an end-to-end machine learning application, rather than looking, almost exclusively, at the different algorithms deployed.

3.1. Peculiarities of Financial Time Series
Framing a time series forecasting problem as regression might present an issue in the financial domain: the prediction might be very close to the actual value and yet induce the wrong action. For example, given an S&P 500 level of 4200, assume the 1-day-forward forecasts generated by two different models to be 4210 and 4150 respectively, with the actual future level being 4190: while the first model is more accurate, it will suggest entering a long position, causing a loss. The second model, despite being further away from the observed value, would trigger the correct (profitable) action, establishing a short position.
In several time series classification studies, the labelling is conditional on a gating value g: an instance is labelled +1 if the response variable y > g, and -1 otherwise. For example, Ballings et al. [11] compared the performance of ensemble methods against single classifiers on the task of grouping European shares based on whether they will be up at least 25% in a year. Dixon et al. [12] framed securities time series problems as multiclass, attributing +1 to a positive movement, -1 to a downward direction and 0 to a flat market, setting a value to determine "no action" in order to balance the classes. Labelling examples using an arbitrary fixed threshold does not take into account that returns are realised in a specific market regime. Defining, for example, a threshold of ±5% to determine the bounds of the 0 class might be appropriate for a quiet market environment; however, it could be too small in a high-volatility phase. Research should be conducted to scale raw data in ways that would result in predictors that carry information content related to the respective market regime.
Accuracy in time series classification problems might not reflect profitability if a model fails to capture the correct direction of large moves. For example, consider the sequence of returns +0.8%, +1%, +0.5%, -3%, and two binary classifiers (+1, -1): the first predicts four upward movements and is therefore 75% accurate. The second classifier outputs -1, -1, +1, -1: it is 50% accurate. If the outputs were fed into a trading system, the first classifier would generate a 0.7% loss while the second would yield a profit of 1.7%. An approach to mitigate this issue, suggested by industry practitioners [13], is to train a model to predict returns and then evaluate it based on direction accuracy (also called the hit ratio).

3.2. Features
Despite the potential of machine learning algorithms to extract knowledge in many dimensions, several studies do not go beyond using lags of the target series as predictors. For example, Fischer and Krauss [14] used only percentage changes of adjusted prices of the S&P 500 index constituents.
An interesting feature engineering technique that tries to capture both past observations and the market regime can be seen in [15]. The authors implement a relative change transformation of the predictors in each data window: from each element in a sample, the first observation of the sequence is subtracted, and the result is divided by that first observation. As a result, the network learns from a price change that is put in the context of the short-term market environment in which it was generated. A possible avenue of research would be to devise ad hoc methods to rescale data dynamically.
The overwhelming majority of the papers surveyed in this study employ technical indicators [16] as additional features. Patel et al. [17] ran experiments on Indian equities using different representations of technical indicators: in their original form and in discrete form. The latter, called "trend-deterministic", is obtained by discretising continuous values of oscillators (e.g., the Relative Strength Index, RSI) into two categories: +1 and -1. Their study shows that the trend-deterministic representation improves results irrespective of the algorithm.
Sentiment analysis is performed to take the pulse of market participants' emotional state. Extracting sentiment from social media, particularly for short-time-frame applications, has gained paramount importance given the rise of retail traders, whose combined decisions can have a major impact on price formation. Sentiment indicators can be the direct result of investor polls, for example the American Association of Individual Investors (AAII) survey. The Investopedia portal2 computes the Investopedia Anxiety Index (IAI) based on their readers' interest in topics such as macroeconomics, negative markets and credit disruptions.
Schumaker and Chen [18] proposed a model to predict the intraday share price of companies after a piece of news was released. They used as predictors the price forecast of a linear regression model and features derived from text analysis of news articles. Kazemian et al. [19] deployed an SVM to classify the sentiment extracted from financial news on US stocks.
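As a contrast to such learned classifiers, the simplest sentiment features are plain lexicon counts. The sketch below is illustrative only: the word lists and scoring rule are invented here and are far cruder than the NLP pipelines used in the cited studies.

```python
# Illustrative only: a minimal lexicon-count sentiment score for headlines.
# The word lists and scoring rule are invented for this sketch; the cited
# studies use far richer NLP pipelines and trained classifiers such as SVMs.

POSITIVE = {"beat", "upgrade", "growth", "record", "strong"}
NEGATIVE = {"miss", "downgrade", "loss", "lawsuit", "weak"}

def headline_score(headline: str) -> int:
    """Return (#positive - #negative) lexicon tokens; the sign is a crude label."""
    tokens = headline.lower().replace(",", " ").split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

# headline_score("Strong growth, analysts upgrade")  ->  3 (positive)
# headline_score("Quarterly loss and downgrade")     -> -2 (negative)
```

Even such a crude count illustrates how unstructured text can be turned into a numeric predictor that a downstream model can consume.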
Bollen et al. [20] showed that the emotional state of the broad population, particularly the "Calm" dimension, has predictive power on the daily changes of the DJIA. Othan et al. [21] proposed a model to predict the direction of Turkish stocks based exclusively on sentiment extracted from Twitter posts.
It is noteworthy that these sentiment-centred techniques aim to output short-term predictions. It would be interesting to extend this research over longer time horizons, using texts from commentators with domain knowledge, for example analysing blog posts from the likes of "Seeking Alpha".
Cho et al. [22] simulated an investment strategy based on the text characteristics of equity research reports published by brokerage houses. The authors deployed part-of-speech tagging to engineer features such as the numbers of nouns, adjectives and sentences. Classifiers using the text-extracted features were trained on the task of recognising successful buy recommendations on Korean equities.
2 Please see www.investopedia.com
Fundamental analysis does much more than assess the intrinsic value of an asset or company, given that it includes both macro-econometric and micro-econometric approaches. As such, it focuses on financial statements and macroeconomic data. Specific attention is warranted to avoid look-ahead bias: new data releases and revisions should enter the time series only when the updated information is disclosed. For example, quarterly GDP data, after the initial release, are later revised twice to reach the final figure.
Only a limited number of studies in the field adopt fundamental data; this is probably due, on the one hand, to the aforementioned issues and, on the other, to their low-frequency nature. Olson and Mossman [23] studied predictive modelling of Canadian stock excess returns and direction using 61 accounting ratios as inputs. Tsai et al. [24] studied the application of ensemble learning methods to quarterly direction forecasts for the Taiwanese stock market. Predictors included company financial ratios and macroeconomic indicators.
Fundamental reported data are backward-looking. Alberg et al. [25] and Chauhan et al. [26], while studying neural network applications to factor investing [27] in US stocks, built predictive models of companies' fundamental metrics. Stocks were then ranked and picked to form portfolios based on the forecasted fundamental values. These studies show that investing based on predicted company data outperforms stock selection based on reported information. An alternative solution could be to use consensus forecast figures from the financial analyst community as predictors. This approach would also allow sampling data at a frequency higher than the quarterly earnings cycle.

Table 1
Pros and cons of different kinds of features.

Feature type   Examples         Pros                      Cons
Technical      RSI, Momentum    Easy to compute           Mixed results
Sentiment      AAII, Twitter    Captures emotional state  Requires NLP
Fundamental    GDP, P/E         Medium-/long-term         Look-ahead bias

Figure 2: Surveyed papers grouped by kinds of features adopted. The combination category comprises studies using at least two sources of inputs, e.g., technical & sentiment.

Figure 3: Surveyed papers grouped by sampling frequency.
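The release-date discipline described above can be sketched as a small point-in-time ("as-of") lookup. The dates and values below are hypothetical; only the pattern of an advance estimate followed by two revisions mirrors the GDP example in the text.

```python
# Hypothetical point-in-time lookup to avoid look-ahead bias: a value enters
# the series only from its release date onwards. Dates and values are invented;
# the initial-release-plus-two-revisions pattern follows the GDP example above.
from bisect import bisect_right

# (release_date, value) vintages for one reference quarter, in release order.
q1_gdp_vintages = [
    ("2022-04-28", 1.0),  # advance estimate
    ("2022-05-26", 1.2),  # second estimate
    ("2022-06-29", 1.1),  # final estimate
]

def as_of(vintages, query_date):
    """Latest value released on or before query_date (ISO strings), else None."""
    release_dates = [d for d, _ in vintages]
    i = bisect_right(release_dates, query_date)
    return vintages[i - 1][1] if i else None

# A back-test dated 2022-05-01 must see the advance estimate, not revisions:
# as_of(q1_gdp_vintages, "2022-05-01") -> 1.0
```

Storing every vintage, rather than overwriting the series with each revision, is what makes the lookup reproducible at any historical date.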
As summarised in Table 1, technical features can be easily retrieved from data vendors or calculated from price series. They are mostly used by practitioners to obtain short-term timing indications, albeit with mixed results. Studying sentiment allows one to read the emotional state of market participants. Sentiment data are mostly unstructured and therefore require specific Natural Language Processing (NLP) tasks. Fundamental data are low frequency and potentially subject to the dangers of look-ahead bias. However, this kind of data reveals the state of the economy. A predictive model aiming to generate a medium- or long-term forecast should take fundamental variables into consideration.
The vast majority of the research consulted for this study used only one or two types of features, as summarised in Figure 2. In light of this survey, more attention should be devoted to expanding the feature space, via both feature engineering based on domain knowledge and combining predictors from different sources of data (technical, fundamental and sentiment). The literature surveyed appears to be centred on short-term forecasting, as shown in Figure 3. Furthermore, tests are mostly related to US markets (Figure 4).

Figure 4: Markets used for experiments. Global encompasses research using securities from multiple geographic areas. Others refers to securities of different asset classes, e.g., futures on indices and commodities.

3.3. Ensemble Learning
Ensemble methods prescribe combining multiple algorithms to achieve better performance than inducing only the best member. Ensemble architectures are determined by how a population of diverse models is deployed and by the choice of aggregation mechanism to derive the final prediction. The Random Forest (RF) algorithm [28] represents a prototypical example.
In Krauss et al. [29], a statistical arbitrage strategy is simulated after an ensemble architecture with majority voting (composed of a deep neural network, a gradient-boosted tree and an RF) has classified S&P 500 constituents according to the probability of outperforming the index. Yuan et al. [30] deployed NN, SVM and RF to classify Chinese stocks based on whether they belong to the top 30% of companies in generating excess returns over one month.
The authors highlight an issue arising when adopting standard k-fold cross-validation for time series problems: future data might be used to predict past observations, overstating performance. They evaluate algorithms using a sliding-window train and test set instead. Evaluating a model on periods prior to the training set might be desirable in certain cases: for example, for stress-testing purposes.

3.4. Support Vector Machines
The Support Vector Machine (SVM) [31] is a very effective algorithm able to solve non-linearly separable problems by enlarging the feature space thanks to the "kernel trick". Using kernel functions allows the problem to be solved in the original dimension without having to transform the data into a more complex space.
SVMs were compared in [32] to linear and quadratic discriminant analysis and an Elman network to forecast the weekly direction of the Nikkei 225 index. Zbikowski [33] proposed considering volume data by plugging this information directly into the SVM formulation, multiplying the hyper-parameter representing the cost of margin violation, C, by a coefficient v based on the transaction volume of a given security over the input window. Patel et al. [34] deployed a two-stage model with the task of predicting forward closing prices. Their architecture comprises a Support Vector Regressor (SVR) to predict the features' values; this output is then fed into a different model to produce the estimated index close price.
SVMs incur high computational costs with large datasets. The adoption of deep learning has grown in popularity thanks to its capability to overcome SVM scalability problems without sacrificing performance.

3.5. Deep Learning
Feed-forward neural networks extend linear models by composing different functions that might carry an element of non-linearity. As a result, this is equivalent to expanding the linear problem y = wx + b to y = w(g(x, θ)) + b. An example of an activation function is the well-known Rectified Linear Unit (ReLU). In this study the terms Deep Learning and Deep Neural Network (DNN) are used for architectures encompassing more than 3 layers.
Chong et al. [35] ran experiments on stocks listed on the Korean market to make intraday (5-minute) return predictions. The authors considered a network with just three hidden layers to be deep. Over the subsequent few years, significantly more complex structures would be developed. The impact of network depth on the performance of a financial time series classification task was studied in [36].

3.5.1. Recurrent Neural Networks
A Recurrent Neural Network (RNN) processes at any given stage, as inputs, information from that time step and a hidden state derived from the previous one. Dixon et al. [37] show that an RNN with linear activation can be assimilated to an autoregressive time series model. The output of a given time step can be written as: Y_t = f(W_x x_t + W_y y_{t-1} + b).
The Long Short-Term Memory (LSTM) cell [38] overcomes the RNN's limitations (exploding or vanishing gradients). An LSTM neuron adds to the recurrent unit's hidden state (short-term memory) a long-term cell state controlled by different gates. LSTM networks can therefore handle long sequences.
A comparison of LSTMs to RFs and NNs can be seen in [39]: the task in question was predicting the direction of 5 components of the BOVESPA index in a high-frequency setting. Fischer and Krauss [14] used as features only the return series of the past 240 observations to classify 1-day-forward returns, showing the ability of LSTM cells to learn long sequences. Alonso et al. [13] tackled predicting the 1-day-ahead return, and the related direction, of 50 constituents of the S&P 500. This study showed that increasing the number of time steps used to train the model from 1 to 10 improves the performance of the LSTM.

3.5.2. Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have been particularly effective in computer vision. Convolution layers exhibit equivariance to translation; in time series data this property allows the algorithm to detect patterns within a sample and recognise them notwithstanding the point in time at which they appear. Convolution filters over one dimension (1D) can be deployed to slide across time, deriving hidden patterns within samples. 1-dimensional convolution layers can be deployed as feature extractors before LSTMs, forming the CNN-LSTM architecture proposed in [15].
The dilation rate d of a convolution layer indicates the spacing between the elements of the input to which the convolution is applied. Researchers from DeepMind conceived an architecture called WaveNet [40], able to achieve state-of-the-art performance in several text-to-speech tasks without the use of RNNs. It is obtained by stacking several 1D convolution layers, doubling the dilation rate at every layer. Thus the receptive field is larger, albeit without additional parameters. The initial layers learn short-term patterns, while, as data progresses towards the output, longer-term sequences are extracted.
Borovykh et al. [41] adapted the WaveNet architecture, using ReLU as the activation function, for multivariate financial time series forecasting: 1D dilated causal convolutions run independently for each input time series and are combined before making the prediction. Experiments were run on several financial instruments, on 1-day-forward return prediction tasks. WaveNet and LSTM performed equally in terms of forecasting direction; however, WaveNet outputs more accurate point estimates. Borjesson et al. [42] proposed a WaveNet-inspired model, using the Scaled Exponential Linear Unit (SELU) as the activation function for the convolution layers. They ran experiments with the goal of predicting the next-day price and trend of the S&P 500.
Convolutions over two dimensions (2D) could be used to extract the most salient features and the most significant sub-sequence in a sample. This approach, recently emerging in the literature, offers the advantage of jointly learning patterns within each predictor time series and the relationships between features. This could be particularly advantageous with economic time series, where the correlation between variables changes over time, albeit in a recurrent fashion.
Gudelek et al. [43] used 2D convolutions, simulating trading strategies on several Exchange Traded Funds (ETFs). Prices were differenced once to mitigate the issues of non-stationarity and transformed into a range between +1 and -1. The authors interpret predictions in this range as confidence values. It would be interesting to conduct further research applying to financial time series tasks architectures centred on 2D convolutions that have proven to be successful in other domains.
A recent stream of research leverages the success achieved in computer vision directly: time series data are transformed to assume the characteristics of pixel values and intensity. Cohen et al. [44] cast a time series classification problem (spotting technical patterns) as an image recognition task; Zeng et al. [45] ported a multivariate time series forecasting problem in US equities to the video prediction domain.

3.5.3. Autoencoders
Autoencoders (AE) are models conceived to replicate their inputs without supervision. The design usually prescribes an input layer, an encoding layer with a smaller number of units, and a decoding layer of the same size as the input, providing a reconstructed representation. Autoencoders can be applied to financial time series prediction problems to reduce dimensions or noise. Troiano et al. [46] focused on the impact of the feature reduction stage (starting from 40 technical indicators). Denoising autoencoders were used in [9] to create a latent feature representation, adding a noise component that randomly alters the raw data, forcing the algorithm to reconstruct robust inputs. Bao et al. [47] stacked 5 autoencoders, connecting the final output to an LSTM network to predict the one-step-ahead level of several stock market indices.

3.5.4. Generative Adversarial Networks
Generative Adversarial Networks (GANs) [48] are composed of 2 competing models: the generator aims to approximate the data distribution and outputs synthetic samples; the discriminator takes either actual or synthetic data and estimates the probability that the sample in question is genuine. The generator aims to maximise the probability that the discriminator will make a mistake.
Zhou et al. [49] applied GANs to make one-step-ahead predictions for 42 Chinese stocks within a high-frequency framework. The generator is an LSTM network, while a 1D-CNN plays the role of the discriminator.
Back-testing is a common approach to measure the profitability of a model against past trends in the precise order in which they occurred. GANs could be employed to provide synthetic test sets, as if engineers were forward-testing algorithms devised by researchers. A recent architecture fit for this purpose is the Time-series Generative Adversarial Network (TimeGAN) [50]. This framework was conceived to capture the time-dependent conditional distribution of data. In addition to the unsupervised adversarial loss, the model prescribes the use of a supervised loss based on the original data.
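The back-testing idea above can be sketched in a few lines. The sketch is illustrative only (it ignores transaction costs, slippage and compounding) and reuses the directional example of Section 3.1, where the more accurate classifier is the less profitable one.

```python
# Illustrative back-test sketch: apply +1/-1 direction signals to realised
# returns in their original order. Ignores transaction costs, slippage and
# compounding; the return path reuses the example in Section 3.1.

def backtest(signals, returns):
    """Sum of signal-weighted simple returns."""
    return sum(s * r for s, r in zip(signals, returns))

def hit_ratio(signals, returns):
    """Fraction of periods where the predicted direction matched the move."""
    return sum(1 for s, r in zip(signals, returns) if s * r > 0) / len(signals)

rets = [0.008, 0.010, 0.005, -0.030]  # +0.8%, +1%, +0.5%, -3%
always_long = [+1, +1, +1, +1]        # 75% hit ratio, loses 0.7%
contrarian = [-1, -1, +1, -1]         # 50% hit ratio, gains 1.7%
```

Running models through such a harness, whether on historical or GAN-generated synthetic paths, evaluates them on the quantity that ultimately matters: the profit and loss of the induced actions.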
[51], series exclusively in tabular form, however, financial pro- allow a decoder to focus at each step of a sequence on fessional place considerable value in extracting knowl- the most relevant (encoded) input. The decoder com- edge from the relationship and interaction amongst dif- putes a weighted sum of the output of the encoder; the ferent variables. Graph Neural Networks (GNNs) [54] are weights are learned by an attention layer, using as in- able to extract knowledge from the interplay between puts the encoder output concatenated with the decoder different nodes in a graph, therefore could represent a previous hidden state. These techniques, developed for novel approach, worthy of further research in our field. neural machine translation problems, have been applied An exploratory study related to Japanese equities can be to time series forecasting: attention mechanism weights seen in [55]. The authors motivated the use of GNNs to differently each time step of a sequence, this is then fed leverage inter-market and inter-company relationships. to a forecasting model to derive a prediction. Zhang The main paradigm adopted by both industry and et al. [52]proposed the AT-LSTM architecture: an atten- academia when devising a machine learning solution has tion mechanism using LSTM as encoder, assigns different been to consider a dataset as fixed, focusing on algorithm weights to input features (time steps). Attention weighted development. This survey shows that there is potential sequences are than used as inputs into an LSTM network for progress, keeping the algorithm fixed, placing the to output the prediction. A more complex version of this data at the centre of the research process instead. 
architecture can be seen in [53]: here the input series is Ensemble learning approaches combining neural net- first encoded with an LSTM and an attention layer as- works have been the winning architecture in image recog- signs weights to the features at each time step, obtaining nition competitions. Similar ideas could be explored, ap- an attention weighted features matrix. In the next stage plying diversified network ensembles to financial data. another LSTM based attention mechanism weights the Different models could be trained on different market different hidden states across time steps. The final stage regimes. One way to deal with a changing environment, prescribes an LSTM bloc to make a prediction using as in- is to constantly discard old data and retrain the model puts the output of the previous stage and the target time with more recent examples. Nevertheless, often, market series. This Dual-Stage Attention RNN (DA-RNN) aims dynamics observed in the past occur in a similar way to capture the most important features while learning at a later stage. Conceiving a way to use past, however, time dependencies. relevant information is an interesting avenue to pursue further study. 3.6. Underexplored Areas While the value of machine learning methods to equity time series forecasting has been shown in the short term, So far we have discussed researchers tackling the prob- it would be interesting to test further these techniques lems of non-stationarity of financial data, different mar- sampling data with lower frequency. Extending findings ket regimes and low signal to noise ratio, deploying more to monthly data would graduate machine learning meth- and more complex algorithms. Further study should be ods to applications beyond the realm of trading. conducted focusing on feature engineering: rescaling and This survey shows how the playground for machine labelling examples considering the related market envi- learning experiment in equities is the S&P 500. 
This survey shows that the playground for machine learning experiments in equities is the S&P 500. Few studies try to corroborate findings by extending the research to different countries; European equities in particular emerge as an area that remains rather under-explored. Moreover, given that different algorithms are simulated on different data, it is difficult to assess what could be considered the state of the art. It would be very fruitful for the field if, perhaps as a cooperative project between industry and academia, different financial multivariate datasets (e.g., one for US equities and one for European shares) were engineered as standards, in order to provide a more objective common ground for research.

Framing the prediction problem in terms of classification is vulnerable to missing large movements; on the other hand, a model developed to predict future securities prices has the pitfall that a fairly accurate point estimate, albeit in the wrong direction, could result in a loss-making course of action. Research could therefore be pursued into alternative training approaches, for example conceiving models that learn regression and classification tasks jointly. In contrast with many of the studies reviewed, which use as inputs the target series and technical indicators, further research could also be conducted on combining features from different domains into a more diverse data pool.
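The joint regression-classification training suggested above could be realised as a network with a shared representation and two heads minimising a combined objective. A minimal sketch of such an objective; the task weighting `alpha` and the choice of mean-squared error plus cross-entropy are illustrative assumptions, not choices taken from the surveyed papers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_loss(y_true, y_pred, dir_true, dir_logit, alpha=0.5):
    """Combined objective for a model with two heads: a regression head
    predicting next-period returns and a classification head predicting
    direction. `alpha`, the task weighting, is a hypothetical choice."""
    mse = np.mean((y_true - y_pred) ** 2)       # regression term (magnitude)
    p = sigmoid(dir_logit)                      # predicted probability of an up-move
    bce = -np.mean(dir_true * np.log(p) + (1.0 - dir_true) * np.log(1.0 - p))
    return alpha * mse + (1.0 - alpha) * bce    # weighted sum of the two tasks
```

Training a shared body against such a loss would push the representation to be accurate both in magnitude (regression) and in sign (classification), mitigating the pitfall of a point estimate in the wrong direction.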
4. Concluding Remarks

Researchers have tackled the problem of equity market forecasting by initially deploying statistical time-series forecasting techniques and then experimenting with complex deep learning architectures. In particular, the LSTM has proven useful in addressing the vanishing/exploding gradient problem while offering the advantage of modelling non-linear time series data. Dilated convolutions, on their own or as feature extractors, constitute an effective technique when dealing with long sequences. C2D-centred architectures are an emerging method capable of extracting local knowledge about the interaction of different features.

With this review, we advocate the pursuit of research on every component of the machine learning value chain, rather than a focus exclusively on the algorithmic core. Most studies show results only for a specific market or a specific period; thus, the general robustness of findings could be improved. Applying machine learning to financial time series is a challenging, yet rewarding, endeavour. Given the importance of the decisions based on equity market forecasts, even a small improvement in model performance can have a major impact.
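As an illustration of the environment-aware labelling advocated in Section 3.6, the class threshold can be scaled by recent realised volatility instead of being fixed arbitrarily. A minimal sketch, in which the lookback length and the multiplier `k` are hypothetical choices:

```python
import numpy as np

def label_returns(returns, lookback=60, k=1.0):
    """Label examples as up (1), down (-1) or flat (0) using a threshold
    scaled by recent volatility, rather than a fixed arbitrary cut-off.

    The threshold for step t is `k` times the standard deviation of the
    previous `lookback` returns, so a move counts as 'large' only
    relative to the prevailing market environment; `lookback` and `k`
    are illustrative parameters.
    """
    labels = np.zeros(len(returns), dtype=int)
    for t in range(lookback, len(returns)):
        thr = k * np.std(returns[t - lookback:t])  # context-dependent threshold
        if returns[t] > thr:
            labels[t] = 1
        elif returns[t] < -thr:
            labels[t] = -1
    return labels
```

Under this scheme the same absolute move is labelled 'large' in a calm market but 'flat' in a turbulent one, tying each label to the context in which it was generated.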
References

[1] E. F. Fama, Efficient capital markets: A review of theory and empirical work, The Journal of Finance 25 (1970) 383–417. URL: http://www.jstor.org/stable/2325486.
[2] N. Jegadeesh, S. Titman, Returns to buying winners and selling losers: Implications for stock market efficiency, The Journal of Finance 48 (1993) 65–91. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1993.tb04702.x. doi:10.1111/j.1540-6261.1993.tb04702.x.
[3] B. G. Malkiel, The efficient market hypothesis and its critics, The Journal of Economic Perspectives 17 (2003) 59–82. URL: http://www.jstor.org/stable/3216840.
[4] S. J. Grossman, J. E. Stiglitz, On the impossibility of informationally efficient markets, American Economic Review 72 (1982) 393–408. doi:10.7916/D8765R99.
[5] R. Bauer, M. Cosemans, P. Eichholtz, Option trading and individual investor performance, Journal of Banking & Finance 33 (2009) 731–746. URL: https://www.sciencedirect.com/science/article/pii/S0378426608002720. doi:10.1016/j.jbankfin.2008.11.005.
[6] M. Grinblatt, M. Keloharju, Sensation seeking, overconfidence, and trading activity, The Journal of Finance 64 (2009) 549–578. URL: http://www.jstor.org/stable/20487979.
[7] P. L. Bernstein, Why the efficient market offers hope to active management, Journal of Applied Corporate Finance 12 (1999) 129–136.
[8] A. W. Lo, Adaptive Markets: Financial Evolution at the Speed of Thought, Princeton University Press, 2017. URL: http://www.jstor.org/stable/j.ctvc77k3n.
[9] X.-Y. Qian, S. Gao, Financial series prediction: Comparison between precision of time series models and machine learning methods, 2017.
[10] S. Gu, B. T. Kelly, D. Xiu, Empirical asset pricing via machine learning, 2018. URL: https://doi.org/10.2139%2Fssrn.3281018. doi:10.2139/ssrn.3281018.
[11] M. Ballings, D. V. den Poel, N. Hespeels, R. Gryp, Evaluating multiple classifiers for stock price direction prediction, Expert Syst. Appl. 42 (2015) 7046–7056. URL: https://doi.org/10.1016/j.eswa.2015.05.013. doi:10.1016/j.eswa.2015.05.013.
[12] M. Dixon, D. Klabjan, J. H. Bang, Classification-based financial markets prediction using deep neural networks, CoRR abs/1603.08604 (2016). URL: http://arxiv.org/abs/1603.08604. arXiv:1603.08604.
[13] T. Guida, Big Data and Machine Learning in Quantitative Investment, Wiley, 2018.
[14] T. Fischer, C. Krauss, Deep learning with long short-term memory networks for financial market predictions, Eur. J. Oper. Res. 270 (2018) 654–669. URL: https://doi.org/10.1016/j.ejor.2017.11.054. doi:10.1016/j.ejor.2017.11.054.
[15] J. Eapen, D. Bein, A. Verma, Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction, in: IEEE 9th Annual Computing and Communication Workshop and Conference, CCWC 2019, Las Vegas, NV, USA, January 7-9, 2019, IEEE, 2019, pp. 264–270. URL: https://doi.org/10.1109/CCWC.2019.8666592. doi:10.1109/CCWC.2019.8666592.
[16] J. J. Murphy, Technical Analysis of the Futures Markets: A Comprehensive Guide to Trading Methods and Applications, Prentice Hall, 1986.
[17] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques, Expert Syst. Appl. 42 (2015) 259–268. URL: https://doi.org/10.1016/j.eswa.2014.07.040. doi:10.1016/j.eswa.2014.07.040.
[18] R. P. Schumaker, H. Chen, Textual analysis of stock market prediction using breaking financial news: The AZFin Text system, ACM Trans. Inf. Syst. 27 (2009) 12:1–12:19. URL: https://doi.org/10.1145/1462198.1462204. doi:10.1145/1462198.1462204.
[19] S. Kazemian, S. Zhao, G. Penn, Evaluating sentiment analysis in the context of securities trading, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, The Association for Computer Linguistics, 2016. URL: https://doi.org/10.18653/v1/p16-1197. doi:10.18653/v1/p16-1197.
[20] J. Bollen, H. Mao, X. Zeng, Twitter mood predicts the stock market, J. Comput. Sci. 2 (2011) 1–8. URL: https://doi.org/10.1016/j.jocs.2010.12.007. doi:10.1016/j.jocs.2010.12.007.
[21] D. Othan, Z. H. Kilimci, M. Uysal, Financial sentiment analysis for predicting direction of stocks using bidirectional encoder representations from transformers (BERT) and deep learning models, in: Proc. Int. Conf. Innov. Intell. Technol., volume 2019, 2019, pp. 30–35.
[22] P. Cho, J. H. Park, J. W. Song, Equity research report-driven investment strategy in Korea using binary classification on stock price direction, IEEE Access 9 (2021) 46364–46373.
[23] D. Olson, C. Mossman, Neural network forecasts of Canadian stock returns using accounting ratios, International Journal of Forecasting 19 (2003) 453–465.
[24] C. Tsai, Y. Lin, D. C. Yen, Y. Chen, Predicting stock returns by classifier ensembles, Appl. Soft Comput. 11 (2011) 2452–2459. URL: https://doi.org/10.1016/j.asoc.2010.10.001. doi:10.1016/j.asoc.2010.10.001.
[25] J. Alberg, Z. C. Lipton, Improving factor-based quantitative investing by forecasting company fundamentals, CoRR abs/1711.04837 (2017). URL: http://arxiv.org/abs/1711.04837. arXiv:1711.04837.
[26] L. Chauhan, J. Alberg, Z. C. Lipton, Uncertainty-aware lookahead factor models for quantitative investing, in: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 1489–1499. URL: http://proceedings.mlr.press/v119/chauhan20a.html.
[27] E. F. Fama, K. R. French, A five-factor asset pricing model, Journal of Financial Economics 116 (2015) 1–22.
[28] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32. URL: https://doi.org/10.1023/A:1010933404324. doi:10.1023/A:1010933404324.
[29] C. Krauss, X. A. Do, N. Huck, Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500, Eur. J. Oper. Res. 259 (2017) 689–702. URL: https://doi.org/10.1016/j.ejor.2016.10.031. doi:10.1016/j.ejor.2016.10.031.
[30] X. Yuan, J. Yuan, T. Jiang, Q. U. Ain, Integrated long-term stock selection models based on feature selection and machine learning algorithms for China stock market, IEEE Access 8 (2020) 22672–22685. URL: https://doi.org/10.1109/ACCESS.2020.2969293. doi:10.1109/ACCESS.2020.2969293.
[31] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297. URL: https://doi.org/10.1007/BF00994018. doi:10.1007/BF00994018.
[32] W. Huang, Y. Nakamori, S.-Y. Wang, Forecasting stock market movement direction with support vector machine, Computers & Operations Research 32 (2005) 2513–2522.
[33] K. Zbikowski, Using volume weighted support vector machines with walk forward testing and feature selection for the purpose of creating stock trading strategy, Expert Syst. Appl. 42 (2015) 1797–1805. URL: https://doi.org/10.1016/j.eswa.2014.10.001. doi:10.1016/j.eswa.2014.10.001.
[34] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting stock market index using fusion of machine learning techniques, Expert Syst. Appl. 42 (2015) 2162–2172. URL: https://doi.org/10.1016/j.eswa.2014.10.031. doi:10.1016/j.eswa.2014.10.031.
[35] E. Chong, C. Han, F. C. Park, Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies, Expert Systems with Applications 83 (2017) 187–205.
[36] X. Zhong, D. Enke, Predicting the daily return direction of the stock market using hybrid machine learning algorithms, Financial Innovation 5 (2019) 1–20.
[37] M. F. Dixon, I. Halperin, P. Bilokon, Machine Learning in Finance, volume 1406, Springer, 2020.
[38] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780. URL: https://doi.org/10.1162/neco.1997.9.8.1735. doi:10.1162/neco.1997.9.8.1735.
[39] D. M. Nelson, A. C. Pereira, R. A. De Oliveira, Stock market's price movement prediction with LSTM neural networks, in: 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 1419–1426.
[40] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, K. Kavukcuoglu, WaveNet: A generative model for raw audio, in: The 9th ISCA Speech Synthesis Workshop, Sunnyvale, CA, USA, 13-15 September 2016, ISCA, 2016, p. 125. URL: http://www.isca-speech.org/archive/SSW_2016/abstracts/ssw9_DS-4_van_den_Oord.html.
[41] A. Borovykh, S. M. Bohté, C. W. Oosterlee, Conditional time series forecasting with convolutional neural networks, 2017.
[42] L. Börjesson, M. Singull, Forecasting financial time series through causal and dilated convolutional neural networks, Entropy 22 (2020) 1094. URL: https://doi.org/10.3390/e22101094. doi:10.3390/e22101094.
[43] M. U. Gudelek, S. A. Boluk, A. M. Ozbayoglu, A deep learning based stock trading model with 2-D CNN trend detection, in: 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017, Honolulu, HI, USA, November 27 - Dec. 1, 2017, IEEE, 2017, pp. 1–8. URL: https://doi.org/10.1109/SSCI.2017.8285188. doi:10.1109/SSCI.2017.8285188.
[44] N. Cohen, T. Balch, M. Veloso, Trading via image classification, in: T. Balch (Ed.), ICAIF '20: The First ACM International Conference on AI in Finance, New York, NY, USA, October 15-16, 2020, ACM, 2020, pp. 53:1–53:6. URL: https://doi.org/10.1145/3383455.3422544. doi:10.1145/3383455.3422544.
[45] Z. Zeng, T. Balch, M. Veloso, Deep video prediction for time series forecasting, in: A. Calinescu, L. Szpruch (Eds.), ICAIF '21: 2nd ACM International Conference on AI in Finance, Virtual Event, November 3-5, 2021, ACM, 2021, pp. 39:1–39:7. URL: https://doi.org/10.1145/3490354.3494404. doi:10.1145/3490354.3494404.
[46] L. Troiano, E. Mejuto, P. Kriplani, On feature reduction using deep learning for trend prediction in finance, CoRR abs/1704.03205 (2017). URL: http://arxiv.org/abs/1704.03205. arXiv:1704.03205.
[47] W. Bao, J. Yue, Y. Rao, A deep learning framework for financial time series using stacked autoencoders and long-short term memory, PLoS ONE 12 (2017) e0180944.
[48] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, Y. Bengio, Generative adversarial nets, in: Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, 2014, pp. 2672–2680. URL: https://proceedings.neurips.cc/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.
[49] X. Zhou, Z. Pan, G. Hu, S. Tang, C. Zhao, Stock market prediction on high-frequency data using generative adversarial nets, Mathematical Problems in Engineering (2018).
[50] J. Yoon, D. Jarrett, M. van der Schaar, Time-series generative adversarial networks, in: H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 2019, pp. 5509–5519. URL: https://proceedings.neurips.cc/paper/2019/hash/c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.html.
[51] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL: http://arxiv.org/abs/1409.0473.
[52] X. Zhang, X. Liang, A. Zhiyuli, S. Zhang, R. Xu, B. Wu, AT-LSTM: An attention-based LSTM model for financial time series prediction, in: IOP Conference Series: Materials Science and Engineering, volume 569, IOP Publishing, 2019, p. 052037.
[53] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, G. W. Cottrell, A dual-stage attention-based recurrent neural network for time series prediction, in: C. Sierra (Ed.), Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org, 2017, pp. 2627–2633. URL: https://doi.org/10.24963/ijcai.2017/366. doi:10.24963/ijcai.2017/366.
[54] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, M. Sun, Graph neural networks: A review of methods and applications, AI Open 1 (2020) 57–81.
[55] D. Matsunaga, T. Suzumura, T. Takahashi, Exploring graph neural networks for stock market predictions with rolling window analysis, arXiv preprint arXiv:1909.10660 (2019).