Machine Learning Methods for Equity Time Series Forecasting: A Compendium

Alberto Matuozzo1, Paul D. Yoo1, Alessandro Provetti1 and Maria H. Kim2
1 Birkbeck College, University of London, Malet St, London, WC1E 7HX, United Kingdom
2 University of Wollongong, Wollongong, NSW 2522, Australia

Abstract
Machine learning builds predictive models from vast amounts of data drawn from different sources, capturing non-linear relationships between variables. As a result, financial markets in general, and stock markets in particular, offer a promising ground for the application of such methods. This survey examines machine learning methods for equity market forecasting, identifying gaps in current knowledge and suggesting potential avenues for further research. Computer science-centred quantitative studies have focused mainly on algorithms, testing results mostly on US data over short time frames; feature engineering, and the validation of findings on different markets and different time horizons, remain under-explored. This study therefore introduces the financial context for non-experts and then reviews models and tools from statistical learning and deep learning. We believe this approach will prove effective in financial practice for interested readers without much prior knowledge of the finance literature. We survey the end-to-end deployment of machine learning to help readers from industry and academia understand the peculiarities of applying these methods to equity market forecasting.

Keywords
Machine Learning, Deep Learning, Time Series Forecasting, Equity market forecasting

CIKM'22: Workshop on Applied Machine Learning Methods for Time Series Forecasting (AMLTS), October 21, 2022, Atlanta, GA, USA

1. Introduction
Forecasting equity markets is relevant not only for the parties directly involved in price formation (e.g., companies and investors), but also for policymakers and regulators. Central banks have shown a growing interest in modelling equities to decide macro-prudential policy, assess investors' attitude towards risk, and, in some cases, deploy capital in the market directly. For example, according to the 30 June 2022 13F SEC filings1, the Swiss National Bank owns about $11.11 billion of Apple and $7.49 billion of Microsoft.
This study is driven by the need to review machine learning techniques applied to equity market time series forecasting problems, with the objective of providing an overview for practitioners and highlighting areas that are under-explored and warrant further research.
This study draws on research from finance and computer science, from industry and academia. As such, the selection of papers for this literature review is not purely bibliometric. We aim to cover a selection of high-impact publications and original contributions from the field for the critical steps of the machine learning deployment process, from pre-processing to algorithm selection. We cite papers that are now part of the history of financial theory for the benefit of non-experts. Research from other domains is included where it is reasonable to assume that the related techniques could be ported to financial time series forecasting.
The contributions of this study are as follows:

• We review, with the caveat that the field is evolving at significant speed, the main machine learning solutions deployed for equity market forecasting, in terms of features and algorithms. We observe that most findings have been corroborated only on specific markets or specific periods. Data sampling and forecasting horizons are mostly daily.
• We highlight potential directions for future research, emphasising the adoption of a more data-centric approach. Ensemble methods should be further researched as architectures to leverage the peculiarities of financial data.
* Corresponding author.
$ amatuo01@mail.bbk.ac.uk (A. Matuozzo); p.yoo@bbk.ac.uk (P. D. Yoo); a.provetti@bbk.ac.uk (A. Provetti); mhykim@uow.edu.au (M. H. Kim)
0000-0002-5614-3129 (A. Matuozzo); 0000-0001-7665-8616 (P. D. Yoo); 0000-0001-9542-4110 (A. Provetti); 0000-0002-6279-5836 (M. H. Kim)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
1 Please see https://www.sec.gov/edgar.shtml

The rest of the paper is organised as follows: Section 2 outlines financial background knowledge for non-experts. Section 3 examines features, the main algorithms deployed and related financial applications, followed by a discussion of gaps in the current knowledge. The final parts of this study are dedicated to areas of future work and how the gaps could be addressed.

2. Framing the problem
The non-stationarity and non-linearity of financial variables are the primary attributes to consider when estimating a forecasting model for equity markets.

2.1. The "Efficient Markets" Hypothesis
Under the Efficient Markets Hypothesis (EMH) developed by Fama [1], the sequence of price changes must be unpredictable if prices fully incorporate the information and expectations of market participants.
Yet, in the real world, analysing past returns and processing publicly available information are instrumental in building a forecasting model. Jegadeesh and Titman [2] showed that it is possible to generate statistically significant abnormal returns by buying the stocks that exhibited top-decile returns in the past (i.e., winners) and selling stocks in the bottom decile (losers). Malkiel [3] documented several market anomalies that increase predictability; however, these inefficiencies do not persist.
If market participants firmly believed in the EMH, it would be irrational to trade, hence there would not be a financial market. Grossman and Stiglitz [4] offer the following solution: the information derived by skilled investors is not entirely reflected in the market. As a result, investment research is compensated. Bauer et al. [5] showed skill persistence amongst retail option traders: top-decile performers over one year outperform individuals in the bottom decile the following year.

Figure 1: Machine learning methods for equity market forecasting; algorithms and reference papers.

2.2. The "Adaptive Markets" Hypothesis
Traditional financial theories assume that market participants are rational and share the same utility function. When these assumptions are violated, behavioural biases such as dissimilar risk appetites and inconsistent probability beliefs play a significant role in accounting for abnormal profit opportunities. Grinblatt et al. [6] showed that individuals' personal traits and sentiment tend to affect trading behaviour. Bernstein [7] points out how financial markets undergo regime shifts through an evolutionary process, which explains market dynamics.
The Adaptive Markets Hypothesis (AMH) [8] offers an alternative framework: different agents interact and adapt in response to an ever-changing market environment, competing to capture return opportunities.
Machine learning, being adaptive in nature, has the set of tools to conduct research in this new paradigm: models are non-parametric in nature and have the capability to approximate complex non-linear, non-continuous functions.

2.3. Machine Learning in Finance
The emergence of big data gives rise to a paradigm shift in the field of financial engineering, where state-of-the-art techniques and data processing capacity are of utmost importance. Not only are newsflow, economic and stock market data widely available in copious size, but unstructured data (e.g., geo-location) can be accessed at any time and processed in different ways. As a result, investors equipped with the latest technological advancements can make use of an extensive set of features. Machine learning algorithms do not require data transformations to make a time series stationary, being able to learn complex patterns in high dimensions.
Qian [9] showed the outperformance of several machine learning algorithms (logistic regression, Neural Networks (NN), Support Vector Machines (SVM), denoising Autoencoders) compared to ARIMA. Gu et al. [10] compared ordinary least squares regression to tree-based methods and NNs on the task of predicting equity excess returns for US stocks.
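As a toy illustration of why non-linear learners can dominate a linear fit (this is not the setup of [10]; the data and the one-split "stump" learner below are invented for this sketch), one can compare closed-form least squares with a minimal tree-based learner on a target that has no linear component:

```python
# Toy illustration (not the setup of [10]): ordinary least squares vs. a
# one-split regression "stump" on a target with no linear component.

def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

def stump_fit(xs, ys):
    """Pick the single split threshold minimising squared error."""
    best = None
    for t in xs:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # purely non-linear relationship

slope, intercept = ols_fit(xs, ys)
linear_sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
stump = stump_fit(xs, ys)
stump_sse = sum((stump(x) - y) ** 2 for x, y in zip(xs, ys))
```

On this symmetric sample the least-squares slope is exactly zero, so the linear model collapses to predicting the mean, while even a single split lowers the squared error.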
Gu et al. [10] ascribe the superior performance of machine learning techniques to their ability to accommodate a large number of predictors and to learn non-linear relationships between covariates.

3. Main Algorithms Deployed and Features Utilised
In this section, different machine learning algorithms and their applications to stock market time series are presented. Figure 1 depicts the ML algorithms and their related references. Here the literature is reviewed focusing on the main steps required for an end-to-end machine learning application, rather than looking, almost exclusively, at the different algorithms deployed.

3.1. Peculiarities of Financial Time Series
Framing a time series forecasting problem as regression might present an issue in the financial domain: the prediction might be very close to the actual value and yet induce the wrong action. For example, given an S&P 500 level of 4200, assume the 1-day-forward forecasts generated by two different models to be 4210 and 4150 respectively, with the actual future level being 4190: while the first model is more accurate, it will suggest entering a long position, causing a loss. The second model, despite being further away from the observed value, would trigger the correct (profitable) action, establishing a short position.
In several time series classification studies, the labelling is conditional on a gating value g: an instance is labelled +1 if the response variable y > g, and -1 otherwise. For example, Ballings et al. [11] compared the performance of ensemble methods against single classifiers on the task of grouping European shares based on whether they will be up at least 25% in a year. Dixon et al. [12] framed securities time series problems as multiclass, attributing +1 to a positive movement, -1 to a downward direction and 0 to a flat market, setting a value to determine "no action" in order to balance the classes. Labelling examples using an arbitrary fixed threshold does not take into account that returns are realised in a specific market regime. Defining, for example, a threshold of ±5% to determine the bounds of the 0 class might be appropriate for a quiet market environment; however, it could be too small in a high-volatility phase. Research should be conducted to scale raw data in ways that would result in predictors that carry information content related to the respective market regime.
Accuracy in time series classification problems might not reflect profitability if a model fails to capture the correct direction of large moves. For example, consider the sequence of returns +0.8%, +1%, +0.5%, -3%, and two binary classifiers (+1, -1): the first predicts four upward movements and is therefore 75% accurate. The second classifier outputs -1, -1, +1, -1: it is 50% accurate. If the outputs were fed into a trading system, the first classifier would generate a 0.7% loss while the second would yield a profit of 1.7%. An approach to mitigate this issue, suggested by industry practitioners [13], is to train a model to predict returns and then evaluate it based on direction accuracy (also called the hit ratio).

3.2. Features
Despite the potential of machine learning algorithms to extract knowledge in many dimensions, several studies do not go beyond using lags of the target series as predictors. For example, Fischer and Krauss [14] used only percentage changes of adjusted prices of the S&P 500 index constituents.
An interesting feature engineering technique that tries to capture both past observations and the market regime can be seen in [15]. The authors implement a relative change transformation of the predictors in each data window: from each element in a sample, the first observation of the sequence is subtracted, and the result is divided by that first observation. As a result, the network learns from a price change that is put in the context of the short-term market environment in which it was generated. A possible avenue of research would be to devise ad hoc methods to rescale data dynamically.
The overwhelming majority of the papers surveyed in this study employ technical indicators [16] as additional features. Patel et al. [17] ran experiments on Indian equities using different representations of technical indicators: in their original form and in discrete form. The latter, called "trend-deterministic", is obtained by discretising continuous values of oscillators (e.g., the Relative Strength Index, RSI) into two categories: +1 and -1. Their study shows that the trend-deterministic representation improves results irrespective of the algorithm.
Sentiment analysis is performed to take the pulse of market participants' emotional state. Extracting sentiment from social media, particularly for short-time-frame applications, has gained paramount importance given the rise of retail traders, whose combined decisions can have a major impact on price formation. Sentiment indicators can be the direct result of investor polls, for example the American Association of Individual Investors (AAII) survey. The Investopedia portal2 computes the Investopedia Anxiety Index (IAI) based on their readers' interest in topics such as macroeconomics, negative markets and credit disruptions.
Schumaker and Chen [18] proposed a model to predict the intraday share price of companies after a piece of news was released. They used as predictors the price forecast of a linear regression model and features derived from text analysis of news articles. Kazemian et al. [19] deployed an SVM to classify the sentiment extracted from financial news on US stocks.
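As a contrast to such learned classifiers, the simplest sentiment features are plain lexicon counts. The sketch below is illustrative only: the word lists and scoring rule are invented here and are far cruder than the NLP pipelines used in the cited studies.

```python
# Illustrative only: a minimal lexicon-count sentiment score for headlines.
# The word lists and scoring rule are invented for this sketch; the cited
# studies use far richer NLP pipelines and trained classifiers such as SVMs.

POSITIVE = {"beat", "upgrade", "growth", "record", "strong"}
NEGATIVE = {"miss", "downgrade", "loss", "lawsuit", "weak"}

def headline_score(headline: str) -> int:
    """Return (#positive - #negative) lexicon tokens; the sign is a crude label."""
    tokens = headline.lower().replace(",", " ").split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

# headline_score("Strong growth, analysts upgrade")  ->  3 (positive)
# headline_score("Quarterly loss and downgrade")     -> -2 (negative)
```

Even such a crude count illustrates how unstructured text can be turned into a numeric predictor that a downstream model can consume.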
Bollen et al. [20] showed that the emotional state of the broad population, particularly the "Calm" dimension, has predictive power on the daily changes of the DJIA. Othan et al. [21] proposed a model to predict the direction of Turkish stocks based exclusively on sentiment extracted from Twitter posts.
It is noteworthy that these sentiment-centred techniques aim to output short-term predictions. It would be interesting to extend this research over longer time horizons, using texts from commentators with domain knowledge, for example analysing blog posts from the likes of "Seeking Alpha".
Cho et al. [22] simulated an investment strategy based on the text characteristics of equity research reports published by brokerage houses. The authors deployed part-of-speech tagging to engineer features such as the numbers of nouns, adjectives and sentences. Classifiers using the text-extracted features were trained on the task of recognising successful buy recommendations on Korean equities.
2 Please see www.investopedia.com
Fundamental analysis does much more than assess the intrinsic value of an asset or company, given that it includes both macro-econometric and micro-econometric approaches. As such, it focuses on financial statements and macroeconomic data. Specific attention is warranted to avoid look-ahead bias: new data releases and revisions should enter the time series only when the updated information is disclosed. For example, quarterly GDP data, after the initial release, are later revised twice to reach the final figure.
Only a limited number of studies in the field adopt fundamental data; this is probably due, on the one hand, to the aforementioned issues and, on the other, to their low-frequency nature. Olson and Mossman [23] studied predictive modelling of Canadian stock excess returns and direction using 61 accounting ratios as inputs. Tsai et al. [24] studied the application of ensemble learning methods to quarterly direction forecasts for the Taiwanese stock market. Predictors included company financial ratios and macroeconomic indicators.
Fundamental reported data are backward-looking. Alberg et al. [25] and Chauhan et al. [26], while studying neural network applications to factor investing [27] in US stocks, built predictive models of companies' fundamental metrics. Stocks were then ranked and picked to form portfolios based on the forecasted fundamental values. These studies show that investing based on predicted company data outperforms stock selection based on reported information. An alternative solution could be to use consensus forecast figures from the financial analyst community as predictors. This approach would also allow sampling data at a frequency higher than the quarterly earnings cycle.

Table 1
Pros and cons of different kinds of features.

Feature type   Examples         Pros                      Cons
Technical      RSI, Momentum    Easy to compute           Mixed results
Sentiment      AAII, Twitter    Captures emotional state  Requires NLP
Fundamental    GDP, P/E         Medium-/long-term         Look-ahead bias

Figure 2: Surveyed papers grouped by kinds of features adopted. The combination category comprises studies using at least two sources of inputs, e.g., technical & sentiment.

Figure 3: Surveyed papers grouped by sampling frequency.
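The release-date discipline described above can be sketched as a small point-in-time ("as-of") lookup. The dates and values below are hypothetical; only the pattern of an advance estimate followed by two revisions mirrors the GDP example in the text.

```python
# Hypothetical point-in-time lookup to avoid look-ahead bias: a value enters
# the series only from its release date onwards. Dates and values are invented;
# the initial-release-plus-two-revisions pattern follows the GDP example above.
from bisect import bisect_right

# (release_date, value) vintages for one reference quarter, in release order.
q1_gdp_vintages = [
    ("2022-04-28", 1.0),  # advance estimate
    ("2022-05-26", 1.2),  # second estimate
    ("2022-06-29", 1.1),  # final estimate
]

def as_of(vintages, query_date):
    """Latest value released on or before query_date (ISO strings), else None."""
    release_dates = [d for d, _ in vintages]
    i = bisect_right(release_dates, query_date)
    return vintages[i - 1][1] if i else None

# A back-test dated 2022-05-01 must see the advance estimate, not revisions:
# as_of(q1_gdp_vintages, "2022-05-01") -> 1.0
```

Storing every vintage, rather than overwriting the series with each revision, is what makes the lookup reproducible at any historical date.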
As summarised in Table 1, technical features can be easily retrieved from data vendors or calculated from price series. They are mostly used by practitioners to obtain short-term timing indications, albeit with mixed results. Studying sentiment allows one to read the emotional state of market participants. Sentiment data are mostly unstructured and therefore require specific Natural Language Processing (NLP) tasks. Fundamental data are low frequency and potentially subject to the dangers of look-ahead bias. However, this kind of data reveals the state of the economy. A predictive model aiming to generate a medium- or long-term forecast should take fundamental variables into consideration.
The vast majority of the research consulted for this study used only one or two types of features, as summarised in Figure 2. In light of this survey, more attention should be devoted to expanding the feature space, via both feature engineering based on domain knowledge and combining predictors from different sources of data (technical, fundamental and sentiment). The literature surveyed appears to be centred on short-term forecasting, as shown in Figure 3. Furthermore, tests are mostly related to US markets (Figure 4).

Figure 4: Markets used for experiments. Global encompasses research using securities from multiple geographic areas. Others refers to securities of different asset classes, e.g., futures on indices and commodities.

3.3. Ensemble Learning
Ensemble methods prescribe combining multiple algorithms to achieve better performance than inducing only the best member. Ensemble architectures are determined by how a population of diverse models is deployed and by the choice of aggregation mechanism to derive the final prediction. The Random Forest (RF) algorithm [28] represents a prototypical example.
In Krauss et al. [29], a statistical arbitrage strategy is simulated after an ensemble architecture with majority voting (composed of a deep neural network, a gradient-boosted tree and an RF) has classified S&P 500 constituents according to the probability of outperforming the index. Yuan et al. [30] deployed NN, SVM and RF to classify Chinese stocks based on whether they belong to the top 30% of companies in generating excess returns over one month.
The authors highlight an issue arising when adopting standard k-fold cross-validation for time series problems: future data might be used to predict past observations, overstating performance. They evaluate algorithms using a sliding-window train and test set instead. Evaluating a model on periods prior to the training set might be desirable in certain cases: for example, for stress-testing purposes.

3.4. Support Vector Machines
The Support Vector Machine (SVM) [31] is a very effective algorithm able to solve non-linearly separable problems by enlarging the feature space thanks to the "kernel trick". Using kernel functions allows the problem to be solved in the original dimension without having to transform the data into a more complex space.
SVMs were compared in [32] to linear and quadratic discriminant analysis and an Elman network to forecast the weekly direction of the Nikkei 225 index. Zbikowski [33] proposed considering volume data by plugging this information directly into the SVM formulation, multiplying the hyper-parameter representing the cost of margin violation, C, by a coefficient v based on the transaction volume of a given security over the input window. Patel et al. [34] deployed a two-stage model with the task of predicting forward closing prices. Their architecture comprises a Support Vector Regressor (SVR) to predict the features' values; this output is then fed into a different model to produce the estimated index close price.
SVMs incur high computational costs with large datasets. The adoption of deep learning has grown in popularity thanks to its capability to overcome SVM scalability problems without sacrificing performance.

3.5. Deep Learning
Feed-forward neural networks extend linear models by composing different functions that might carry an element of non-linearity. As a result, this is equivalent to expanding the linear problem y = wx + b to y = w(g(x, θ)) + b. An example of an activation function is the well-known Rectified Linear Unit (ReLU). In this study the terms Deep Learning and Deep Neural Network (DNN) are used for architectures encompassing more than 3 layers.
Chong et al. [35] ran experiments on stocks listed on the Korean market to make intraday (5-minute) return predictions. The authors considered a network with just three hidden layers to be deep. Over the subsequent few years, significantly more complex structures would be developed. The impact of network depth on the performance of a financial time series classification task was studied in [36].

3.5.1. Recurrent Neural Networks
A Recurrent Neural Network (RNN) processes at any given stage, as inputs, information from that time step and a hidden state derived from the previous one. Dixon et al. [37] show that an RNN with linear activation can be assimilated to an autoregressive time series model. The output of a given time step can be written as: Y_t = f(W_x x_t + W_y y_{t-1} + b).
The Long Short-Term Memory (LSTM) cell [38] overcomes the RNN's limitations (exploding or vanishing gradients). An LSTM neuron adds to the recurrent unit's hidden state (short-term memory) a long-term cell state controlled by different gates. LSTM networks can therefore handle long sequences.
A comparison of LSTMs to RFs and NNs can be seen in [39]: the task in question was predicting the direction of 5 components of the BOVESPA index in a high-frequency setting. Fischer and Krauss [14] used as features only the return series of the past 240 observations to classify 1-day-forward returns, showing the ability of LSTM cells to learn long sequences. Alonso et al. [13] tackled predicting the 1-day-ahead return, and the related direction, of 50 constituents of the S&P 500. This study showed that increasing the number of time steps used to train the model from 1 to 10 improves the performance of the LSTM.

3.5.2. Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have been particularly effective in computer vision. Convolution layers exhibit equivariance to translation; in time series data this property allows the algorithm to detect patterns within a sample and recognise them notwithstanding the point in time at which they appear. Convolution filters over one dimension (1D) can be deployed to slide across time, deriving hidden patterns within samples. 1-dimensional convolution layers can be deployed as feature extractors before LSTMs, forming the CNN-LSTM architecture proposed in [15].
The dilation rate d of a convolution layer indicates the spacing between the elements of the input to which the convolution is applied. Researchers from DeepMind conceived an architecture called WaveNet [40], able to achieve state-of-the-art performance in several text-to-speech tasks without the use of RNNs. It is obtained by stacking several 1D convolution layers, doubling the dilation rate at every layer. Thus the receptive field is larger, albeit without additional parameters. The initial layers learn short-term patterns, while, as data progresses towards the output, longer-term sequences are extracted.
Borovykh et al. [41] adapted the WaveNet architecture, using ReLU as the activation function, for multivariate financial time series forecasting: 1D dilated causal convolutions run independently for each input time series and are combined before making the prediction. Experiments were run on several financial instruments, on 1-day-forward return prediction tasks. WaveNet and LSTM performed equally in terms of forecasting direction; however, WaveNet outputs more accurate point estimates. Borjesson et al. [42] proposed a WaveNet-inspired model, using the Scaled Exponential Linear Unit (SELU) as the activation function for the convolution layers. They ran experiments with the goal of predicting the next-day price and trend of the S&P 500.
Convolutions over two dimensions (2D) could be used to extract the most salient features and the most significant sub-sequence in a sample. This approach, recently emerging in the literature, offers the advantage of jointly learning patterns within each predictor time series and the relationships between features. This could be particularly advantageous with economic time series, where the correlation between variables changes over time, albeit in a recurrent fashion.
Gudelek et al. [43] used 2D convolutions, simulating trading strategies on several Exchange Traded Funds (ETFs). Prices were differenced once to mitigate the issues of non-stationarity and transformed into a range between +1 and -1. The authors interpret predictions in this range as confidence values. It would be interesting to conduct further research applying to financial time series tasks architectures centred on 2D convolutions that have proven to be successful in other domains.
A recent stream of research leverages the success achieved in computer vision directly: time series data are transformed to assume the characteristics of pixel values and intensity. Cohen et al. [44] cast a time series classification problem (spotting technical patterns) as an image recognition task; Zeng et al. [45] ported a multivariate time series forecasting problem in US equities to the video prediction domain.

3.5.3. Autoencoders
Autoencoders (AE) are models conceived to replicate their inputs without supervision. The design usually prescribes an input layer, an encoding layer with a smaller number of units, and a decoding layer of the same size as the input, providing a reconstructed representation. Autoencoders can be applied to financial time series prediction problems to reduce dimensions or noise. Troiano et al. [46] focused on the impact of the feature reduction stage (starting from 40 technical indicators). Denoising autoencoders were used in [9] to create a latent feature representation, adding a noise component that randomly alters the raw data, forcing the algorithm to reconstruct robust inputs. Bao et al. [47] stacked 5 autoencoders, connecting the final output to an LSTM network to predict the one-step-ahead level of several stock market indices.

3.5.4. Generative Adversarial Networks
Generative Adversarial Networks (GANs) [48] are composed of 2 competing models: the generator aims to approximate the data distribution and outputs synthetic samples; the discriminator takes either actual or synthetic data and estimates the probability that the sample in question is genuine. The generator aims to maximise the probability that the discriminator will make a mistake.
Zhou et al. [49] applied GANs to make one-step-ahead predictions for 42 Chinese stocks within a high-frequency framework. The generator is an LSTM network, while a 1D-CNN plays the role of the discriminator.
Back-testing is a common approach to measure the profitability of a model against past trends in the precise order in which they occurred. GANs could be employed to provide synthetic test sets, as if engineers were forward-testing algorithms devised by researchers. A recent architecture fit for this purpose is the Time-series Generative Adversarial Network (TimeGAN) [50]. This framework was conceived to capture the time-dependent conditional distribution of data. In addition to the unsupervised adversarial loss, the model prescribes the use of a supervised loss based on the original data.
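The back-testing idea above can be sketched in a few lines. The sketch is illustrative only (it ignores transaction costs, slippage and compounding) and reuses the directional example of Section 3.1, where the more accurate classifier is the less profitable one.

```python
# Illustrative back-test sketch: apply +1/-1 direction signals to realised
# returns in their original order. Ignores transaction costs, slippage and
# compounding; the return path reuses the example in Section 3.1.

def backtest(signals, returns):
    """Sum of signal-weighted simple returns."""
    return sum(s * r for s, r in zip(signals, returns))

def hit_ratio(signals, returns):
    """Fraction of periods where the predicted direction matched the move."""
    return sum(1 for s, r in zip(signals, returns) if s * r > 0) / len(signals)

rets = [0.008, 0.010, 0.005, -0.030]  # +0.8%, +1%, +0.5%, -3%
always_long = [+1, +1, +1, +1]        # 75% hit ratio, loses 0.7%
contrarian = [-1, -1, +1, -1]         # 50% hit ratio, gains 1.7%
```

Running models through such a harness, whether on historical or GAN-generated synthetic paths, evaluates them on the quantity that ultimately matters: the profit and loss of the induced actions.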
[51], series exclusively in tabular form, however, financial pro- allow a decoder to focus at each step of a sequence on fessional place considerable value in extracting knowl- the most relevant (encoded) input. The decoder com- edge from the relationship and interaction amongst dif- putes a weighted sum of the output of the encoder; the ferent variables. Graph Neural Networks (GNNs) [54] are weights are learned by an attention layer, using as in- able to extract knowledge from the interplay between puts the encoder output concatenated with the decoder different nodes in a graph, therefore could represent a previous hidden state. These techniques, developed for novel approach, worthy of further research in our field. neural machine translation problems, have been applied An exploratory study related to Japanese equities can be to time series forecasting: attention mechanism weights seen in [55]. The authors motivated the use of GNNs to differently each time step of a sequence, this is then fed leverage inter-market and inter-company relationships. to a forecasting model to derive a prediction. Zhang The main paradigm adopted by both industry and et al. [52]proposed the AT-LSTM architecture: an atten- academia when devising a machine learning solution has tion mechanism using LSTM as encoder, assigns different been to consider a dataset as fixed, focusing on algorithm weights to input features (time steps). Attention weighted development. This survey shows that there is potential sequences are than used as inputs into an LSTM network for progress, keeping the algorithm fixed, placing the to output the prediction. A more complex version of this data at the centre of the research process instead. 
architecture can be seen in [53]: here the input series is Ensemble learning approaches combining neural net- first encoded with an LSTM and an attention layer as- works have been the winning architecture in image recog- signs weights to the features at each time step, obtaining nition competitions. Similar ideas could be explored, ap- an attention weighted features matrix. In the next stage plying diversified network ensembles to financial data. another LSTM based attention mechanism weights the Different models could be trained on different market different hidden states across time steps. The final stage regimes. One way to deal with a changing environment, prescribes an LSTM bloc to make a prediction using as in- is to constantly discard old data and retrain the model puts the output of the previous stage and the target time with more recent examples. Nevertheless, often, market series. This Dual-Stage Attention RNN (DA-RNN) aims dynamics observed in the past occur in a similar way to capture the most important features while learning at a later stage. Conceiving a way to use past, however, time dependencies. relevant information is an interesting avenue to pursue further study. 3.6. Underexplored Areas While the value of machine learning methods to equity time series forecasting has been shown in the short term, So far we have discussed researchers tackling the prob- it would be interesting to test further these techniques lems of non-stationarity of financial data, different mar- sampling data with lower frequency. Extending findings ket regimes and low signal to noise ratio, deploying more to monthly data would graduate machine learning meth- and more complex algorithms. Further study should be ods to applications beyond the realm of trading. conducted focusing on feature engineering: rescaling and This survey shows how the playground for machine labelling examples considering the related market envi- learning experiment in equities is the S&P 500. 
This survey shows that the playground for machine learning experiments in equities is the S&P 500. Few studies try to corroborate findings by extending the research to different countries; European equities in particular emerge as an area that remains rather under-explored. Moreover, given that different algorithms are simulated on different data, it is difficult to assess what could be considered the state of the art. It would be very fruitful for the field if, perhaps as a cooperative project between industry and academia, different financial multivariate datasets (e.g., one for US equities and one for European shares) were engineered as standards, in order to provide a more objective common ground for research.

Framing the prediction problem in terms of classification is vulnerable to missing large movements; on the other hand, a model developed to predict future securities prices has the pitfall that a fairly accurate point estimate, albeit in the wrong direction, could result in a loss-making course of action. Research could therefore be pursued into alternative training approaches, for example conceiving models that learn regression and classification tasks jointly. In contrast with many of the studies reviewed, which use as inputs the target series and technical indicators, further research could also be conducted on combining features from different domains into a more diverse data pool.
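The joint regression-classification training suggested above could be realised as a network with a shared representation and two heads minimising a combined objective. A minimal sketch of such an objective; the task weighting `alpha` and the choice of mean-squared error plus cross-entropy are illustrative assumptions, not choices taken from the surveyed papers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_loss(y_true, y_pred, dir_true, dir_logit, alpha=0.5):
    """Combined objective for a model with two heads: a regression head
    predicting next-period returns and a classification head predicting
    direction. `alpha`, the task weighting, is a hypothetical choice."""
    mse = np.mean((y_true - y_pred) ** 2)       # regression term (magnitude)
    p = sigmoid(dir_logit)                      # predicted probability of an up-move
    bce = -np.mean(dir_true * np.log(p) + (1.0 - dir_true) * np.log(1.0 - p))
    return alpha * mse + (1.0 - alpha) * bce    # weighted sum of the two tasks
```

Training a shared body against such a loss would push the representation to be accurate both in magnitude (regression) and in sign (classification), mitigating the pitfall of a point estimate in the wrong direction.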
4. Concluding Remarks

Researchers have tackled the problem of equity market forecasting by initially deploying statistical time-series forecasting techniques and then experimenting with complex deep learning architectures. In particular, the LSTM has proven useful in addressing the vanishing/exploding gradient problem while offering the advantage of modelling non-linear time series data. Dilated convolutions, on their own or as feature extractors, constitute an effective technique when dealing with long sequences. C2D-centred architectures are an emerging method capable of extracting local knowledge about the interaction of different features.

With this review, we advocate the pursuit of research on every component of the machine learning value chain, rather than a focus exclusively on the algorithmic core. Most studies show results only for a specific market or a specific period; thus, the general robustness of findings could be improved. Applying machine learning to financial time series is a challenging, yet rewarding, endeavour. Given the importance of the decisions based on equity market forecasts, even a small improvement in model performance can have a major impact.
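As an illustration of the environment-aware labelling advocated in Section 3.6, the class threshold can be scaled by recent realised volatility instead of being fixed arbitrarily. A minimal sketch, in which the lookback length and the multiplier `k` are hypothetical choices:

```python
import numpy as np

def label_returns(returns, lookback=60, k=1.0):
    """Label examples as up (1), down (-1) or flat (0) using a threshold
    scaled by recent volatility, rather than a fixed arbitrary cut-off.

    The threshold for step t is `k` times the standard deviation of the
    previous `lookback` returns, so a move counts as 'large' only
    relative to the prevailing market environment; `lookback` and `k`
    are illustrative parameters.
    """
    labels = np.zeros(len(returns), dtype=int)
    for t in range(lookback, len(returns)):
        thr = k * np.std(returns[t - lookback:t])  # context-dependent threshold
        if returns[t] > thr:
            labels[t] = 1
        elif returns[t] < -thr:
            labels[t] = -1
    return labels
```

Under this scheme the same absolute move is labelled 'large' in a calm market but 'flat' in a turbulent one, tying each label to the context in which it was generated.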
References

[1] E. F. Fama, Efficient capital markets: A review of theory and empirical work, The Journal of Finance 25 (1970) 383–417. URL: http://www.jstor.org/stable/2325486.
[2] N. Jegadeesh, S. Titman, Returns to buying winners and selling losers: Implications for stock market efficiency, The Journal of Finance 48 (1993) 65–91. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1993.tb04702.x. doi:10.1111/j.1540-6261.1993.tb04702.x.
[3] B. G. Malkiel, The efficient market hypothesis and its critics, The Journal of Economic Perspectives 17 (2003) 59–82. URL: http://www.jstor.org/stable/3216840.
[4] S. J. Grossman, J. E. Stiglitz, On the impossibility of informationally efficient markets, American Economic Review 72 (1982) 393–408. doi:10.7916/D8765R99.
[5] R. Bauer, M. Cosemans, P. Eichholtz, Option trading and individual investor performance, Journal of Banking & Finance 33 (2009) 731–746. URL: https://www.sciencedirect.com/science/article/pii/S0378426608002720. doi:10.1016/j.jbankfin.2008.11.005.
[6] M. Grinblatt, M. Keloharju, Sensation seeking, overconfidence, and trading activity, The Journal of Finance 64 (2009) 549–578. URL: http://www.jstor.org/stable/20487979.
[7] P. L. Bernstein, Why the efficient market offers hope to active management, Journal of Applied Corporate Finance 12 (1999) 129–136.
[8] A. W. Lo, Adaptive Markets: Financial Evolution at the Speed of Thought, Princeton University Press, 2017. URL: http://www.jstor.org/stable/j.ctvc77k3n.
[9] X.-Y. Qian, S. Gao, Financial series prediction: Comparison between precision of time series models and machine learning methods, 2017.
[10] S. Gu, B. T. Kelly, D. Xiu, Empirical asset pricing via machine learning, 2018. URL: https://doi.org/10.2139%2Fssrn.3281018. doi:10.2139/ssrn.3281018.
[11] M. Ballings, D. V. den Poel, N. Hespeels, R. Gryp, Evaluating multiple classifiers for stock price direction prediction, Expert Syst. Appl. 42 (2015) 7046–7056. URL: https://doi.org/10.1016/j.eswa.2015.05.013. doi:10.1016/j.eswa.2015.05.013.
[12] M. Dixon, D. Klabjan, J. H. Bang, Classification-based financial markets prediction using deep neural networks, CoRR abs/1603.08604 (2016). URL: http://arxiv.org/abs/1603.08604. arXiv:1603.08604.
[13] T. Guida, Big Data and Machine Learning in Quantitative Investment, Wiley, 2018.
[14] T. Fischer, C. Krauss, Deep learning with long short-term memory networks for financial market predictions, Eur. J. Oper. Res. 270 (2018) 654–669. URL: https://doi.org/10.1016/j.ejor.2017.11.054. doi:10.1016/j.ejor.2017.11.054.
[15] J. Eapen, D. Bein, A. Verma, Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction, in: IEEE 9th Annual Computing and Communication Workshop and Conference, CCWC 2019, Las Vegas, NV, USA, January 7-9, 2019, IEEE, 2019, pp. 264–270. URL: https://doi.org/10.1109/CCWC.2019.8666592. doi:10.1109/CCWC.2019.8666592.
[16] J. J. Murphy, Technical Analysis of the Futures Markets: A Comprehensive Guide to Trading Methods and Applications, Prentice Hall, 1986.
[17] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques, Expert Syst. Appl. 42 (2015) 259–268. URL: https://doi.org/10.1016/j.eswa.2014.07.040. doi:10.1016/j.eswa.2014.07.040.
[18] R. P. Schumaker, H. Chen, Textual analysis of stock market prediction using breaking financial news: The AZFin Text system, ACM Trans. Inf. Syst. 27 (2009) 12:1–12:19. URL: https://doi.org/10.1145/1462198.1462204. doi:10.1145/1462198.1462204.
[19] S. Kazemian, S. Zhao, G. Penn, Evaluating sentiment analysis in the context of securities trading, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, The Association for Computer Linguistics, 2016. URL: https://doi.org/10.18653/v1/p16-1197. doi:10.18653/v1/p16-1197.
[20] J. Bollen, H. Mao, X. Zeng, Twitter mood predicts the stock market, J. Comput. Sci. 2 (2011) 1–8. URL: https://doi.org/10.1016/j.jocs.2010.12.007. doi:10.1016/j.jocs.2010.12.007.
[21] D. Othan, Z. H. Kilimci, M. Uysal, Financial sentiment analysis for predicting direction of stocks using bidirectional encoder representations from transformers (BERT) and deep learning models, in: Proc. Int. Conf. Innov. Intell. Technol., volume 2019, 2019, pp. 30–35.
[22] P. Cho, J. H. Park, J. W. Song, Equity research report-driven investment strategy in Korea using binary classification on stock price direction, IEEE Access 9 (2021) 46364–46373.
[23] D. Olson, C. Mossman, Neural network forecasts of Canadian stock returns using accounting ratios, International Journal of Forecasting 19 (2003) 453–465.
[24] C. Tsai, Y. Lin, D. C. Yen, Y. Chen, Predicting stock returns by classifier ensembles, Appl. Soft Comput. 11 (2011) 2452–2459. URL: https://doi.org/10.1016/j.asoc.2010.10.001. doi:10.1016/j.asoc.2010.10.001.
[25] J. Alberg, Z. C. Lipton, Improving factor-based quantitative investing by forecasting company fundamentals, CoRR abs/1711.04837 (2017). URL: http://arxiv.org/abs/1711.04837. arXiv:1711.04837.
[26] L. Chauhan, J. Alberg, Z. C. Lipton, Uncertainty-aware lookahead factor models for quantitative investing, in: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 1489–1499. URL: http://proceedings.mlr.press/v119/chauhan20a.html.
[27] E. F. Fama, K. R. French, A five-factor asset pricing model, Journal of Financial Economics 116 (2015) 1–22.
[28] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32. URL: https://doi.org/10.1023/A:1010933404324. doi:10.1023/A:1010933404324.
[29] C. Krauss, X. A. Do, N. Huck, Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500, Eur. J. Oper. Res. 259 (2017) 689–702. URL: https://doi.org/10.1016/j.ejor.2016.10.031. doi:10.1016/j.ejor.2016.10.031.
[30] X. Yuan, J. Yuan, T. Jiang, Q. U. Ain, Integrated long-term stock selection models based on feature selection and machine learning algorithms for China stock market, IEEE Access 8 (2020) 22672–22685. URL: https://doi.org/10.1109/ACCESS.2020.2969293. doi:10.1109/ACCESS.2020.2969293.
[31] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297. URL: https://doi.org/10.1007/BF00994018. doi:10.1007/BF00994018.
[32] W. Huang, Y. Nakamori, S.-Y. Wang, Forecasting stock market movement direction with support vector machine, Computers & Operations Research 32 (2005) 2513–2522.
[33] K. Zbikowski, Using volume weighted support vector machines with walk forward testing and feature selection for the purpose of creating stock trading strategy, Expert Syst. Appl. 42 (2015) 1797–1805. URL: https://doi.org/10.1016/j.eswa.2014.10.001. doi:10.1016/j.eswa.2014.10.001.
[34] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting stock market index using fusion of machine learning techniques, Expert Syst. Appl. 42 (2015) 2162–2172. URL: https://doi.org/10.1016/j.eswa.2014.10.031. doi:10.1016/j.eswa.2014.10.031.
[35] E. Chong, C. Han, F. C. Park, Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies, Expert Systems with Applications 83 (2017) 187–205.
[36] X. Zhong, D. Enke, Predicting the daily return direction of the stock market using hybrid machine learning algorithms, Financial Innovation 5 (2019) 1–20.
[37] M. F. Dixon, I. Halperin, P. Bilokon, Machine Learning in Finance, volume 1406, Springer, 2020.
[38] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780. URL: https://doi.org/10.1162/neco.1997.9.8.1735. doi:10.1162/neco.1997.9.8.1735.
[39] D. M. Nelson, A. C. Pereira, R. A. De Oliveira, Stock market's price movement prediction with LSTM neural networks, in: 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 1419–1426.
[40] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, K. Kavukcuoglu, WaveNet: A generative model for raw audio, in: The 9th ISCA Speech Synthesis Workshop, Sunnyvale, CA, USA, 13-15 September 2016, ISCA, 2016, p. 125. URL: http://www.isca-speech.org/archive/SSW_2016/abstracts/ssw9_DS-4_van_den_Oord.html.
[41] A. Borovykh, S. M. Bohté, C. W. Oosterlee, Conditional time series forecasting with convolutional neural networks, 2017.
[42] L. Börjesson, M. Singull, Forecasting financial time series through causal and dilated convolutional neural networks, Entropy 22 (2020) 1094. URL: https://doi.org/10.3390/e22101094. doi:10.3390/e22101094.
[43] M. U. Gudelek, S. A. Boluk, A. M. Ozbayoglu, A deep learning based stock trading model with 2-D CNN trend detection, in: 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017, Honolulu, HI, USA, November 27 - Dec. 1, 2017, IEEE, 2017, pp. 1–8. URL: https://doi.org/10.1109/SSCI.2017.8285188. doi:10.1109/SSCI.2017.8285188.
[44] N. Cohen, T. Balch, M. Veloso, Trading via image classification, in: T. Balch (Ed.), ICAIF '20: The First ACM International Conference on AI in Finance, New York, NY, USA, October 15-16, 2020, ACM, 2020, pp. 53:1–53:6. URL: https://doi.org/10.1145/3383455.3422544. doi:10.1145/3383455.3422544.
[45] Z. Zeng, T. Balch, M. Veloso, Deep video prediction for time series forecasting, in: A. Calinescu, L. Szpruch (Eds.), ICAIF '21: 2nd ACM International Conference on AI in Finance, Virtual Event, November 3-5, 2021, ACM, 2021, pp. 39:1–39:7. URL: https://doi.org/10.1145/3490354.3494404. doi:10.1145/3490354.3494404.
[46] L. Troiano, E. Mejuto, P. Kriplani, On feature reduction using deep learning for trend prediction in finance, CoRR abs/1704.03205 (2017). URL: http://arxiv.org/abs/1704.03205. arXiv:1704.03205.
[47] W. Bao, J. Yue, Y. Rao, A deep learning framework for financial time series using stacked autoencoders and long-short term memory, PLoS ONE 12 (2017) e0180944.
[48] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, Y. Bengio, Generative adversarial nets, in: Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, 2014, pp. 2672–2680. URL: https://proceedings.neurips.cc/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.
[49] X. Zhou, Z. Pan, G. Hu, S. Tang, C. Zhao, Stock market prediction on high-frequency data using generative adversarial nets, Mathematical Problems in Engineering (2018).
[50] J. Yoon, D. Jarrett, M. van der Schaar, Time-series generative adversarial networks, in: H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 2019, pp. 5509–5519. URL: https://proceedings.neurips.cc/paper/2019/hash/c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.html.
[51] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL: http://arxiv.org/abs/1409.0473.
[52] X. Zhang, X. Liang, A. Zhiyuli, S. Zhang, R. Xu, B. Wu, AT-LSTM: An attention-based LSTM model for financial time series prediction, in: IOP Conference Series: Materials Science and Engineering, volume 569, IOP Publishing, 2019, p. 052037.
[53] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, G. W. Cottrell, A dual-stage attention-based recurrent neural network for time series prediction, in: C. Sierra (Ed.), Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org, 2017, pp. 2627–2633. URL: https://doi.org/10.24963/ijcai.2017/366. doi:10.24963/ijcai.2017/366.
[54] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, M. Sun, Graph neural networks: A review of methods and applications, AI Open 1 (2020) 57–81.
[55] D. Matsunaga, T. Suzumura, T. Takahashi, Exploring graph neural networks for stock market predictions with rolling window analysis, arXiv preprint arXiv:1909.10660 (2019).