<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine Learning Methods for Equity Time Series Forecasting: A Compendium</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Matuozzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul D. Yoo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Provetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria H. Kim</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Birkbeck College, University of London</institution>
          ,
          <addr-line>Malet St, London, WC1E 7HX</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Wollongong</institution>
          ,
          <addr-line>Wollongong, NSW 2522</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Machine learning is a method of building predictive models using vast amounts of data from different sources, capturing non-linear relationships between different variables. As a result, financial markets in general, and stock markets in particular, offer a promising ground for the application of such methods. This survey examines machine learning methods for equity market forecasting, identifying gaps in current knowledge and suggesting potential avenues for further research. Computer science-centred quantitative studies have focused mainly on algorithms, testing results mostly on US data over short time frames; feature engineering, and the testing of findings on different markets and different time horizons, appear to be under-explored. This study thus introduces the financial context for non-experts and moves on to review different models and tools in the realm of statistical learning and deep learning. We believe that this approach will prove effective in financial practice for an interested reader without much prior knowledge of the finance literature. We survey the end-to-end deployment of machine learning to help readers from industry and academia understand the peculiarities of applying these methods to equity market forecasting.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Time Series Forecasting</kwd>
        <kwd>Equity market forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Devising equity markets forecasting is relevant not only for the parties directly involved in price formation (e.g., companies and investors), but also for policymakers and regulators. Central banks have shown a growing interest in modelling equities to decide macro-prudential policy, assess investors’ attitude towards risk, and in some cases deploy capital in the market directly. For example, according to the 30th of June 2022 13F SEC filings, the Swiss National Bank owns about $11.11 billion of Apple and $7.49 billion of Microsoft.</p>
      <p>This study is driven by the need to review machine learning techniques applied to equity market time series forecasting problems, with the objective of providing an overview for practitioners and highlighting under-explored areas warranting further research.</p>
      <p>This study explores research from finance and computer science, from industry and academia. As such, the selection of papers for this literature review is not purely bibliometric. We aim to cover a selection of high-impact publications and original contributions from the field for the critical steps of the machine learning deployment process, from pre-processing to algorithm selection. We cite papers that are now part of the history of financial theory for the benefit of non-experts. Research from other domains is included where it is reasonable to assume that the related techniques could be ported to financial time series forecasting.</p>
      <p>The contributions of this study are as follows:</p>
      <p>• We review, with the caveat that the field is evolving at a significant speed, the main machine learning solutions for equity market forecasting deployed in terms of features and algorithms. We observe that most findings have been corroborated only on specific markets or specific periods. Data sampling and forecasting horizons are mostly daily.</p>
      <p>• We highlight potential directions for future research, emphasising the adoption of a more data-centric approach. Ensemble methods should be further researched as architectures to leverage the peculiarities of financial data.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Framing the problem</title>
      <p>The non-stationarity and non-linearity of financial variables are the primary attributes to account for when estimating a forecasting model for equity markets.</p>
      <sec id="sec-2-2">
        <title>2.1. The “Efficient Markets” Hypothesis</title>
        <p>Under the Efficient Markets Hypothesis (EMH) developed by Fama [1], the sequence of price changes must be unpredictable if prices fully incorporate the information and expectations of market participants.</p>
        <p>
          Yet, in the real world, analysing past returns and processing publicly available information are instrumental to building a forecasting model. Jegadeesh and Titman [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] showed that it is possible to generate statistically significant abnormal returns by buying the stocks that exhibited top-decile returns in the past (i.e., winners) and selling stocks in the bottom decile (losers). Malkiel [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] documented several market anomalies that increase predictability; however, these inefficiencies do not persist.
        </p>
        <p>
          If market participants firmly believed in the EMH, it would be irrational to trade; hence there would not be a financial market. Grossman and Stiglitz [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] offer the following solution: the information derived by skilled investors is not entirely reflected in the market. As a result, investment research is compensated. Bauer et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] showed skill persistence amongst retail option traders: top-decile performers over one year outperform individuals in the bottom decile the following year.
        </p>
        <p>The emergence of big data gives rise to a paradigm shift in the field of financial engineering, where state-of-the-art techniques and data processing capacity are of utmost importance. Not only are newsflow, economic and stock market data widely available in copious size; unstructured data (e.g., geo-location) can also be accessed at any time and processed in different ways. As a result, investors equipped with the latest technological advancements can make use of an extensive set of features. Machine learning algorithms do not require data transformations to make a time series stationary, being able to learn complex patterns in high dimensions.</p>
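<p>As a point of reference, the stationarity transformation that classical models typically require, and which the machine learning models discussed above can dispense with, is a simple differencing of log prices. A minimal sketch, with illustrative prices rather than data from any study:</p>

```python
import math

def log_returns(prices):
    """Transform a price series into (approximately stationary) log returns."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 102.0, 101.0, 104.0]
rets = log_returns(prices)
# Each element is log(p_t / p_{t-1}); the level trend is removed.
```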
        <sec id="sec-2-2a">
          <title>2.2. The “Adaptive Markets” Hypothesis</title>
          <p>
            Traditional financial theories assume that market participants are rational and share the same utility function. When these assumptions are violated, behavioural biases such as dissimilar risk appetites and inconsistent probability beliefs play a significant role in accounting for abnormal profit opportunities. Grinblatt et al. [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] showed that individuals’ personal traits and sentiment tend to affect trading behavior. Bernstein [7] points out how financial markets undergo regime shifts through an evolutionary process, which explains the market dynamics.
          </p>
          <p>The Adaptive Market Hypothesis (AMH) [8] offers an alternative framework: different agents interact and adapt in response to an ever-changing market environment, competing to capture return opportunities.</p>
          <p>Machine learning, being adaptive in nature, has the set of tools to conduct research in the new paradigm: models are non-parametric in nature and have the capability to approximate complex non-linear, non-continuous functions.</p>
          <p>Quian [9] showed the outperformance of several machine learning algorithms (logistic regression, Neural Networks (NN), Support Vector Machines (SVM), denoising Autoencoders) compared to ARIMA. Gu et al. [10] compared ordinary least squares regression to tree-based methods and NNs on the task of predicting equity excess returns for US stocks. The authors ascribe the superior performance of machine learning techniques to their ability to accommodate a large number of predictors and to learn non-linear relationships between covariates.</p>
        </sec>
        <sec id="sec-3">
          <title>3. Main Algorithms Deployed and Features Utilised</title>
          <p>In this section, different machine learning algorithms and their applications to stock market time series are presented. Figure 1 depicts the ML algorithms and their related references. Here the literature is reviewed focusing on the main steps required for an end-to-end machine learning application, rather than looking, almost exclusively, at the different algorithms deployed.</p>
        </sec>
        <sec id="sec-2-2-1">
          <title>3.1. Peculiarities in Financial Time Series</title>
          <p>Framing time series forecasting problems as regression might present an issue in the financial domain: the prediction might be very close to the actual value and yet induce the wrong action. For example, given a value of the S&amp;P 500 of 4200, assume the 1-day forward forecasts generated by two different models to be 4210 and 4150 respectively, with the actual future level being 4190: while the first model is more accurate, it will suggest entering a long position, causing a loss. The second model, despite being further away from the observed value, would trigger the correct (profitable) action, establishing a short position.</p>
          <p>In several time series classification studies, the labelling is conditional on a gating value g: an instance is labelled +1 if the response variable exceeds g, -1 otherwise. For example, Ballings et al. [11] compared the performance of ensemble methods against single classifiers on the task of grouping European shares based on whether they will be up at least 25% in a year. Dixon et al. [12] framed securities time series problems as multiclass, attributing +1 to a positive movement, -1 to a downward direction and 0 to a flat market, setting a value to determine ‘no action’ in order to balance the classes. Labelling examples using an arbitrary fixed threshold does not take into account that returns are realised in a specific market regime. Defining, for example, a threshold of ±5% to determine the bounds of the 0 class might be appropriate for a quiet market environment; however, it could be too small in a high-volatility phase. Research should be conducted to scale raw data in ways that would result in predictors that carry information content related to the respective market regime.</p>
          <p>Accuracy in time series classification problems might not reflect profitability if a model fails to capture the correct direction for large moves. For example, consider the sequence of returns +0.8%, +1%, +0.5%, -3%, and two binary classifiers (+1, -1): the first predicts four upward movements and is therefore 75% accurate. The second classifier outputs -1, -1, +1, -1: it is 50% accurate. If the outputs were to be fed into a trading system, the first classifier would generate a 0.7% loss while the second would yield a profit of 1.7%. An approach to mitigate this issue, suggested by industry practitioners [13], is to train a model to predict returns and then evaluate it based on direction accuracy (also called the hit ratio).</p>
        </sec>
        <sec id="sec-3-2">
          <title>3.2. Features</title>
          <p>Despite the potential of machine learning algorithms to extract knowledge in many dimensions, several studies do not go beyond using lags of the target series as predictors. For example, Fischer and Krauss [14] used only percentage changes of adjusted prices of the S&amp;P 500 index constituents.</p>
          <p>An interesting feature engineering technique that tries to capture both past observations and market regime can be seen in [15]. The authors implement a relative change transformation of the predictors in each data window: each element in a sample is subtracted and divided by the first observation of the sequence. As a result, the network will be learning from a price change that is put in the context of the short-term market environment in which it was generated. Possible avenues of research would be to devise ad hoc methods to rescale data dynamically.</p>
          <p>The overwhelming majority of the papers surveyed in this study employ technical indicators [16] as additional features. Patel et al. [17] ran experiments on Indian equities using different representations of technical indicators: in their original form and in discrete form. The latter, called “trend-deterministic,” is obtained by discretising continuous values of oscillators (e.g., the Relative Strength Index, RSI) into two categories: +1, -1. Their study shows that the trend-deterministic representation improves results irrespective of the algorithm.</p>
          <p>Sentiment analysis is performed to take the pulse of market participants’ emotional state. Extracting sentiment from social media, particularly for short-time-frame applications, has gained paramount importance given the rise of retail traders, whose decisions combined can have a major impact on price formation. Sentiment indicators can be the direct result of investor polls like, for example, the American Association of Individual Investors (AAII) survey. The Investopedia portal (www.investopedia.com) computes the Investopedia Anxiety Index (IAI) based on their readers’ interest in topics such as macroeconomics, negative markets and credit disruptions.</p>
          <p>Schumaker and Chen [18] proposed a model to predict the intraday share price of companies after a piece of news was released. They used as predictors the price forecast from a linear regression model and features derived from text analysis of news articles. Kazemian et al. [19] deployed an SVM to classify the sentiment extracted from financial news on US stocks. Bollen et al. [20] showed that the emotional state, particularly the ‘Calm’ dimension, of the broad population has predictive power on the daily changes in the DJIA. Othan et al. [21] proposed a model to predict the direction of Turkish stocks based exclusively on sentiment extracted from Twitter posts.</p>
          <p>It is noteworthy that these sentiment-centred techniques aim to output short-term predictions. It would be interesting to extend this research over longer time horizons, using texts from commentators with domain knowledge, for example analysing blog posts from the likes of “Seeking Alpha”.</p>
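<p>The accuracy-versus-profitability example above can be checked numerically; the return sequence and the two classifiers’ outputs are those given in the text, while the helper functions are an illustrative sketch of ours:</p>

```python
def accuracy(preds, returns):
    """Fraction of predictions matching the sign of the realised return."""
    hits = sum(1 for p, r in zip(preds, returns) if (r > 0) == (p == 1))
    return hits / len(preds)

def pnl(preds, returns):
    """Profit of trading each prediction: long on +1, short on -1."""
    return sum(p * r for p, r in zip(preds, returns))

returns = [0.8, 1.0, 0.5, -3.0]   # realised returns, in percent
clf1 = [1, 1, 1, 1]               # 75% accurate, yet loses 0.7%
clf2 = [-1, -1, 1, -1]            # 50% accurate, yet gains 1.7%
```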
        </sec>
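<p>The relative-change window transformation of [15] discussed above can be sketched as follows; the function name and the sample window are ours:</p>

```python
def relative_change_window(window):
    """Rescale a data window by its first observation: (x_t - x_0) / x_0.

    The network then learns from changes expressed relative to the
    short-term market environment in which they occurred."""
    x0 = window[0]
    return [(x - x0) / x0 for x in window]

# A window of prices becomes regime-relative percentage moves.
relative_change_window([200.0, 210.0, 190.0])  # [0.0, 0.05, -0.05]
```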
      </sec>
      <sec id="sec-2-3">
        <p>Cho et al. [22] simulated an investment strategy based on the text characteristics of equity research reports published by brokerage houses. The authors deployed Part-of-Speech tagging to engineer features such as the number of nouns, adjectives and sentences. Classifiers, using the text-extracted features, were trained on the task of recognising successful buy recommendations on Korean equities.</p>
        <p>Fundamental analysis does much more than assess the intrinsic value of an asset or company, given that it includes both macro-econometric and micro-econometric approaches. As such, it focuses on financial statements and macroeconomic data. Specific attention is warranted to avoid look-ahead bias: new data releases and revisions should enter the time series only when the updated information is disclosed. For example, quarterly GDP data, after the initial release, are later revised twice to get to the final figure.</p>
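<p>Avoiding the look-ahead bias described above amounts to an "as-of" lookup: a release or revision may be used only once its publication date has passed. A minimal sketch, with invented dates and figures rather than actual GDP data:</p>

```python
import datetime as dt

# (release_date, value) vintages for one GDP quarter:
# initial print followed by two revisions.
vintages = [
    (dt.date(2022, 4, 28), 1.1),   # advance estimate
    (dt.date(2022, 5, 26), 1.3),   # second estimate
    (dt.date(2022, 6, 29), 1.2),   # final figure
]

def as_of(vintages, when):
    """Latest value whose release date is on or before `when`; None if unreleased."""
    known = [v for d, v in vintages if d <= when]
    return known[-1] if known else None
```

<p>A backtest sampling the series through this lookup only ever sees the figure that was actually published at the time.</p>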
        <p>Only a limited number of studies in the field adopt fundamental data; this is probably due, on one hand, to the aforementioned issues and, on the other, to their low-frequency nature. Olson and Mossman [23] studied predictive modelling of Canadian stock excess returns and direction using as inputs 61 accounting ratios. Tsai et al. [24] studied the application of ensemble learning methods to quarterly direction forecasts for the Taiwanese stock market. Predictors included company financial ratios and macroeconomic indicators.</p>
        <p>Fundamental reported data are backward-looking. Alberg et al. [25] and Chauhan et al. [26], while studying neural network applications to factor investing [27] in US stocks, built a predictive model of companies’ fundamental metrics. Stocks were then ranked and picked to form portfolios based on the forecasted fundamental values. These studies show that investing based on predicted company data outperforms stock selection based on reported information. An alternative solution could be using, as predictors, consensus forecast figures from the financial analyst community. This approach would also allow sampling data at a frequency higher than the quarterly earnings cycle.</p>
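<p>The portfolio-formation step described above, ranking stocks on forecasted fundamentals and picking the top names, reduces to a sort; the tickers and predicted values below are invented for illustration:</p>

```python
def top_k_by_prediction(predicted, k):
    """Rank stocks by a predicted fundamental metric and keep the best k."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:k]

# Hypothetical predicted earnings growth per ticker.
forecasts = {"AAA": 0.12, "BBB": 0.30, "CCC": -0.05, "DDD": 0.21}
top_k_by_prediction(forecasts, 2)  # ["BBB", "DDD"]
```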
        <p>As summarised in Table 1, technical features can be easily retrieved from data vendors or calculated from price series. They are mostly used by practitioners to obtain short-term timing indications, albeit with mixed results. Studying sentiment allows one to read the emotional state of market participants. Sentiment data are mostly unstructured and therefore require specific Natural Language Processing (NLP) tasks. Fundamental data are low frequency and potentially subject to the dangers of look-ahead bias. However, this kind of data reveals the state of the economy. A predictive model aiming to generate a medium- or long-term forecast should take fundamental variables into consideration.</p>
        <p>The vast majority of research consulted for this study used only one or two types of features, as summarised in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Feature types deployed in equity market forecasting studies.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Feature type</th>
                <th>Examples</th>
                <th>Pros</th>
                <th>Cons</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Technical</td>
                <td>RSI, Momentum</td>
                <td>Easy to compute</td>
                <td>Mixed results</td>
              </tr>
              <tr>
                <td>Sentiment</td>
                <td>AAII, Twitter data</td>
                <td>Captures emotional state</td>
                <td>Requires NLP</td>
              </tr>
              <tr>
                <td>Fundamental</td>
                <td>GDP, P/E</td>
                <td>Medium- or long-term</td>
                <td>Look-ahead bias</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-2-3-1">
          <title>3.3. Ensemble Learning</title>
          <p>Ensemble methods prescribe combining multiple algorithms to achieve better performance than inducing only the best member. Ensemble architectures are determined by how a population of diverse models is deployed and by the choice of the aggregation mechanism used to derive the final prediction. The Random Forest (RF) algorithm [28] represents a prototypical example.</p>
          <p>In Krauss et al. [29], a statistical arbitrage strategy is simulated after an ensemble architecture with majority voting (composed of a deep neural network, a gradient boosted tree and a RF) has classified S&amp;P 500 constituents according to the probability of outperforming the index. Yuan et al. [30] deployed NN, SVM and RF to classify Chinese stocks based on belonging to the top 30% of companies in generating excess returns over one month.</p>
          <p>The authors highlight an issue arising when adopting standard k-folds cross validation for time series problems: future data might be used to predict past observations, overstating the performance. They evaluate algorithms using a sliding-window train and test set instead. Evaluating a model on periods prior to the training set might be desirable in certain cases: for example, for stress-testing purposes.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>3.4. Support Vector Machines</title>
          <p>The Support Vector Machine (SVM) [31] is a very effective algorithm able to solve non-linearly separable problems by enlarging the feature space thanks to the “kernel trick”. Using kernel functions allows solving the problem in the original dimension without having to transform the data into a more complex space.</p>
          <p>SVMs were compared in [32] to linear and quadratic discriminant analysis, and to an Elman network, to forecast the weekly direction of the Nikkei 225 index. Zbikowski [33] proposed to consider volume data by plugging this information directly into the SVM formulation, multiplying the hyper-parameter representing the cost of margin violation, C, by a coefficient v based on the transaction volume in a given security over the input window. Patel et al. [34] deployed a 2-stage model with the task of predicting forward closing prices. Their architecture comprises a Support Vector Regressor (SVR) to predict the features’ values; this output is then fed into a different model to produce the estimated index close price.</p>
          <p>SVM incurs high computational costs with a large dataset. The adoption of deep learning has grown in popularity thanks to its capability to overcome SVM scalability problems without sacrificing performance.</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>3.5. Deep Learning</title>
        <p>Figure 4: Markets used for experiments. Global encompasses research using securities from multiple geographic areas. Others refers to securities of different asset classes, e.g., futures on indices and commodities.</p>
        <p>Feed-forward neural networks extend linear models by composing different functions that might carry an element of non-linearity. As a result, this is equivalent to expanding the linear problem y = Wx + b to y = W ϕ(x) + b, where ϕ is a non-linear transformation of the inputs. An example of activation function is the well-known Rectified Linear Unit (ReLU). In this study the terms Deep Learning and Deep Neural Network (DNN) are used for architectures encompassing more than 3 layers.</p>
        <p>Chong et al. [35] ran experiments on stocks listed on the Korean market to make intraday (5-minute) return predictions. The authors considered a network with just three hidden layers to be deep. Over the subsequent few years, significantly more complex structures would be developed. The impact of network depth on the performance of a financial time series classification task was studied in [36].</p>
        <sec id="sec-3-5-1">
          <title>3.5.1. Recurrent Neural Networks</title>
          <p>A Recurrent Neural Network (RNN) processes at any given stage as inputs the information from that time step and a hidden state derived from the previous one. Dixon et al. [37] show that a RNN with linear activation can be assimilated to an autoregressive time series model. The output of a given time step can be written as: h(t) = f(W x(t) + U h(t-1) + b).</p>
          <p>The Long Short-Term Memory (LSTM) cell [38] overcomes the RNN’s limitations (exploding or vanishing gradients). A LSTM neuron adds to the recurrent unit hidden state (short-term memory) a long-term cell state controlled by different gates. LSTM networks can therefore handle long sequences.</p>
          <p>A comparison of LSTM to RF and NNs can be seen in [39]: the task in question was the prediction of the direction of 5 components of the BOVESPA index in a high-frequency setting. Fischer and Krauss [14] used as features only the return series of the past 240 observations to classify 1-day forward returns, showing the ability of LSTM cells to learn long sequences. Alonso et al. [13] tackled predicting the 1-day-ahead return, and the related direction, of 50 constituents of the S&amp;P 500. This study showed that increasing the number of time steps used to train the model from 1 to 10 improves the performance of the LSTM.</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>3.5.2. Convolutional Neural Networks</title>
          <p>Convolutional Neural Networks (CNNs) have been particularly effective in computer vision. Convolution layers exhibit equivariance to translation; in time series data this property allows the algorithm to detect patterns within a sample and recognise them notwithstanding the point in time at which they appear. Convolution filters over 1 dimension (1D) can be deployed to slide across time, deriving hidden patterns within samples. 1-dimensional convolution layers can be deployed as feature extractors before LSTMs, forming the CNN-LSTM architecture proposed in [15].</p>
          <p>The dilation rate d of a convolution layer indicates the spacing between the elements of the input to which the convolution is applied. Researchers from DeepMind conceived an architecture called WaveNet [40], able to achieve state-of-the-art performance in several text-to-speech tasks without the use of RNNs. It is obtained by stacking several 1D convolution layers, doubling the dilation rate at every layer. Thus, the receptive field is larger, albeit without additional parameters. The initial layers learn short-term patterns, while as data progresses towards the output, longer-term sequences are extracted.</p>
          <p>Borovykh et al. [41] adapted the WaveNet architecture, using ReLU as activation function, for multivariate financial time series forecasting: 1D dilated causal convolutions run independently for each input time series, to be combined before making the prediction. Experiments were run on several financial instruments, on 1-day forward return prediction tasks. WaveNet and LSTM performed equally in terms of forecasting the direction; however, WaveNet outputs more accurate point estimates. Borjesson et al. [42] proposed a WaveNet-inspired model, using as activation function the Scaled Exponential Linear Unit (SELU) for the convolution layers. They ran experiments with the goal of predicting the next-day price and trend of the S&amp;P 500.</p>
          <p>Convolutions over two dimensions (2D) could be used to extract the most salient features and the most significant sub-sequence in a sample. This approach, recently emerging in the literature, offers the advantage of jointly learning the patterns within each predictor time series and the relationships between features. This could be particularly advantageous with economic time series, where the correlation between variables changes over time, albeit in a recurrent fashion.</p>
          <p>Gudelek et al. [43] used 2D convolutions, simulating trading strategies on several Exchange Traded Funds (ETFs). Prices were differenced once to mitigate the issues of non-stationarity and transformed to be in a range between +1 and –1. The authors interpret the predictions in this range as confidence values. It would be interesting to conduct further research applying to financial time series tasks architectures centred on 2D convolutions that have proven to be successful in other domains.</p>
          <p>A recent stream of research leverages the success achieved in computer vision directly: time series data is transformed in order to assume the characteristics of pixel values and intensity. Cohen et al. [44] convert a time series classification problem (spotting technical patterns) into an image recognition task; Zeng et al. [45] ported a multivariate time series forecasting problem in US equities to the video prediction domain.</p>
        </sec>
        <sec id="sec-3-5-3">
          <title>3.5.3. Autoencoders</title>
          <p>Autoencoders (AE) are models conceived to replicate their inputs without supervision. The design usually prescribes an input layer, an encoding layer with a smaller number of units and a decoding layer of the same size as the input, providing a reconstructed representation. Autoencoders can be applied to financial time series prediction problems to reduce dimensions or noise. Troiano et al. [46] focused on the impact of the feature reduction stage (starting from 40 technical indicators). Denoising autoencoders were used in [9] in order to create a latent feature representation, adding a noise component that randomly alters the raw data, forcing the algorithm to reconstruct robust inputs. Bao et al. [47] stacked 5 autoencoders, connecting the final output to a LSTM network to predict the one-step-ahead level of several stock market indices.</p>
        </sec>
        <sec id="sec-3-5-4">
          <title>3.5.4. Generative Adversarial Networks</title>
          <p>Generative Adversarial Networks (GANs) [48] are composed of 2 competing models: the generator aims to approximate the data distribution and outputs synthetic samples; the discriminator takes either actual or synthetic data and estimates the probability that the sample in question is genuine. The generator aims to maximise the probability that the discriminator will make a mistake.</p>
          <p>Zhou et al. [49] applied GANs to make one-step-ahead predictions on 42 Chinese stocks within a high-frequency framework. The generator is an LSTM network, while a 1D-CNN plays the role of the discriminator.</p>
          <p>Back testing is a common approach to measure the profitability of a model against past trends in the precise order in which they occurred. GANs could be employed to provide synthetic test sets, as if engineers were forward testing algorithms devised by researchers. A recent architecture fit for this purpose is the Time-series Generative Adversarial Network (TimeGAN) [50]. This framework was conceived to capture the time-dependent conditional distribution of data. In addition to the unsupervised adversarial loss, the model prescribes the use of a supervised loss based on the original data.</p>
        </sec>
        <sec id="sec-3-5-5">
          <title>3.5.5. Attention</title>
          <p>Attention mechanisms, pioneered by Bahdanau et al. [51], allow a decoder to focus at each step of a sequence on the most relevant (encoded) input. The decoder computes a weighted sum of the output of the encoder; the weights are learned by an attention layer, using as inputs the encoder output concatenated with the decoder’s previous hidden state. These techniques, developed for neural machine translation problems, have been applied to time series forecasting: an attention mechanism weights differently each time step of a sequence, which is then fed to a forecasting model to derive a prediction. Zhang et al. [52] proposed the AT-LSTM architecture: an attention mechanism, using an LSTM as encoder, assigns different weights to input features (time steps). Attention-weighted sequences are then used as inputs into an LSTM network to output the prediction. A more complex version of this architecture can be seen in [53]: here the input series is first encoded with an LSTM and an attention layer assigns weights to the features at each time step, obtaining an attention-weighted feature matrix. In the next stage another LSTM-based attention mechanism weights the different hidden states across time steps. The final stage prescribes an LSTM block to make a prediction using as inputs the output of the previous stage and the target time series. This Dual-Stage Attention RNN (DA-RNN) aims to capture the most important features while learning time dependencies.</p>
        </sec>
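<p>The WaveNet-style stacking described in the convolutional networks section above, 1D causal convolutions with the dilation rate doubling at every layer, can be sketched in plain Python. With a kernel of size 2 and dilations 1, 2, 4 and 8, the receptive field grows to 16 time steps; the weights below are placeholders rather than a trained model:</p>

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], using only past (causal) inputs."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = t - k * dilation
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out

def wavenet_stack(x, n_layers, kernel=(0.5, 0.5)):
    """Stack causal convolutions, doubling the dilation rate at every layer."""
    for layer in range(n_layers):
        x = causal_dilated_conv(x, kernel, dilation=2 ** layer)
    return x

# Receptive field after n layers with kernel size 2: 2**n time steps.
```

<p>An impulse fed into four such layers spreads over exactly 16 output steps, illustrating the exponential receptive-field growth the text describes while each layer keeps only two weights.</p>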
        <sec id="sec-2-4-1">
          <title>3.6. Underexplored Areas</title>
          <p>So far we have discussed researchers tackling the problems of the non-stationarity of financial data, different market regimes and a low signal-to-noise ratio by deploying more and more complex algorithms. Further study should be conducted focusing on feature engineering: rescaling and labelling examples in consideration of the related market environment, therefore avoiding a fixed arbitrary threshold to define classes, would put the data in relation to the context in which they were generated.</p>
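<p>The regime-aware labelling advocated above can be sketched by letting the class threshold scale with recent volatility rather than fixing it; the multiplier and the sample windows below are illustrative choices of ours:</p>

```python
import statistics

def regime_label(ret, recent_returns, k=1.0):
    """Label +1/-1 only when the move is large relative to recent volatility."""
    vol = statistics.pstdev(recent_returns)
    threshold = k * vol
    if ret > threshold:
        return 1
    if ret < -threshold:
        return -1
    return 0  # flat, given the current regime

calm = [0.1, -0.2, 0.15, -0.1]   # low-volatility environment
wild = [2.0, -3.0, 2.5, -1.5]    # high-volatility environment
regime_label(0.5, calm)   # 1: a large move for a quiet market
regime_label(0.5, wild)   # 0: noise in a volatile market
```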
          <p>Framing the prediction problem in terms of classification is vulnerable to missing large movements. On the other hand, developing a model to predict future securities prices has the pitfall that a fairly accurate point estimate, albeit in the wrong direction, could result in a loss-making course of action.</p>
          <p>Research could be pursued in developing alternative
approaches in terms of training, for example,
conceiving models learning jointly regression and classification
tasks.</p>
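<p>A minimal sketch of such a joint objective, assuming a model that outputs both a point forecast and a direction probability (the weighting alpha and the specific loss terms are hypothetical choices):</p>

```python
import numpy as np

def joint_loss(pred_return, true_return, p_up, alpha=0.5, eps=1e-12):
    """Combined objective for a model trained jointly on regression
    (predicting the return) and classification (predicting direction).
    alpha balances the two terms; all choices here are illustrative.

    pred_return, true_return: arrays of predicted / realised returns
    p_up: predicted probability that the return is positive
    """
    pred_return = np.asarray(pred_return, float)
    true_return = np.asarray(true_return, float)
    p_up = np.asarray(p_up, float)
    mse = np.mean((pred_return - true_return) ** 2)      # regression term
    y = (true_return > 0).astype(float)                  # direction labels
    bce = -np.mean(y * np.log(p_up + eps)
                   + (1 - y) * np.log(1 - p_up + eps))   # classification term
    return alpha * mse + (1 - alpha) * bce
```

<p>Minimising both terms at once penalises forecasts that are numerically close but directionally wrong, addressing the pitfall noted above.</p>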
          <p>In contrast with many of the studies reviewed that
use as input the target series and technical indicators,
further research could be conducted combining features
from diferent domains. A diverse data pool could point
towards unexplored avenues in economics research.</p>
          <p>The literature reviewed in this study adopts input time
series exclusively in tabular form, however, financial
professional place considerable value in extracting
knowledge from the relationship and interaction amongst
different variables. Graph Neural Networks (GNNs) [54] are
able to extract knowledge from the interplay between
diferent nodes in a graph, therefore could represent a
novel approach, worthy of further research in our field.
An exploratory study related to Japanese equities can be
seen in [55]. The authors motivated the use of GNNs to
leverage inter-market and inter-company relationships.</p>
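<p>As a sketch of the underlying mechanism only, one graph-convolution step can be written as neighbourhood averaging followed by a learned transformation; the adjacency matrix (encoding, say, assumed inter-company relationships) is taken as given, and the single layer below is far from a full GNN:</p>

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution step: each node (e.g. a company) updates its
    representation by averaging over its neighbours (e.g. related firms),
    then applying a linear map (learned in practice) and a non-linearity."""
    a = adj + np.eye(adj.shape[0])   # add self-loops so a node keeps its own signal
    deg = a.sum(axis=1, keepdims=True)
    a_norm = a / deg                 # row-normalised neighbourhood averaging
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU activation

# Two mutually connected nodes with 3 features each, mapped to 2 outputs.
out = gcn_layer(np.array([[0., 1.], [1., 0.]]),
                np.array([[1., 0., 2.], [3., 1., 0.]]),
                np.ones((3, 2)))
```

<p>Stacking several such layers lets information propagate along multi-hop relationships, which is the property that motivates their use for inter-company structure.</p>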
          <p>The main paradigm adopted by both industry and
academia when devising a machine learning solution has
been to consider a dataset as fixed, focusing on algorithm
development. This survey shows that there is potential
for progress, keeping the algorithm fixed, placing the
data at the centre of the research process instead.</p>
          <p>Ensemble learning approaches combining neural
networks have been the winning architecture in image
recognition competitions. Similar ideas could be explored,
applying diversified network ensembles to financial data.
Diferent models could be trained on diferent market
regimes. One way to deal with a changing environment,
is to constantly discard old data and retrain the model
with more recent examples. Nevertheless, often, market
dynamics observed in the past occur in a similar way
at a later stage. Conceiving a way to use past, however,
relevant information is an interesting avenue to pursue
further study.</p>
          <p>While the value of machine learning methods to equity
time series forecasting has been shown in the short term,
it would be interesting to test further these techniques
sampling data with lower frequency. Extending findings
to monthly data would graduate machine learning
methods to applications beyond the realm of trading.</p>
          <p>This survey shows how the playground for machine
learning experiment in equities is the S&amp;P 500. Few
studies try to corroborate findings extending the research
to diferent countries. European Equities in particular,
emerge as an area which remains rather under-explored.</p>
          <p>Moreover, given that diferent algorithms are
simulated on diferent data, it is dificult to assess what could
be considered the state of the art. It would be very fruitful
for the field if, perhaps as a cooperative project between
industry and academia, diferent financial multivariate
datasets ( e.g., one for US equities and one for European
shares) would be engineered as standard, in order to
provide a more objective common ground to conduct
research.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Concluding Remarks</title>
      <p>confidence, and trading activity, The Journal of
Finance 64 (2009) 549–578. URL: http://www.jstor.</p>
      <p>Researchers tackled the problem of equity market fore- org/stable/20487979.
casting, initially deploying statistical time-series forecast- [7] P. L. Bernstein, Why the eficient market ofers
ing techniques and then experimenting with complex hope to active management, Journal of Applied
deep learning architectures. Corporate Finance 12 (1999) 129–136.</p>
      <p>In particular, LSTM has been proven to be useful in [8] A. W. Lo, Adaptive Markets: Financial Evolution at
solving the vanishing/exploding gradient problem while the Speed of Thought, Princeton University Press,
ofering the advantage of modelling non-linear time se- 2017. URL: http://www.jstor.org/stable/j.ctvc77k3n.
ries data. Dilated convolutions, on their own, or as fea- [9] X.-Y. Qian, S. Gao, Financial series prediction:
Comture extractors, constitute an efective technique when parison between precision of time series models
dealing with long sequences. C2D centred architec- and machine learning methods, 2017.
tures are an emerging method capable of extracting local [10] S. Gu, B. T. Kelly, D. Xiu, Empirical asset
pricknowledge about the interaction of diferent features. ing via machine learning, 2018. URL: https://doi.</p>
      <p>With this review, we advocate the pursuit of research org/10.2139%2Fssrn.3281018. doi:10.2139/ssrn.
on every component of the machine learning value chain, 3281018.
rather than focusing exclusively on the algorithmic core. [11] M. Ballings, D. V. den Poel, N. Hespeels, R. Gryp,
Most studies show results only on a specific market or Evaluating multiple classifiers for stock price
direcrelated to a specific period, thus, the general robustness tion prediction, Expert Syst. Appl. 42 (2015) 7046–
of findings could be improved. Applying machine learn- 7056. URL: https://doi.org/10.1016/j.eswa.2015.05.
ing to financial time series is a challenging, however, 013. doi:10.1016/j.eswa.2015.05.013.
rewarding endeavour. Given the importance of the de- [12] M. Dixon, D. Klabjan, J. H. Bang,
Classificationcisions based on equity market forecasts, even a small based ifnancial markets prediction using
improvement in model performance can have a major deep neural networks, CoRR abs/1603.08604
impact. (2016). URL: http://arxiv.org/abs/1603.08604.
arXiv:1603.08604.</p>
      <p>References [13] T. Guida, Big Data and Machine Learning in
Quantitative Investment, Wiley, 2018.
[14] T. Fischer, C. Krauss, Deep learning with long
short-term memory networks for financial
market predictions, Eur. J. Oper. Res. 270 (2018) 654–
669. URL: https://doi.org/10.1016/j.ejor.2017.11.054.</p>
      <p>doi:10.1016/j.ejor.2017.11.054.
[15] J. Eapen, D. Bein, A. Verma, Novel deep learning
model with CNN and bi-directional LSTM for
improved stock market index prediction, in: IEEE
9th Annual Computing and Communication
Workshop and Conference, CCWC 2019, Las Vegas, NV,
USA, January 7-9, 2019, IEEE, 2019, pp. 264–270.</p>
      <p>URL: https://doi.org/10.1109/CCWC.2019.8666592.
doi:10.1109/CCWC.2019.8666592.</p>
      <p>1462204. 031. doi:10.1016/j.ejor.2016.10.031.
[19] S. Kazemian, S. Zhao, G. Penn, Evaluating senti- [30] X. Yuan, J. Yuan, T. Jiang, Q. U. Ain, Integrated
ment analysis in the context of securities trading, in: long-term stock selection models based on feature
Proceedings of the 54th Annual Meeting of the As- selection and machine learning algorithms for china
sociation for Computational Linguistics, ACL 2016, stock market, IEEE Access 8 (2020) 22672–22685.
August 7-12, 2016, Berlin, Germany, Volume 1: Long URL: https://doi.org/10.1109/ACCESS.2020.2969293.
Papers, The Association for Computer Linguistics, doi:10.1109/ACCESS.2020.2969293.
2016. URL: https://doi.org/10.18653/v1/p16-1197. [31] C. Cortes, V. Vapnik, Support-vector
netdoi:10.18653/v1/p16-1197. works, Mach. Learn. 20 (1995) 273–297. URL:
[20] J. Bollen, H. Mao, X. Zeng, Twitter mood pre- https://doi.org/10.1007/BF00994018. doi:10.1007/
dicts the stock market, J. Comput. Sci. 2 (2011) BF00994018.
1–8. URL: https://doi.org/10.1016/j.jocs.2010.12.007. [32] W. Huang, Y. Nakamori, S.-Y. Wang, Forecasting
doi:10.1016/j.jocs.2010.12.007. stock market movement direction with support
vec[21] D. Othan, Z. H. Kilimci, M. Uysal, Financial sen- tor machine, Computers &amp; operations research 32
timent analysis for predicting direction of stocks (2005) 2513–2522.
using bidirectional encoder representations from [33] K. Zbikowski, Using volume weighted support
vectransformers (bert) and deep learning models, in: tor machines with walk forward testing and feature
Proc. Int. Conf. Innov. Intell. Technol., volume 2019, selection for the purpose of creating stock
trad2019, pp. 30–35. ing strategy, Expert Syst. Appl. 42 (2015) 1797–
[22] P. Cho, J. H. Park, J. W. Song, Equity research report- 1805. URL: https://doi.org/10.1016/j.eswa.2014.10.
driven investment strategy in korea using binary 001. doi:10.1016/j.eswa.2014.10.001.
classification on stock price direction, IEEE Access [34] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting
9 (2021) 46364–46373. stock market index using fusion of machine
learn[23] D. Olson, C. Mossman, Neural network forecasts ing techniques, Expert Syst. Appl. 42 (2015) 2162–
of canadian stock returns using accounting ratios, 2172. URL: https://doi.org/10.1016/j.eswa.2014.10.
International Journal of Forecasting 19 (2003) 453– 031. doi:10.1016/j.eswa.2014.10.031.
465. [35] E. Chong, C. Han, F. C. Park, Deep learning
net[24] C. Tsai, Y. Lin, D. C. Yen, Y. Chen, Predicting works for stock market analysis and prediction:
stock returns by classifier ensembles, Appl. Soft Methodology, data representations, and case
studComput. 11 (2011) 2452–2459. URL: https://doi.org/ ies, Expert Systems with Applications 83 (2017)
10.1016/j.asoc.2010.10.001. doi:10.1016/j.asoc. 187–205.</p>
      <p>2010.10.001. [36] X. Zhong, D. Enke, Predicting the daily return
[25] J. Alberg, Z. C. Lipton, Improving factor-based direction of the stock market using hybrid machine
quantitative investing by forecasting company fun- learning algorithms, Financial Innovation 5 (2019)
damentals, CoRR abs/1711.04837 (2017). URL: http: 1–20.</p>
      <p>//arxiv.org/abs/1711.04837. arXiv:1711.04837. [37] M. F. Dixon, I. Halperin, P. Bilokon, Machine
learn[26] L. Chauhan, J. Alberg, Z. C. Lipton, Uncertainty- ing in Finance, volume 1406, Springer, 2020.
aware lookahead factor models for quantitative in- [38] S. Hochreiter, J. Schmidhuber, Long short-term
vesting, in: Proceedings of the 37th International memory, Neural Comput. 9 (1997) 1735–1780. URL:
Conference on Machine Learning, ICML 2020, 13- https://doi.org/10.1162/neco.1997.9.8.1735. doi:10.
18 July 2020, Virtual Event, volume 119 of Proceed- 1162/neco.1997.9.8.1735.
ings of Machine Learning Research, PMLR, 2020, pp. [39] D. M. Nelson, A. C. Pereira, R. A. De Oliveira, Stock
1489–1499. URL: http://proceedings.mlr.press/v119/ market’s price movement prediction with lstm
neuchauhan20a.html. ral networks, in: 2017 International joint
confer[27] E. F. Fama, K. R. French, A five-factor asset pricing ence on neural networks (IJCNN), Ieee, 2017, pp.
model, Journal of financial economics 116 (2015) 1419–1426.</p>
      <p>1–22. [40] A. van den Oord, S. Dieleman, H. Zen, K.
Si[28] L. Breiman, Random forests, Mach. monyan, O. Vinyals, A. Graves, N.
KalchbrenLearn. 45 (2001) 5–32. URL: https://doi.org/ ner, A. W. Senior, K. Kavukcuoglu, Wavenet: A
10.1023/A:1010933404324. doi:10.1023/A: generative model for raw audio, in: The 9th
1010933404324. ISCA Speech Synthesis Workshop, Sunnyvale, CA,
[29] C. Krauss, X. A. Do, N. Huck, Deep neural networks, USA, 13-15 September 2016, ISCA, 2016, p. 125.
gradient-boosted trees, random forests: Statistical URL: http://www.isca-speech.org/archive/SSW_
arbitrage on the s&amp;p 500, Eur. J. Oper. Res. 259 (2017) 2016/abstracts/ssw9_DS-4_van_den_Oord.html.
689–702. URL: https://doi.org/10.1016/j.ejor.2016.10. [41] A. Borovykh, S. M. Bohté, C. W. Oosterlee,
Conditional time series forecasting with convolutional Information Processing Systems 32: Annual
neural networks, 2017. Conference on Neural Information Processing
[42] L. Börjesson, M. Singull, Forecasting financial time Systems 2019, NeurIPS 2019, December 8-14, 2019,
series through causal and dilated convolutional Vancouver, BC, Canada, 2019, pp. 5509–5519. URL:
neural networks, Entropy 22 (2020) 1094. URL: https://proceedings.neurips.cc/paper/2019/hash/
https://doi.org/10.3390/e22101094. doi:10.3390/ c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.
e22101094. html.
[43] M. U. Gudelek, S. A. Boluk, A. M. Ozbayoglu, A deep [51] D. Bahdanau, K. Cho, Y. Bengio, Neural machine
learning based stock trading model with 2-d CNN translation by jointly learning to align and
transtrend detection, in: 2017 IEEE Symposium Series on late, in: Y. Bengio, Y. LeCun (Eds.), 3rd
InternaComputational Intelligence, SSCI 2017, Honolulu, tional Conference on Learning Representations,
HI, USA, November 27 - Dec. 1, 2017, IEEE, 2017, ICLR 2015, San Diego, CA, USA, May 7-9, 2015,
pp. 1–8. URL: https://doi.org/10.1109/SSCI.2017. Conference Track Proceedings, 2015. URL: http:
8285188. doi:10.1109/SSCI.2017.8285188. //arxiv.org/abs/1409.0473.
[44] N. Cohen, T. Balch, M. Veloso, Trading via image [52] X. Zhang, X. Liang, A. Zhiyuli, S. Zhang, R. Xu,
classification, in: T. Balch (Ed.), ICAIF ’20: The B. Wu, At-lstm: An attention-based lstm model for
First ACM International Conference on AI in Fi- ifnancial time series prediction, in: IOP Conference
nance, New York, NY, USA, October 15-16, 2020, Series: Materials Science and Engineering, volume
ACM, 2020, pp. 53:1–53:6. URL: https://doi.org/ 569, IOP Publishing, 2019, p. 052037.
10.1145/3383455.3422544. doi:10.1145/3383455. [53] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang,
3422544. G. W. Cottrell, A dual-stage attention-based
re[45] Z. Zeng, T. Balch, M. Veloso, Deep video pre- current neural network for time series prediction,
diction for time series forecasting, in: A. Cali- in: C. Sierra (Ed.), Proceedings of the
Twentynescu, L. Szpruch (Eds.), ICAIF’21: 2nd ACM In- Sixth International Joint Conference on Artificial
ternational Conference on AI in Finance, Virtual Intelligence, IJCAI 2017, Melbourne, Australia,
AuEvent, November 3 - 5, 2021, ACM, 2021, pp. 39:1– gust 19-25, 2017, ijcai.org, 2017, pp. 2627–2633.
39:7. URL: https://doi.org/10.1145/3490354.3494404. URL: https://doi.org/10.24963/ijcai.2017/366. doi:10.
doi:10.1145/3490354.3494404. 24963/ijcai.2017/366.
[46] L. Troiano, E. Mejuto, P. Kriplani, On feature re- [54] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu,
duction using deep learning for trend prediction L. Wang, C. Li, M. Sun, Graph neural networks:
in finance, CoRR abs/1704.03205 (2017). URL: http: A review of methods and applications, AI Open 1
//arxiv.org/abs/1704.03205. arXiv:1704.03205. (2020) 57–81.
[47] W. Bao, J. Yue, Y. Rao, A deep learning framework [55] D. Matsunaga, T. Suzumura, T. Takahashi,
Explorfor financial time series using stacked autoencoders ing graph neural networks for stock market
predicand long-short term memory, PloS one 12 (2017) tions with rolling window analysis, arXiv preprint
e0180944. arXiv:1909.10660 (2019).
[48] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza,</p>
      <p>B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville,
Y. Bengio, Generative adversarial nets, in:
Z. Ghahramani, M. Welling, C. Cortes, N. D.</p>
      <p>Lawrence, K. Q. Weinberger (Eds.), Advances in
Neural Information Processing Systems 27: Annual
Conference on Neural Information Processing
Systems 2014, December 8-13 2014, Montreal,
Quebec, Canada, 2014, pp. 2672–2680. URL:
https://proceedings.neurips.cc/paper/2014/hash/
5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.
[49] X. Zhou, Z. Pan, G. Hu, S. Tang, C. Zhao, Stock
market prediction on high-frequency data using
generative adversarial nets., Mathematical
Problems in Engineering (2018).
[50] J. Yoon, D. Jarrett, M. van der Schaar, Time-series
generative adversarial networks, in: H. M. Wallach,
H. Larochelle, A. Beygelzimer, F. d’Alché-Buc,
E. B. Fox, R. Garnett (Eds.), Advances in Neural</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Fama</surname>
          </string-name>
          ,
<article-title>Efficient capital markets: A review of theory and empirical work</article-title>
          ,
          <source>The Journal of Finance</source>
          <volume>25</volume>
          (
          <year>1970</year>
          )
          <fpage>383</fpage>
          -
          <lpage>417</lpage>
. URL: http://www.jstor.org/stable/2325486.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jegadeesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Titman</surname>
          </string-name>
          ,
<article-title>Returns to buying winners and selling losers: Implications for stock market efficiency</article-title>
          ,
          <source>The Journal of Finance</source>
          <volume>48</volume>
          (
          <year>1993</year>
          )
          <fpage>65</fpage>
          -
          <lpage>91</lpage>
. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1993.tb04702.x. doi:https://doi.org/10.1111/j.1540-6261.1993.tb04702.x.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>B. G. Malkiel,</surname>
          </string-name>
          <article-title>The eficient market hypothesis and its critics</article-title>
          ,
          <source>The Journal of Economic Perspectives</source>
          <volume>17</volume>
          (
          <year>2003</year>
          )
          <fpage>59</fpage>
          -
          <lpage>82</lpage>
          . URL: http://www.jstor.org/stable/ [16]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>Technical Analysis of the Futures Mar3216840. kets: A Comprehensive Guide to Trading Methods</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Grossman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Stiglitz</surname>
          </string-name>
          ,
          <article-title>On the impossibil-</article-title>
          and
          <string-name>
            <surname>Applications</surname>
          </string-name>
          , Prentice Hall,
          <year>1986</year>
          .
          <article-title>ity of informationally eficient markets</article-title>
          ,
          <source>American</source>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          , K. Kotecha,
          <source>PredictEconomic Review</source>
          <volume>72</volume>
          (
          <year>1982</year>
          )
          <fpage>393</fpage>
          -
          <lpage>408</lpage>
          . doi:https:
          <article-title>ing stock and stock price index movement using //doi</article-title>
          .org/10.7916/D8765R99.
          <article-title>trend deterministic data preparation and machine</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cosemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eichholtz</surname>
          </string-name>
          , Op- learning techniques,
          <source>Expert Syst. Appl</source>
          .
          <volume>42</volume>
          (
          <year>2015</year>
          )
          <article-title>tion trading and individual investor performance</article-title>
          ,
          <fpage>259</fpage>
          -
          <lpage>268</lpage>
          . URL: https://doi.org/10.1016/j.eswa.
          <source>2014. Journal of Banking &amp; Finance</source>
          <volume>33</volume>
          (
          <year>2009</year>
          )
          <fpage>731</fpage>
          -
          <lpage>07</lpage>
          .040. doi:
          <volume>10</volume>
          .1016/j.eswa.
          <year>2014</year>
          .
          <volume>07</volume>
          .040. 746. URL: https://www.sciencedirect.com/science/ [18]
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Schumaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Textual analysis of article/pii/S0378426608002720</article-title>
          . doi:https://doi. stock
          <source>market prediction using breaking financial org/10</source>
          .1016/j.jbankfin.
          <year>2008</year>
          .
          <volume>11</volume>
          .005.
          <article-title>news: The azfin text system</article-title>
          ,
          <source>ACM Trans. Inf.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grinblatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Keloharju</surname>
          </string-name>
          , Sensation seeking, over- Syst.
          <volume>27</volume>
          (
          <year>2009</year>
          )
          <volume>12</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          :
          <fpage>19</fpage>
          . URL: https://doi.org/ 10.1145/1462198.1462204. doi:
          <volume>10</volume>
          .1145/1462198.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>