<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Forecasting the U.S. Stock Market via Levenberg-Marquardt and Haken Artificial Neural Networks Using ICA and PCA Pre-Processing Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Golovachev Sergey</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University, Higher School of Economics, Moscow Department of World Economics and International Affairs</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Artificial neural networks (ANNs) are a modern approach to solving a wide range of tasks. In this paper we forecast U.S. stock market movements using two types of artificial neural networks: a network based on the Levenberg-Marquardt learning mechanism and the synergetic network described by the German scientist Hermann Haken. The Levenberg-Marquardt ANN is widely used for forecasting financial markets, while the Haken ANN is mostly known from image-recognition tasks. In this paper we apply the Haken ANN to the prediction of stock market movements. Furthermore, we introduce a novel pre-processing step for the input data, based on Independent Component Analysis (ICA) and Principal Component Analysis (PCA), in order to enhance the predictive power of the above-mentioned networks. We also suggest using ANNs to reveal the “mean reversion” phenomenon in stock returns. The forecasting results are compared with the forecasts of a simple auto-regression model and with the dynamics of the market index.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Artificial neural networks are a modern approach to various problem-solving
tasks. For example, they are used for image recognition and in different
areas of biophysical research. One of the possible applications of ANNs is the forecasting and
simulation of financial markets. The idea is the following: a researcher tries to
construct an ANN which can successfully imitate the decision-making
process of the “average” stock market participant. This hypothesis results from
the fact that ANNs, in turn, try to imitate the design of biological neural
networks, in particular the ones which exist in the human brain.</p>
      <p>A market participant is an investor whose individual actions have no influence
on price fluctuations, for example a trader operating with insignificant sums
of money. Moreover, we argue that the market participant makes his decisions
solely on the basis of the previous dynamics of the stock – thus we assume an
endogenous price-making mechanism. Furthermore, we assume the homogeneity of
the investors, so that they all have the same decision-making algorithms (which is
why we call them “average”).</p>
      <p>While designing the Levenberg-Marquardt ANN it is essential to set some
of the key parameters of the network. Firstly, we must set the architecture of
the network (the number of layers and the number of neurons in each, including the
numbers of input and output neurons). In our research we use a simple three-layer ANN with
2 input neurons, 2 neurons in the hidden layer and 1 output neuron. The results
show that this architecture is quite effective while not leading to lengthy
computational procedures. Secondly, we determine the activation function in the
hidden layer, which performs a non-linear transformation of the input data. We
use a standard logistic function with the range of values [0;1].</p>
      <p>The key feature of the Levenberg-Marquardt ANN is the use of back-propagation
of the errors of the previous iterations as a learning mechanism. The idea of
back-propagation rests on communicating the error of the network (of
the output neuron, in particular) to all other neurons of the network. As a
result, after a number of iterations the network optimizes the weights with which
neurons in different layers are connected, and the minimum of the error is reached.
Propagation of the error through the network also requires the use of the Jacobian
matrix, which contains the first derivatives with respect to the weights of the hidden
and input layers.</p>
      <p>
        The computational mechanism is as follows (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ):
w_new = w_old − (Z^T Z + λI)^(−1) Z^T ε(w_old),
(1)
where
w_old – weight vector of the previous iteration;
w_new – weight vector of the current iteration;
Z – Jacobian matrix with dimensionality m × n; m – the number of learning
examples for each iteration; n – the total number of weights in the network;
λ – learning ratio;
I – identity matrix with dimensionality n × n;
ε – vector of m elements which contains the forecast error for each learning example.
      </p>
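      <p>As an illustrative sketch (not the authors' original code), one Levenberg-Marquardt step of this form can be written in NumPy; the Jacobian Z, error vector eps and ratio lam below are hypothetical placeholder values:</p>
      <preformat>
```python
import numpy as np

def lm_update(w_old, Z, eps, lam):
    # w_new = w_old - (Z^T Z + lam*I)^(-1) Z^T eps
    n = Z.shape[1]
    step = np.linalg.solve(Z.T @ Z + lam * np.eye(n), Z.T @ eps)
    return w_old - step

# toy example: m = 5 learning examples, n = 3 weights
rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 3))
eps = rng.standard_normal(5)
w_new = lm_update(np.zeros(3), Z, eps, lam=0.1)
```
      </preformat>
      <p>As lam grows the step shrinks toward zero, which stabilizes learning at the cost of speed.</p>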
      <p>To enhance the predictive power of our model we introduce the pre-processing
technique of Independent Component Analysis (ICA). This is a method of
identifying the key signals in large, noisy data. ICA is often
compared with another useful processing tool – Principal Component Analysis
(PCA). However, the general difference of ICA from PCA is that ICA yields
purely independent vectors onto which a process can be decomposed, whereas
PCA requires only that such vectors be uncorrelated. Moreover, ICA allows
non-Gaussian distributions, which is a useful and realistic assumption,
especially for financial data.</p>
      <p>
        ICA stems from the so-called “cocktail party” problem in acoustics. The
problem is the following: assume that we have i people (sources s) talking in
a room and j microphones (signals x) which record their voices. For two
people and two microphones the signals from the microphones are as follows (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ):
x_1 = a_11 s_1 + a_12 s_2,
x_2 = a_21 s_1 + a_22 s_2,
(2)
that is, x = A s, where A is the mixing matrix which transforms the voices into the
recordings.
      </p>
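      <p>A minimal numerical sketch of this mixing model, with a hypothetical 2×2 mixing matrix A and synthetic non-Gaussian sources (illustrative values, not data from the paper):</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-1.0, 1.0, size=(2, 1000))  # two independent "voices"
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                  # mixing matrix
x = A @ s                                   # two microphone "recordings"

# ICA estimates a de-mixing matrix W approximating A^(-1);
# here we use the exact inverse purely for illustration.
W = np.linalg.inv(A)
s_hat = W @ x                               # recovered sources
```
      </preformat>
      <p>In practice A is unknown and W must be estimated from x alone, e.g. by an information-maximisation or FastICA algorithm.</p>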
      <p>When we apply ICA to the stock market we assume that the empirical stock
returns are the “recordings”, the noisy signals of the original “voices” which
determine the real process of price movements. Consequently, when we obtain
a de-mixing matrix A^(−1) we get a powerful tool for extracting the most
important information about the price movements. Furthermore, ICA allows us
to reduce the dimensionality of the empirical data without losing significant
information. This is very important when using ANNs because, on the one hand,
we should present the network with as much relevant information as possible, but,
on the other hand, too much input information leads to lengthy computational
procedures and problems with convergence to a nontrivial solution.</p>
      <p>As mentioned above, we use two types of inputs in the
Levenberg-Marquardt ANN. The first input is the logarithmic return of the stock for the day
which precedes the day of the forecast. The second input is derived from
processing the ten previous logarithmic returns with the ICA algorithm: we get the de-mixing
matrix A^(−1) and the subsequent vector of independent components s. Then we
transform this vector into a scalar value by taking the most influential
independent component.</p>
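      <p>A hedged sketch of how the two inputs could be assembled; the price series is synthetic, and the eigen-projection below is only a stand-in for the ICA de-mixing step (a real run would obtain A^(−1) from an ICA algorithm):</p>
      <preformat>
```python
import numpy as np

prices = np.array([100.0, 101.0, 99.5, 100.2, 101.1, 100.8,
                   102.0, 101.5, 102.3, 103.0, 102.7])
log_ret = np.diff(np.log(prices))       # ten previous logarithmic returns

input_1 = log_ret[-1]                   # return of the day preceding the forecast

# Stand-in for the ICA step: project the return window onto its
# dominant direction obtained from an eigen-decomposition.
window = np.lib.stride_tricks.sliding_window_view(log_ret, 5)
cov = np.cov(window, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
dominant = eigvecs[:, -1]               # most influential component
input_2 = float(window[-1] @ dominant)  # scalar second input
```
      </preformat>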
      <p>
        In the section “Results” we show that such pre-processing turns out to be
very useful for stock market forecasting. Moreover, it is worth mentioning that
ICA can be used as a self-sufficient forecasting tool for various financial markets (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ;
        <xref ref-type="bibr" rid="ref3">3</xref>
        ;
        <xref ref-type="bibr" rid="ref4">4</xref>
        ).
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Haken network</title>
      <p>The second ANN which is used for forecasting the U.S. stock market is quite different
from the Levenberg-Marquardt network. It is the network of Hermann Haken, a
German scientist and the founder of synergetics.</p>
      <p>This ANN is self-learning and uses a “library” of pre-set values which by
default represent all possible states of the process. Over a number
of iterations the network converges to one of these values. The Haken ANN is
widely used for image recognition. For example, for the task of recognizing
letters of the alphabet we use the whole alphabet as the pre-set “library”. This is
natural, because any letter which is presented to the network is essentially a
part of the alphabet.</p>
      <p>However, we aim to apply the Haken ANN to stock market forecasting,
and the situation here is much more complicated. We must choose a “library”
which contains all possible states of the market. To solve this task we resort
to two important assumptions. Firstly, we argue that all the information
which is needed for the forecast is contained in the returns of the stock during the
ten trading days before the day for which the forecast is calculated. Secondly,
we assume that using the ICA and PCA processing techniques, which
eventually reduce the dimensionality of the information, allows us to extract the most
important and valuable signals.</p>
      <p>Thus, to obtain the “library” of pre-set values we use the eigenvectors of the
covariance matrix of the subsequent empirical vectors of stock returns (PCA) or
the de-mixing matrix of the empirical vectors obtained from ICA.</p>
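      <p>A minimal sketch of building such a “library” via PCA, using synthetic return vectors (the dimensions and data are placeholders):</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(4)
R = 0.01 * rng.standard_normal((200, 10))  # 200 ten-day return vectors (synthetic)
cov = np.cov(R, rowvar=False)              # 10 x 10 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
V = eigvecs[:, ::-1][:, :3]                # top-3 eigenvectors as library patterns
```
      </preformat>
      <p>The columns of V are orthonormal, which is convenient for the projections used by the network below.</p>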
      <p>
        The network functions as follows (
        <xref ref-type="bibr" rid="ref5">5</xref>
        ):
q′ = q + Σ_{k=1}^{M} λ_k (v_k^T q) v_k + B Σ_{k=1}^{M} (v_k^T q)^2 (v_k^T q) v_k + C (q^T q) q,
(5)
where
q – vector which the network tries to optimize. Initially this vector
is deliberately made noisy to ignite the process of learning; we thus assume that
the real data on the stock market is noisy in a similar way;
q′ – vector which the network finally reconstructs;
v_k – the M columns of the matrix V, which plays the role of the “library” and contains
pre-set values obtained from PCA or ICA;
λ_k – learning ratios;
B, C – computational parameters whose calibration influences the
convergence of the network and the speed of learning.
      </p>
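      <p>An illustrative iteration of these dynamics; the library V, the starting vector q and the parameter values (with B and C negative so the cubic terms act as saturation) are all hypothetical:</p>
      <preformat>
```python
import numpy as np

def haken_step(q, V, lam, B, C):
    # one update of q; columns of V are the library patterns v_k
    proj = V.T @ q                       # v_k^T q for every pattern k
    linear = V @ (lam * proj)            # sum_k lam_k (v_k^T q) v_k
    cubic = V @ (B * proj ** 3)          # B sum_k (v_k^T q)^2 (v_k^T q) v_k
    decay = C * (q @ q) * q              # C (q^T q) q
    return q + linear + cubic + decay

# toy run: two orthonormal 4-dim library patterns, noisy start
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
q = np.array([0.9, 0.1, 0.05, -0.05])    # deliberately noisy start
for _ in range(50):
    q = haken_step(q, V, lam=0.1, B=-0.05, C=-0.05)
```
      </preformat>
      <p>Components of q outside the span of the library are damped by the C term, while components along the library patterns grow until the cubic terms saturate them.</p>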
      <p>The final forecasting signal is obtained by subtracting the empirical vector
from the reconstructed one.</p>
    </sec>
    <sec id="sec-3">
      <title>Trading rules</title>
      <p>Now we present the trading rules which were used while working with the
Levenberg-Marquardt and the Haken ANNs.</p>
      <p>Firstly, we should specify the data which we forecast. We predict price
movements of 30 liquid stocks of the U.S. S&amp;P 500 index in the period from
November 7th, 2008 to May 2nd, 2010.</p>
      <p>For each trading day t we make a forecast via our ANNs for each stock. When
the forecasts are made we rank them according to their value. The
final selection of the stocks in the virtual portfolio is based on two opposing
trading rules. According to Rule A we select from 1 to 5 stocks with the
highest forecast values (note that at this step we do not know the real return
of day t, which makes our forecast truly out-of-sample). According to Rule
B we select from 1 to 5 stocks with the lowest forecast values.</p>
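      <p>The two rules amount to a simple ranking; the forecast values below are random placeholders for the 30 per-stock forecasts of day t:</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(2)
forecasts = rng.standard_normal(30)   # hypothetical ANN forecasts, one per stock

order = np.argsort(forecasts)         # indices sorted by ascending forecast
rule_a = order[-5:]                   # Rule A: five highest forecast values
rule_b = order[:5]                    # Rule B: five lowest forecast values
```
      </preformat>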
      <p>The reason for using Rule B is the widely recognized phenomenon of “mean
reversion” in financial data. Thus, if Rule B is successful, then our ANN is
capable of detecting this property of the market.</p>
      <p>
        The dynamics of our trade portfolio will be compared to the dynamics of
the S&amp;P 500 index and to the dynamics of the portfolio if the decision-making were
based on a simple auto-regression model (while the trading rules A and B are
retained), (
        <xref ref-type="bibr" rid="ref6">6</xref>
        ):
r_t = α_t + β_t r_{t−1},
(6)
where
r_t – forecast value of the logarithmic return of the stock for trading day t;
α_t, β_t – auto-regression coefficients;
r_{t−1} – logarithmic return of the stock for trading day t−1.
      </p>
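      <p>The benchmark model can be fitted by ordinary least squares; the mean-reverting return series below is simulated for illustration (the paper's actual sample is the 30 S&amp;P 500 stocks):</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(3)
r = [0.0]
for _ in range(499):                  # synthetic series with mean reversion
    r.append(0.001 - 0.3 * r[-1] + 0.01 * rng.standard_normal())
r = np.array(r)

X = np.column_stack([np.ones(len(r) - 1), r[:-1]])   # regressors [1, r_{t-1}]
alpha, beta = np.linalg.lstsq(X, r[1:], rcond=None)[0]
forecast_next = alpha + beta * r[-1]                  # forecast for day t
```
      </preformat>
      <p>Since the simulated series mean-reverts, the fitted beta comes out negative, which is the pattern Rule B is designed to exploit.</p>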
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>Now we present some of the results of the forecasting using the
Levenberg-Marquardt and the Haken ANNs. Due to the limited space of this paper we
demonstrate here only the most successful examples.</p>
      <p>Figure 1 demonstrates the relative dynamics of our virtual portfolio (red line)
using the Levenberg-Marquardt ANN and trading Rule B (five stocks with the “worst”
forecasts). The blue line is a portfolio whose decision-making is based on the
auto-regression model. The green line is the S&amp;P 500 index. The horizontal axis is time
and t indicates trading days. The vertical axis displays the value of the portfolio
with the initial value of 1.</p>
      <p>Figure 2 demonstrates the relative dynamics of our virtual portfolio (red line)
using the Haken ANN with the PCA pre-processing and trading Rule B (one
stock with the “worst” forecast). The blue line is a portfolio whose
decision-making is based on the auto-regression model. The green line is the S&amp;P 500 index.
The horizontal axis is time and t indicates trading days. The vertical axis displays
the value of the portfolio with the initial value of 1.</p>
      <p>We use closing prices of the following stocks: ExxonMobil, Apple, Microsoft, General
Electric, Procter&amp;Gamble, Johnson&amp;Johnson, Bank of America, JPMorgan Chase,
Wells Fargo, IBM, Chevron, Cisco Systems, AT&amp;T, Pfizer, Google, Coca-Cola, Intel,
Hewlett-Packard, Wal-Mart, Merck, PepsiCo, Oracle, Philip Morris International,
ConocoPhillips, Verizon Communications, Schlumberger, Abbott Labs, Goldman
Sachs, McDonald's, QUALCOMM. Note that in this model we use only long positions;
short selling is not allowed.</p>
      <p>The use of the ICA and PCA pre-processing techniques with the ANNs proved
to be a reliable decision-support mechanism for trading on the liquid stock
market. The dynamics of the resulting portfolios outperform portfolios which follow
the simple auto-regression forecast or are linked to the stock index. Furthermore, the
Levenberg-Marquardt and Haken ANNs displayed the ability to reveal the “mean
reversion” phenomenon in the complex market data and use it for future
forecasts.</p>
      <p>However, despite the success of the Levenberg-Marquardt and the Haken
ANNs and proper pre-processing techniques we still face difficulties in making
up a strategy which will guarantee robust and stable growth of the portfolio
over continuous period of time. Moreover, more theoretical research is needed to
justify the argument that it is the neural network decision-making mechanism
which is used by traders in real life. It is also obvious that more in-depth study
is needed to explain the phenomenon of “mean reversion”. Some of these issues
will be the topics of the future research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Back</surname>
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weigend</surname>
            <given-names>A.S.</given-names>
          </string-name>
          <article-title>A First Application of Independent Component Analysis to Extracting Structure from Stock Returns</article-title>
          //
          <source>International Journal of Neural Systems</source>
          , Vol.
          <volume>8</volume>
          , No.
          <issue>5</issue>
          (
          <issue>October</issue>
          ,
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bishop</surname>
            <given-names>C.M.</given-names>
          </string-name>
          <article-title>Neural Networks for Pattern Recognition</article-title>
          . Oxford University Press,
          <year>1995</year>
          . 483 p.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bell</surname>
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sejnowski</surname>
            <given-names>T.J.</given-names>
          </string-name>
          <article-title>An information-maximisation approach to blind separation and blind deconvolution</article-title>
          //
          <source>Neural Computation</source>
          ,
          <volume>7</volume>
          ,
          <issue>6</issue>
          ,
          <fpage>1004</fpage>
          -
          <lpage>1034</lpage>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Górriz</surname>
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puntonet</surname>
            <given-names>C.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salmerón</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lang</surname>
            <given-names>E.W.</given-names>
          </string-name>
          <article-title>Time Series Prediction using ICA Algorithms</article-title>
          //
          <source>IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications</source>
          , 8-10
          <year>September 2003</year>
          , Lviv, Ukraine.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hyvärinen</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oja</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>Independent Component Analysis: Algorithms and Applications</article-title>
          //
          <source>Neural Networks</source>
          ,
          <volume>13</volume>
          (
          <issue>4-5</issue>
          ):
          <fpage>411</fpage>
          -
          <lpage>430</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kröse</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Smagt</surname>
            <given-names>P.</given-names>
          </string-name>
          <source>An Introduction To Neural Networks</source>
          , Eighth Edition,
          <year>November 1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lu</surname>
            <given-names>C.-J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>T.-S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiu</surname>
            <given-names>C.-C.</given-names>
          </string-name>
          <article-title>Financial time series forecasting using independent component analysis and support vector regression</article-title>
          //
          <source>Decision Support Systems</source>
          <volume>47</volume>
          (
          <year>2009</year>
          )
          <fpage>115</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>