<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Simulating Price Interactions by Mining Multivariate Financial Time Series</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luis Cavique</string-name>
          <email>lcavique@univ-ab.pt</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nuno Marques</string-name>
          <email>nmm@di.fct.unl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bruno Silva</string-name>
          <email>bruno.silva@estsetubal.ips.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DI/FCT - UNL</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DSI/ESTSetúbal, Instituto Politécnico de Setúbal</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidade Aberta</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This position paper proposes a framework that combines a feature clustering method using Emergent Self-Organizing Maps over streaming data (UbiSOM) with Ramex-Forum, a sequence pattern mining model, for financial time series modeling based on observed instantaneous and long-term relations over market data. The proposed framework aims at producing realistic Monte Carlo-based simulations of the behavior of an entire portfolio over distinct market scenarios, obtained from models generated by these two approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Grasping the apparently random nature of financial time
series has proven to be a difficult task, and countless forecasting
methods are presented in the literature. Nowadays, this is
even more difficult due to a global economy with strong
interconnections. Most traders forecast future prices using some
combination of fundamentals, indicators, patterns and
experience, in the expectation that recent history will forecast the
probable future often enough to make a profit. Detecting
correlations between financial time series and being able to
simulate both short- and long-term interactions in virtual scenarios
using models extracted from observed market data can
provide an increasingly needed tool to minimize risk exposure
and volatility for a given portfolio of securities. This position
paper argues that feature clustering methods using Emergent
Self-Organizing Maps over streaming data (UbiSOM) [Silva
et al., 2012], [Silva and Marques, 2010b] can be combined
with Ramex-Forum, a sequence pattern mining model
[Marques and Cavique, 2013], for financial time series modeling
based on observed instantaneous and long-term relations over
market data. Since the lower the correlation among the
individual securities, the lower the overall volatility of the entire
portfolio, such a combination makes it possible to propose a tool to minimize
risk exposure and volatility for a given portfolio of
securities. The proposed framework aims at producing more
realistic Monte Carlo-based simulations of the entire portfolio
behavior over distinct market scenarios, obtained from models
generated by these two approaches.
</p>
      <p>The proposed modular framework is depicted in Figure 1 and
consists of (i) the UbiSOM, an ESOM algorithm tailored
for streaming data; (ii) Ramex-Forum, a sequence
pattern mining model; and (iii) a Monte Carlo-based simulator.
The first two are fed with a stream of log-normalized raw
asset prices; the models they produce are then used by the third module to generate
different possible future market scenarios, based on the
observed data. The UbiSOM can model instantaneous
short-term correlations between the various assets, and its
topological map (Section 2.1) can be used as a starting point to
generate alternate time series based on a trajectory model (Section
3.1) in the simulation module. The input from the
Ramex-Forum module should be useful for incorporating
long-term dependencies between the assets into the simulations, producing
more realistic market scenarios.</p>
    </sec>
    <sec id="sec-2">
      <title>2.1 Emergent Self-Organizing Maps</title>
      <p>Self-Organizing Maps [Kohonen, 1982] can use the ability of
neural networks to discover nonlinear relationships in input
data and to derive meaning from complicated or imprecise
data for modeling dynamic systems such as the stock
market. The Self-Organizing Map (SOM) is a single-layer
feedforward network where the output neurons are arranged in
a 2-dimensional lattice, whose neurons become specifically
tuned to various input vectors (observations) in an orderly
fashion. Each input x<sub>d</sub> is connected to all output neurons,
and attached to each neuron there is a weight vector w<sub>d</sub> with
the same dimensionality as the input vectors. These weights
represent prototypes of the input data. Emergent SOMs (ESOMs)
[Ultsch and Herrmann, 2005] exploit the topology
preservation of the SOM projection by using larger
maps. A previous work [Silva and Marques, 2010a]
showed that ESOMs provide a way of representing
multivariate financial data in two dimensions and are a viable tool
to detect instantaneous short-term correlations between
time series. We illustrate this in Section 3, within our preliminary
results. Additionally, and supported by the detected
correlations, the topologically ordered map can be used as a good
starting point to generate realistic multivariate financial data
based on the short-term relationships.</p>
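      <p>The prototype matching described above can be illustrated with a minimal sketch of a classical online SOM step. This is the textbook Kohonen update, not the UbiSOM algorithm itself; the function names, the Gaussian neighbourhood and the default values of learning_rate and sigma are assumptions made only for illustration.</p>
      <preformat><![CDATA[
```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose prototype is closest to x.

    weights has shape (rows, cols, dim); x has shape (dim,).
    """
    dists = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)


def som_update(weights, x, learning_rate=0.1, sigma=1.0):
    """One classical online SOM step (illustrative defaults, not from the paper)."""
    rows, cols, _ = weights.shape
    bmu = best_matching_unit(weights, x)
    # Squared lattice distance of every neuron to the best matching unit.
    grid_r, grid_c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    lattice_dist2 = (grid_r - bmu[0]) ** 2 + (grid_c - bmu[1]) ** 2
    # Gaussian neighbourhood: neurons near the BMU move more than distant ones.
    h = np.exp(-lattice_dist2 / (2.0 * sigma ** 2))
    return weights + learning_rate * h[:, :, None] * (x - weights)
```
]]></preformat>
      <p>In this sketch each observation would be one day of normalized asset prices, so every prototype has one component per asset.</p>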
    </sec>
    <sec id="sec-3">
      <title>2.2 The Ramex-Forum Algorithm</title>
      <p>
        Ramex-Forum addresses the problem of the huge number of rules
that prevents a global visualization in many pattern discovery
techniques
        <xref ref-type="bibr" rid="ref1">(e.g., [Agrawal and Srikant, 1995])</xref>
        .
Ramex-Forum is a sequential pattern mining algorithm with
two phases: the transformation phase and the search phase. In
the transformation phase the dataset is converted into a graph
where cycles are allowed. The raw data must be sorted in such
a way that each time interval can be identified. In the search
phase the maximum weighted spanning poly-tree is found. A
poly-tree is a directed acyclic graph with at most one path between any
pair of nodes. In a directed tree, the in-degree of any vertex
is 0 (the root) or 1; in a poly-tree, on the other hand, the in-degree of a
vertex can be greater than 1. A maximum weighted spanning
poly-tree is a spanning poly-tree whose weight is
greater than or equal to the weight of every other spanning
poly-tree. The Ramex-Forum algorithm develops a new heuristic
inspired by Prim’s algorithm [Prim, 1957] and provides a new
way of visualizing long-term patterns in polynomial
execution time.
      </p>
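      <p>The search phase can be pictured with the following Prim-style greedy sketch. It is not the Ramex-Forum heuristic itself, only an illustration of how a spanning poly-tree can be grown in polynomial time; the function name and the dictionary representation of arcs are assumptions, and a greedy choice of this kind is not guaranteed to reach the maximum weight.</p>
      <preformat><![CDATA[
```python
def greedy_spanning_polytree(nodes, arcs, root):
    """Grow a spanning poly-tree greedily from root.

    arcs maps a directed pair (u, v) to the weight of the arc u -> v.
    Returns the chosen arcs as (u, v, weight) tuples.
    """
    in_tree = {root}
    chosen = []
    while len(in_tree) < len(nodes):
        # Candidate arcs cross the frontier: exactly one endpoint is already
        # in the tree, whatever the direction of the arc.
        frontier = [
            (w, u, v)
            for (u, v), w in arcs.items()
            if (u in in_tree) != (v in in_tree)
        ]
        if not frontier:
            break  # the remaining nodes are not connected to the tree
        w, u, v = max(frontier)  # heaviest crossing arc
        chosen.append((u, v, w))
        in_tree.update((u, v))
    return chosen
```
]]></preformat>
      <p>Because every step adds exactly one new node, the undirected skeleton of the result is a tree, while arc directions are kept as found in the data, so a vertex may end up with in-degree greater than 1.</p>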
      <sec id="sec-3-1">
        <title>3 Preliminary Results</title>
        <p>In this section we provide a proof of concept of the proposed
methodology within the framework. The proposed method is
illustrated with historical data representing the world
economy in the recent past (years 2006 to 2012), shown in Figure 2. The
huge economic changes during this period make it well suited to showing
the usefulness of data mining algorithms over financial data.
Major financial products were considered: stock market indexes for
companies based in different countries (DJI, Dow Jones, in the
United States; BVSP, Bovespa, in Brazil; FCHI, Euronext
in Paris; N225, Nikkei, in Japan; HSI, Hang Seng Index,
in Hong Kong; and DAX, the German index) and relevant
commodity exchange-traded funds (ETFs), such as the United States
Oil Fund (USO) and GLD, a physically backed gold ETF.</p>
        <p>Each time series is considered a feature of the training data,
i.e., observations are the prices of the financial products for
consecutive days. After performing a logarithmic
normalization of the values, so that the specific range of each asset price
is disregarded, the historical data forms the training dataset
that is fed into the UbiSOM and Ramex-Forum modules.</p>
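        <p>The text does not give the exact normalization formula, so the following is only one possible reading: take the logarithm of each price and rescale every asset to the interval [0, 1]. The function name and the min-max rescaling step are assumptions.</p>
        <preformat><![CDATA[
```python
import numpy as np


def log_normalize(prices):
    """Log-transform prices and rescale each asset (column) to [0, 1].

    prices has shape (days, assets) and must be strictly positive.
    The min-max step is an assumption; a column with constant prices
    would need special handling to avoid a division by zero.
    """
    log_prices = np.log(np.asarray(prices, dtype=float))
    lo = log_prices.min(axis=0)
    hi = log_prices.max(axis=0)
    return (log_prices - lo) / (hi - lo)
```
]]></preformat>
        <p>With this choice two assets that double in price over the same days produce identical columns, whatever their absolute price levels.</p>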
        <p>The trained UbiSOM map contains a topologically
organized projection of the historical data. Correlations between
individual time series are extracted through a visualization
technique for the UbiSOM map. With the component plane
representation we can visualize the relative component
distributions of the input data. The component plane representation can
be thought of as a sliced version of the UbiSOM. Each
component plane shows the relative distribution of one data vector
component. In this representation, dark values represent relatively
large values while white values represent relatively small
values. By comparing component planes we can see whether two
components correlate. If the planes look similar, the components
strongly correlate. The component planes for the resulting
trained map are represented in Figure 3. Visual inspection
may suffice to detect correlations, but in [Silva and Marques,
2010a] we provided an automated algorithm to cluster
time series based on a distance metric computed for pairs of
component planes (Figure 4). However, this ability is only of
relevance in this paper to justify the use of ESOM maps to
generate multivariate time series based on a trajectory model
(Section 3.1).</p>
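        <p>The comparison of component planes can be sketched as a pairwise distance matrix. The metric used in [Silva and Marques, 2010a] is not restated here, so the Euclidean distance between rescaled planes below is an assumption, as is the function name.</p>
        <preformat><![CDATA[
```python
import numpy as np


def component_plane_distances(weights):
    """Pairwise Euclidean distances between the component planes of a map.

    weights has shape (rows, cols, dim); the result has shape (dim, dim).
    """
    rows, cols, dim = weights.shape
    # One column per component plane, one row per neuron.
    planes = weights.reshape(rows * cols, dim).astype(float)
    # Rescale each plane to [0, 1] so that only its relative shape matters.
    lo = planes.min(axis=0)
    hi = planes.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    planes = (planes - lo) / span
    diff = planes[:, :, None] - planes[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))
```
]]></preformat>
        <p>A small distance between two planes suggests that the corresponding assets correlate; the resulting matrix could be passed to any standard clustering routine.</p>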
        <p>Figure 5 presents a Ramex-Forum generated graph for this
financial data, considering interactions with a latency of up
to 160 trading days (long-term) over a period of 2000 days;
these results were presented in [Marques and Cavique, 2013]. The weight of each
arc represents the number of synchronous positive price
tendencies (buying signals given by a moving average
indicator). During the studied period, the Hong Kong HSI index showed
a behavior that was preceded 273 times by similar
variations in the American Dow Jones (DJI) and 179 times by the
German DAX. We should notice that the USA DJI is influencing
most major assets in the world. The only exception to this is
the German DAX, which strongly co-influences the Chinese
HSI. Another correlation found is between HSI and GLD
tendencies in these long-term dependencies. This is
something that UbiSOM cannot capture and that can be incorporated
when generating more realistic market scenarios.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.1 Generating Scenarios</title>
      <p>By projecting the historical data again onto the UbiSOM we
get a set of trajectories over the map that are used to generate
these alternate time series. A trajectory is formed by
projecting two consecutive observations from the training data and
storing the pair of neurons that were activated
(bear in mind that loops are frequent, because
two similar observations are prone to be projected onto the same
neuron). Figure 6 depicts these trajectories in the form of a
directed graph in which each vertex represents a neuron and
the edges represent the obtained trajectories. The weight of an edge
indicates how many times the trajectory was followed in the
projection of the training data.</p>
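      <p>The construction of the trajectory graph can be sketched as follows. The function name and the flat neuron indexing are assumptions; the map is taken to be an array of prototypes of shape (rows, cols, dim).</p>
      <preformat><![CDATA[
```python
from collections import Counter

import numpy as np


def build_trajectory_model(weights, observations):
    """Count transitions between the neurons activated on consecutive days.

    Returns a Counter mapping (source_neuron, target_neuron) to the number
    of times that trajectory was followed; neurons are flat indices.
    """
    flat = weights.reshape(-1, weights.shape[-1])
    # Project every observation onto its closest prototype.
    bmus = [int(np.argmin(np.linalg.norm(flat - x, axis=1))) for x in observations]
    # Pair each day with the next one; loops (same neuron twice) are kept.
    return Counter(zip(bmus[:-1], bmus[1:]))
```
]]></preformat>
      <p>Each key of the returned counter is an edge of the directed graph and its count is the edge weight.</p>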
      <p>Based on this trajectory model, we can generate alternate
time series using Monte Carlo simulations. Starting at a
vertex with outgoing edges and randomly choosing the next trajectory of
the model to follow, we can create paths of arbitrary length
(depending on the desired number of daily prices). Each
vertex/neuron holds a prototype vector that contains the set
of daily prices for similar observed days. The totality of the
path then gives us the multivariate time series. Details on how
to generate the path are currently being studied. The choice must not
be totally random, i.e., the weight on the edges must be taken
into account so as to give more importance to trajectories that
are more common. Also, when creating the trajectory model
we store at each vertex/neuron the statistical variation of the
training vectors that are projected onto that particular neuron.
This allows generating a Gaussian around each prototype
vector component to introduce variability in a particular virtual
daily price. This is particularly important when loops are
being followed in the path, so that the generated time series do not
contain “flat” lines.</p>
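      <p>Since the path generation is described as still under study, the following is only a sketch of one option consistent with the description: the next neuron is drawn with probability proportional to the edge weight, and each virtual day is drawn from a Gaussian around the prototype. All names and the per-component standard deviation array are assumptions.</p>
      <preformat><![CDATA[
```python
import numpy as np


def simulate_path(edges, prototypes, spreads, start, n_days, rng=None):
    """Generate a multivariate series by a weighted random walk over neurons.

    edges maps (source, target) to a transition count, prototypes[i] is the
    prototype of neuron i and spreads[i] the per-component standard deviation
    of the training vectors projected onto it.
    """
    rng = np.random.default_rng() if rng is None else rng
    current = start
    series = []
    for _ in range(n_days):
        # Virtual trading day: prototype plus Gaussian variability.
        series.append(rng.normal(prototypes[current], spreads[current]))
        out = [(dst, w) for (src, dst), w in edges.items() if src == current]
        if not out:
            break  # dead end: no observed trajectory leaves this neuron
        targets, counts = zip(*out)
        probs = np.array(counts, dtype=float)
        # Frequent trajectories are proportionally more likely to be followed.
        current = targets[rng.choice(len(targets), p=probs / probs.sum())]
    return np.array(series)
```
]]></preformat>
      <p>Passing a seeded generator, for example np.random.default_rng(0), makes a scenario reproducible, and repeated calls with different seeds give the Monte Carlo sample.</p>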
      <p>Figure 7 depicts a sample of a generated outcome that can
be obtained from trajectories over the trained UbiSOM. It can
be seen that the generated multivariate time series maintain the
correlations observed in the original historical data. This can be very
useful in generating possible scenarios for risk estimation.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2 Discussion</title>
      <p>Visual inspection of the similarity of component planes in
Figure 3 shows that the results of the UbiSOM model are
coherent, e.g., DJI and DAX are strongly correlated in their
historical behavior. GLD, as expected, is very far from any
other financial product. Its historical behavior is extremely
different in the analysis period, almost always in an upward
movement. All the other assets maintain a significant
distance from each other, showing that their correlation is not that
strong. Interesting additional long-term relations were found
by the Ramex-Forum algorithm. The example shown in Figure 5
presents the USA DJI index influencing most other indexes.</p>
      <p>China (HSI) is also detected as a major player in the world
economy and seems to be the major influence on the price of gold
(GLD). Indeed, during the analyzed period, the People’s
Republic of China was one of the major buyers of gold in the world
and had the largest reserves of gold and foreign exchange in
the world (CIA World Factbook, 2013).</p>
      <p>However, it is the conjunction of the long-term relations
detected by Ramex-Forum with the magnitude of the short-term
multivariate dependencies captured by UbiSOM that should be the
most interesting application. Different long-term trajectories
can now be generated on the UbiSOM map, based on the long-term
sequences detected by Ramex-Forum. For example, neurons
corresponding to the highest increases in gold values can be easily
selected from a SOM map. The same could be done for high
values for the Chinese (HSI) or USA (DJI) economy. Highly
probable pathways should then be made among those
neurons. In practice these will encode the Ramex-Forum graph as
a probable pathway among distinct UbiSOM neurons. Then,
taking a given trading day (e.g., today) as the starting point, we can
generate random walks in the map. Since each neuron
represents a possible market state, we can easily generate for
each neuron a possible virtual trading day that is strongly
related to the observed data. However, it is the pathways
that will provide the most interesting effect on this map. Indeed, on
average, virtual trading days will follow possible sequences
given (and measured) by the Ramex-Forum graph.</p>
      <p>Figure 6: Trajectories generated for the projection of the historical training data over a 15 × 10 trained UbiSOM.</p>
      <sec id="sec-5-1">
        <title>4 Conclusions</title>
        <p>Together, both algorithms should provide a realistic and easily usable
framework to study and simulate possible effects of either
economic or political decisions. Even extreme events, e.g.,
Acts of God, may then be given some probability. We
believe that such a time series based model is a much needed
tool in today’s strongly interdependent and complex world,
where over-simplistic assumptions frequently lead to poor
decisions.</p>
        <p>
          On one hand UbiSOM provides the daily correlation
between products and can be made self-adjustable to
continuously changing streams of data
          <xref ref-type="bibr" rid="ref5 ref6 ref7">(i.e., both collaborative
learning [Silva and Marques, 2010b] and detecting concept drift
[Silva et al., 2012])</xref>
          . On the other hand, Ramex-Forum graphs
show sequences of the most representative events and can
be easily used to model the dynamics of the occurrences. So,
we believe that these complementary tools, one more static
and the other more dynamic, can intrinsically guarantee
realistic modeling of different scenarios and provide a major
breakthrough in decision support systems.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Agrawal and Srikant</source>
          , 1995]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>Mining sequential patterns</article-title>
          .
          <source>In Proceedings 11th International Conference Data Engineering</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>14</lpage>
          . IEEE Press,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Kohonen</source>
          , 1982]
          <string-name>
            <given-names>Teuvo</given-names>
            <surname>Kohonen</surname>
          </string-name>
          .
          <article-title>Self-organized formation of topologically correct feature maps</article-title>
          .
          <source>Biological cybernetics</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):
          <fpage>59</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Marques and Cavique</source>
          , 2013]
          <string-name>
            <given-names>Nuno</given-names>
            <surname>Marques</surname>
          </string-name>
          and
          <string-name>
            <given-names>Luis</given-names>
            <surname>Cavique</surname>
          </string-name>
          .
          <article-title>Sequential pattern mining of price interactions</article-title>
          .
          <source>In Proceedings 16th Portuguese Conference on Artificial Intelligence (EPIA)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Prim</source>
          , 1957]
          <string-name><given-names>Robert Clay</given-names> <surname>Prim</surname></string-name>
          .
          <article-title>Shortest connection networks and some generalizations</article-title>
          .
          <source>Bell system technical journal</source>
          ,
          <volume>36</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1389</fpage>
          -
          <lpage>1401</lpage>
          ,
          <year>1957</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Silva and Marques</source>
          , 2010a]
          <string-name>
            <given-names>B.</given-names>
            <surname>Silva</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Marques</surname>
          </string-name>
          .
          <article-title>Feature clustering with self-organizing maps and an application to financial time series portfolio selection</article-title>
          .
          <source>In International Conference on Neural Computation</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Silva and Marques</source>
          , 2010b]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Silva</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nuno C.</given-names>
            <surname>Marques</surname>
          </string-name>
          .
          <article-title>Ubiquitous data-mining with self-organizing maps</article-title>
          .
          <source>In Proceedings of the Ubiquitous Data Mining Workshop, ECAI</source>
          <year>2010</year>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Silva et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Bruno</given-names>
            <surname>Silva</surname>
          </string-name>
          , <string-name><given-names>Nuno</given-names> <surname>Marques</surname></string-name>, and
          <string-name>
            <given-names>Gisele</given-names>
            <surname>Panosso</surname>
          </string-name>
          .
          <article-title>Applying neural networks for concept drift detection in financial markets</article-title>
          .
          <source>In Workshop on Ubiquitous Data Mining, page 43</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Ultsch and Herrmann</source>
          , 2005]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ultsch</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Herrmann</surname>
          </string-name>
          .
          <article-title>The architecture of emergent self-organizing maps to reduce projection errors</article-title>
          .
          <source>In Proceedings of the European Symposium on Artificial Neural Networks (ESANN</source>
          <year>2005</year>
          ), pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . Verleysen M. (Eds),
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>