<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Event and Sentiment Detection in Financial Markets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Uta Hellinger</string-name>
          <email>hellinger@aifb.uni-karlsruhe.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AIFB, Universität Karlsruhe</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Today, traders in financial markets are confronted with the problem that information is distributed over diverse sources and that too much information is available. In our work, we develop methods and tools that help traders overcome this information overload by enabling an integrated view of news from various sources, by filtering relevant news, and by providing decision support. Another goal of our work is to propose a formal model of the impact of news on asset prices and thus to enable better predictions of stock prices than are possible with purely text-mining-based approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Traders in financial markets are confronted with the problem that too much
information is available from various heterogeneous sources such as newswires,
forums, blogs and collaborative tools. In order to make accurate trading decisions,
traders have to filter the relevant information efficiently so that they are able to
react to new information in a timely manner.</p>
      <p>The focus of our work is the development of methods and tools to support
traders in this process. The goal is to provide an integrated view of news from
different sources, to filter the news items that have significant market impact and
to help users decide how to react to newly published information. The
development of these methods and tools raises the following questions:</p>
      <list list-type="bullet">
        <list-item>
          <p>How can information from various heterogeneous sources be integrated?
News items found in different sources differ in their content and in the available
annotations: news published by newswires is annotated with standardized
metadata (which differs between newswires), whereas blog posts are at most tagged
with some keywords. The information published on different web sites has
to be collected, and the metadata has to be mapped to a single format so that
all news can be processed using the same algorithms.</p>
        </list-item>
        <list-item>
          <p>How can the important news be filtered? Users cannot monitor all relevant
news services and process all the information that is published by these
services. Therefore, methods are needed to filter the news items that have significant
market impact. These methods have to detect important events, sentiments
and expectations concerning the market.</p>
        </list-item>
        <list-item>
          <p>How can the users’ trading decisions be supported? Price changes are caused
by changes in the expectations concerning a company. Therefore,
expectations and precise information (such as the amount of the quarterly result) should
be used for the prediction of price changes. This requires mechanisms that
extract the necessary information from texts, formalize it and make predictions
based on it.</p>
        </list-item>
      </list>
      <p>Although our work focuses on a specific application domain, its results will
be relevant for other applications, as we show how information from different
sources can be integrated and used to provide decision support.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Our work is related to research in the domains of text mining (especially event
and sentiment detection), information extraction, semantic web and finance.</p>
      <p>
        A variety of systems for the prediction of asset price developments based on
recently published news have been developed (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for an overview). These
systems are based on text classification, where the target categories are derived
from financial data. Although they are closely related to our intended
application, they have two important weaknesses: they consider neither (i) expectations,
which heavily influence the development of asset prices, nor (ii) quantified
information (such as the value of paid dividends or the amount of the annual profit),
which enables the quantification of the expected price change.
      </p>
      <p>
        Online event detection methods have been developed, for example, by [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These methods only attempt to identify new events, mostly using clustering
techniques, without trying to formalize them semantically, which is required for
matching them against our expectation models.
      </p>
      <p>
        Work on sentiment detection in a finance context includes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. While
Das and Chen [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use linguistic features to classify messages as negative or
positive and then examine the correlation with stock price changes, Koppel
and Shtrimberg [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] use stock price changes to identify positive and negative news, from
which descriptive features can then be extracted.
      </p>
      <p>
        While to the best of our knowledge no method for modelling expectations
exists, Halaschek-Wiener and Hendler [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have proposed an OWL-based news
syndication framework to match publications and information needs. The
subscribers’ information needs, which are described by conjunctive ABox queries, are
matched against publications, which are formalized as ABox assertions.
      </p>
      <p>
        Relation extraction is applied to populate ontologies from text, which is the problem
we have to solve for extracting information on events and expectations from
news. These methods usually apply bootstrapping, where the web [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
or Wikipedia [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is used as a corpus to find patterns that describe relations.
These methods have very low precision and recall in some applications, which is
problematic for our application.
      </p>
      <p>
        Event studies examine the impact of certain events on a firm’s value (see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for
an overview of the methodology). These will be useful to find events and aspects
that should be taken into account in our system.
      </p>
    </sec>
    <sec id="sec-3">
      <title>News Analysis Tool</title>
      <p>Our solution to the problems discussed above is a tool which offers support in
filtering important information and in decision-making. The planned
architecture can be seen in Fig. 1. The components of this framework and requirements
regarding each of them will be discussed in the following.</p>
      <p>The first component of our tool is responsible for the collection of news from
various heterogeneous sources. This component will monitor a large number
of relevant sources for new information and will make it available to the
Preprocessing data component. The latter maps the available metadata to a single
representation so that all data can be processed in the same way in later steps. We
will define an ontology for each news source’s metadata and map these ontologies
to a single general ontology, which will be used for further processing. If no metadata is
available, some annotations of the news content, such as named
entities, have to be extracted to enable filtering of these news items. However, only very
efficient methods can be applied here, as information extraction is quite expensive
in general.</p>
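      <p>A minimal sketch of this mapping step is given below. It assumes two hypothetical sources; the source field names (headline, ric, topic_codes, tags) and the common schema (title, companies, topics) are illustrative assumptions, not actual newswire formats or the ontologies we plan to define. A plain dictionary lookup stands in for the ontology mapping.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical per-source mappings from source field names to a common schema.
# All field names are illustrative assumptions, not real newswire formats.
FIELD_MAPPINGS = {
    "newswire_a": {"headline": "title", "ric": "companies", "topic_codes": "topics"},
    "blog": {"title": "title", "tags": "topics"},
}

COMMON_FIELDS = ("title", "companies", "topics")


def normalize(source, raw_item):
    """Map one raw news item to the common representation."""
    mapping = FIELD_MAPPINGS[source]
    item = {"source": source}
    for field in COMMON_FIELDS:
        item[field] = None  # missing metadata stays explicit
    for raw_field, value in raw_item.items():
        common_field = mapping.get(raw_field)
        if common_field is not None:  # unmapped source fields are dropped
            item[common_field] = value
    return item


wire_item = normalize(
    "newswire_a",
    {"headline": "Q3 result published", "ric": ["XYZ.DE"], "topic_codes": ["RES"]},
)
blog_item = normalize(
    "blog",
    {"title": "Thoughts on XYZ", "tags": ["earnings"], "author": "anon"},
)
```
]]></preformat>
      <p>In this sketch, a field that stays empty (such as companies for the blog post) marks an item for which annotations like named entities would have to be extracted from the text.</p>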
      <p>The Analysis of meta-information/tags component will examine the
metadata to filter relevant news. It will decide whether a news item contains
expectations concerning future events and will thus be processed by the Sentiment
analysis component and whether it contains information on an actual event and
will thus be processed by the Event analysis component. It is possible that a
news item is processed by both components or that a news item is not important
and thus will not be processed further.</p>
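      <p>The routing decision can be sketched as follows. The boolean annotations has_expectation and has_event are hypothetical; we assume that a metadata classifier has produced them beforehand, and the component names are placeholders.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def route(item):
    """Return the names of the analysis components that should process an item.

    `has_expectation` and `has_event` are hypothetical boolean annotations
    that a metadata classifier is assumed to have set beforehand.
    """
    targets = []
    if item.get("has_expectation"):
        targets.append("sentiment_analysis")
    if item.get("has_event"):
        targets.append("event_analysis")
    return targets  # an empty list means the item is not processed further


forecast = {"id": 1, "has_expectation": True, "has_event": False}
earnings = {"id": 2, "has_expectation": True, "has_event": True}
noise = {"id": 3}

routes = {item["id"]: route(item) for item in (forecast, earnings, noise)}
```
]]></preformat>
      <p>The three toy items cover the three cases described above: one component, both components, and no further processing.</p>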
      <p>Both the Event analysis and the Sentiment analysis component will apply
information extraction methods to extract formal descriptions of the news’
content. These descriptions will be used by the Market forecast component to predict
the impact of a news item on the market by quantifying the difference between the
actually published information on the one hand and the expectations and the current status of the
domain as described in the ontology. The Ontology update component is
responsible for the integration of the changes that occur due to published expectations
and events into the ontology.</p>
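      <p>How this difference should be quantified is still to be defined. As one possible starting point, and purely as an assumption on our side, the relative deviation of a published figure from its expected value could be used:</p>
      <preformat preformat-type="code"><![CDATA[
```python
def surprise(actual, expected):
    """Relative deviation of a published figure from its expected value.

    This measure is an illustrative assumption, not the final definition.
    """
    if expected == 0:
        raise ValueError("relative surprise is undefined for expected == 0")
    return (actual - expected) / abs(expected)


# Expected quarterly result of 2.0 (arbitrary units).
positive = surprise(2.5, 2.0)  # result above expectations
negative = surprise(1.5, 2.0)  # result below expectations
```
]]></preformat>
      <p>A positive value indicates that the published figure exceeds the expectation, a negative value that it falls short of it.</p>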
      <p>The Domain ontology is the backbone of the three previously described
components. It describes the current status of the market and expectations
concerning future events. We are currently trying to identify the most predictive features of
news items. To this end, we are developing a linear regression model that predicts market responses
based on text features. This is the technique of choice because the predicted impact
can be quantified and the influence of each feature on the result can easily be
seen. The developed model will help us identify the information that should
be modelled in the domain ontology. It will also provide a prediction facility
that can serve as a baseline for the evaluation of more elaborate methods.</p>
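      <p>The following sketch illustrates the idea with ordinary least squares on toy data. The two binary text features and all numbers are made up for illustration; the actual feature set is still being identified.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Each row gets a leading 1.0 so that w[0] is the intercept.
    """
    X = [[1.0] + list(row) for row in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Predicted market response for one feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Toy data: two hypothetical binary text features per news item
# (mentions "profit warning", mentions "dividend increase") and the
# observed price change in percent. All numbers are made up.
features = [(1, 0), (0, 1), (0, 0), (1, 1), (1, 0), (0, 1)]
returns = [-1.0, 0.5, 0.0, -0.5, -1.0, 0.5]
weights = fit_linear_regression(features, returns)
```
]]></preformat>
      <p>Each fitted weight can be read directly as the estimated contribution of one feature to the price change, which is the interpretability argument made above.</p>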
      <p>The tool will be personalizable and adaptable in the sense that users can
specify their preferences, e.g. companies that they are especially interested in.
This will be possible through the Personalizable and adaptable user interface.</p>
      <p>An important requirement for the whole process is that it has to be extremely
fast as significant price changes (in the short-term trade that we consider) can
only be observed within one minute after the publication of a news item by a
newswire.</p>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>The framework presented in the previous section requires a set of different
evaluations. Firstly, an evaluation of the methods employed in each component has
to be done. This especially means that an evaluation of the classification and
information extraction methods in terms of precision and recall is required.</p>
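      <p>For reference, precision and recall of a filtering step can be computed as in the following sketch. The news item identifiers and the gold annotation are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall(predicted, relevant):
    """Precision and recall of a set of predicted items against a gold set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical news item ids: the filter selected four items, three of
# which are among the five items annotated as market-relevant.
selected = {"n1", "n2", "n3", "n7"}
gold = {"n1", "n2", "n3", "n4", "n5"}
precision, recall = precision_recall(selected, gold)
```
]]></preformat>
      <p>Here three of the four selected items are relevant (precision 0.75), and three of the five relevant items were found (recall 0.6).</p>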
      <p>As our goal is the prediction of price changes based on expectation changes,
the quality of the predictions serves as an evaluation of the domain ontology.</p>
      <p>Finally, a user study is necessary to see how well users are supported by this
kind of system and whether it helps them to make better trading decisions.</p>
    </sec>
    <sec id="sec-5">
      <title>Work plan</title>
      <p>So far, we have acquired news published by Reuters and information on
intraday trades and quotes for over 240 markets in 2003, and we have developed
aggregation functions for the financial data so that it can be used as training
data for the methods we will develop. Given the huge amount of data, we focus
our experiments on the German market.</p>
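      <p>One such aggregation could look like the following sketch, which derives a training label from trade data as the relative price change within a fixed window after publication. The function name, the data layout and the 60-second default are assumptions made for illustration and do not describe our actual aggregation functions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def return_after(trades, published_at, window=60):
    """Relative price change within `window` seconds after publication.

    `trades` is a list of (timestamp_in_seconds, price) tuples sorted by time.
    The reference price is the last trade at or before publication; the end
    price is the last trade inside the window. Returns None if either is missing.
    """
    before = [price for t, price in trades if t <= published_at]
    inside = [
        price for t, price in trades
        if published_at < t <= published_at + window
    ]
    if not before or not inside:
        return None
    return (inside[-1] - before[-1]) / before[-1]


# Made-up trades: (seconds since start of day, price).
trades = [(100, 50.0), (130, 50.0), (150, 50.5), (185, 51.0), (260, 49.0)]
label = return_after(trades, published_at=130)
no_label = return_after(trades, published_at=300)
```
]]></preformat>
      <p>In the toy data, the trades at 150 and 185 seconds fall into the window, so the label compares the price of 51.0 with the reference price of 50.0. A news item without trades in its window receives no label.</p>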
      <p>As mentioned in Section 3, we are currently working on the identification of the most
predictive features. The next step will be to compare the features we find to
the results of event studies available in the finance literature. In parallel to these
steps we will develop the metadata ontologies and mappings between them. We
will then build a classifier that filters the relevant news based on metadata. Once
these methods are available we will build our domain ontology that models events
and expectations, develop the necessary information extraction methods and
define how discrepancies between expectations and newly published information
can be quantified for predicting the associated asset price changes.</p>
      <p>The last component that we will develop is the user interface. We will then
conclude our project with user experiments that will hopefully show the benefit of
the proposed tool.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>The goal of our work is the development of a news analysis tool that supports
traders in financial markets by filtering news and making predictions on the
impact of news on the market. The contributions of our work will be:</p>
      <list list-type="bullet">
        <list-item>
          <p>refined information extraction methods for the analysis of financial news</p>
        </list-item>
        <list-item>
          <p>ontologies of the financial domain that allow the formalization of news and
their annotations as well as of expectations and events</p>
        </list-item>
        <list-item>
          <p>a method to quantify the distance of an event from expectations</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was funded by the German Research Foundation (DFG) within the scope of the
Graduate School Information Management and Market Engineering.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Mittermayer</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knolmayer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Text mining systems for market response to news: A survey</article-title>
          . Working paper (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A probabilistic model for online document clustering with application to novelty detection</article-title>
          .
          <source>In: Neural Information Processing Systems Conf.</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Makkonen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahonen-Myka</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salmenkivi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Applying semantic classes in event detection and tracking</article-title>
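          . In:
          <string-name>
            <surname>Sangal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bendre</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          , eds.:
          <source>Proc. of Int. Conf. on Natural Language Process</source>
          . (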
          <year>2002</year>
          )
          <fpage>175</fpage>
          -
          <lpage>183</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Yahoo! for Amazon: Sentiment extraction from small talk on the web</article-title>
          .
          <source>Management Science</source>
          <volume>53</volume>
          (
          <issue>9</issue>
          ) (
          <year>2007</year>
          )
          <fpage>1375</fpage>
          -
          <lpage>1388</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shtrimberg</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Good news or bad news? Let the market decide</article-title>
          .
          <source>Computing Attitude and Affect in Text: Theory and Applications</source>
          (
          <year>2006</year>
          )
          <fpage>297</fpage>
          -
          <lpage>301</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Halaschek-Wiener</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Toward expressive syndication on the web</article-title>
          .
          <source>In: WWW '07: Proc. of the 16th Int. Conf. on World Wide Web, ACM</source>
          (
          <year>2007</year>
          )
          <fpage>727</fpage>
          -
          <lpage>736</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pantel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennacchiotti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Espresso: leveraging generic patterns for automatically harvesting semantic relations</article-title>
          .
          <source>In: ACL '06: Proceedings of the 21st Int. Conf. on Computational Linguistics and the 44th annual meeting of the ACL</source>
          (
          <year>2006</year>
          )
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Blohm</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Using the web to reduce data sparseness in pattern-based information extraction</article-title>
          .
          <source>In: Proc. of the 11th European Conf. on Principles and Practice of Knowledge Discovery in Databases</source>
          , Springer (
          <year>2007</year>
          )
          <fpage>18</fpage>
          -
          <lpage>29</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>MacKinlay</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Event studies in economics and finance</article-title>
          .
          <source>J. of Economic Literature</source>
          <volume>35</volume>
          (
          <issue>1</issue>
          ) (
          <year>March 1997</year>
          )
          <fpage>13</fpage>
          -
          <lpage>39</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>