<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic-Based Sentiment Analysis in Financial News</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juana María Ruiz-Martínez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Valencia-García</string-name>
          <email>valencia@um.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco García-Sánchez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Facultad de Informática. Universidad de Murcia. Campus de Espinardo. 30100 Espinardo (Murcia).</institution>
          <country>Spain</country>
        </aff>
      </contrib-group>
      <fpage>38</fpage>
      <lpage>51</lpage>
      <abstract>
        <p>Sentiment analysis deals with the computational treatment of opinions expressed in written texts. The addition of already mature semantic technologies to this field has been shown to increase the accuracy of the results. In this work, a semantically-enhanced methodology for the annotation of sentiment polarity in financial news is presented. The proposed methodology is based on an algorithm that combines several gazetteer lists and leverages an existing financial ontology. Finance-related news items are obtained from RSS feeds and then automatically annotated with positive or negative markers. The outcome of the process is a set of news items organized by their degree of positivity and negativity.</p>
      </abstract>
      <kwd-group>
        <kwd>opinion mining</kwd>
        <kwd>sentiment analysis</kwd>
        <kwd>financial news</kwd>
        <kwd>ontologies</kwd>
        <kwd>semantic web</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The success of Web 2.0 technologies, along with the growth of the social content
available online, has stimulated and generated many opportunities for understanding
the opinions and trends not only of the general public and consumers, but also of
companies, banks and politicians. Many business-related research questions can be
answered by analyzing the news and, for this reason, sentiment analysis and opinion
mining are currently topics of great interest, particularly in the financial domain.</p>
      <p>
        Opinion mining, a subdiscipline within data mining and computational linguistics,
refers to the computational techniques for extracting, classifying, understanding, and
assessing the opinions expressed in various online news sources, social media
comments, and other user-generated content. Sentiment analysis is often used in
opinion mining to identify sentiment, affect, subjectivity, and other emotional states
in online texts [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Originally, the task of sentiment analysis was performed on product reviews by
processing the products’ attributes [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ]. However, sentiment polarity analysis is nowadays used in a wide range of
domains, such as the financial domain
[
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5-7</xref>
        ]. Millions of financial news items circulate daily on the Web, and financial
markets are continuously changing and growing. In this scenario, as Ahmad et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
point out, the creation of a framework with which sentiments can be extracted without
relying on the intuition of the analysts as to what is good or bad news is both a
necessity and a challenge.
      </p>
      <p>In this paper, we present a semantic-based algorithm for opinion extraction applied
to the financial domain. The proposed methodology relies on natural language
processing methods to annotate financial news according to a financial ontology.
The annotated financial news items are then analyzed by passing them through a
number of gazetteer lists, which results in two separate sets, one with positive
financial news and the other with negative financial news.</p>
      <p>The rest of the paper is organized as follows. Some relevant related works are
reviewed in Section 2. Section 3 presents the technological background necessary for
the development of the methodology. In Section 4, the platform and the way it works
are described in detail. In Section 5, the experimental results of the evaluation are
shown. Finally, some conclusions and future work are put forward in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related works</title>
      <p>
        In the literature, a number of methods for automatic sentiment analysis of
financial news streams have been described. The proposal of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] uses theories of
lexical cohesion in order to create a computable metric to identify the sentiment
polarity of financial news texts. This metric is adapted in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to Chinese and Arabic
financial news. The analysis of financial news is a particularly relevant topic in the
prediction of the behaviour of stock markets. For example, in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] the authors use
simple computational linguistic techniques, such as bag of words or named entities,
together with support vector machines and other machine learning techniques to assist
in making stock market predictions. Indeed, in practice, stock market analysts’
predictions are usually based on the opinions expressed in the news.
      </p>
      <p>
        Semantic technologies have been around for a while, offering a wide range of
benefits in the knowledge management field. They have revolutionized the way that
systems integrate and share data, enabling computational agents to reason about
information and infer new knowledge [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The accuracy of opinion mining and
sentiment polarity analysis can be improved with the addition of semantic techniques,
as shown in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In that work, some semantic lexicons are created in order to identify
sentiment words in blog and news corpora. Then, a polarity value is attached to each
word in the lexicon and such polarity is revised when a modifier appears in the text.
      </p>
      <p>
        The FIRST project (http://project-first.eu/) provides an information extraction,
information integration and decision making infrastructure for information
management in the financial domain. The decision making infrastructure includes a
module responsible for the sentiment annotation of financial news and blog posts. Its
main aim is to classify the polarity of sentiment with respect to a sentiment object of
interest [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These
sentiment objects are classified by means of an ontology-guided and rule-based
information extraction approach. Even though the ontology contains the relevant
objects of the financial domain, the classification process is carried out entirely using
JAPE rules. Therefore, it can be concluded that this approach does not leverage the
reasoning capabilities of the ontology.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Technological background</title>
      <p>The methodology proposed here is based on two main elements, namely, ontologies
and natural language processing tools. In this section, the key features of these
technologies are outlined.</p>
      <sec id="sec-3-1">
        <title>3.1 Ontologies and the Semantic Web</title>
        <p>
          Ontologies constitute the standard knowledge representation mechanism for the
Semantic Web [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The formal semantics underlying ontology languages enables the
automatic processing of the information and allows the use of semantic reasoners to
infer new knowledge. In this work, an ontology is seen as “a formal and explicit
specification of a shared conceptualization” [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Ontologies provide a formal,
structured knowledge representation, and have the advantage of being reusable and
shareable. They also provide a common vocabulary for a domain and define, with
different levels of formality, the meaning of the terms and the relations between them.
Knowledge in ontologies is mainly formalized using five kinds of components:
classes, relations, functions, axioms and instances [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          Ontologies are thus key to the success of the Semantic Web vision. The use of
ontologies can overcome the limitations of traditional natural language processing
methods, and ontologies are also relevant to areas such as Information Retrieval [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], Semantic Search [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], Service Discovery
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] or Question Answering [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>Next, the financial ontology that has been developed for the purposes of this work
is described.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1 Financial Ontology</title>
        <p>
          The financial domain is becoming increasingly knowledge-intensive: a huge
number of businesses and companies depend on it, and it has a tremendous economic
impact on our society. Consequently, there is a need for more accurate and powerful
strategies for storing data and knowledge in the financial domain. In the last few years,
several finance-related ontologies have been developed. The BORO (Business Object
Reference Ontology) ontology is intended to be suitable as a basis for facilitating,
among other things, the semantic interoperability of enterprises' operational systems
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. On the other hand, the TOVE ontology (Toronto Virtual Enterprise) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
developed by the Enterprise Integration Laboratory of the University of Toronto,
describes the organization of a standard company and its processes. A further example is
the financial ontology developed by the DIP (Data Information and Process
Integration) consortium, which is mainly focused on describing semantic web services
in the stock market domain [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Finally, the XBRL Ontology Specification Group
developed a set of ontologies for describing financial and economic data in RDF for
sharing and interchanging data. XBRL is becoming an open standard means of
electronically communicating information among businesses, banks and regulators
[
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>As part of this work, a financial ontology focused on the stock exchange domain
has been created from scratch in OWL 2, drawing on the ontologies referred to above.
This ontology covers three main financial concepts (see figure 1):
• A financial market is a mechanism that allows people to easily buy and sell
financial assets such as stocks, commodities and currencies, among others.
The main stock markets, such as the New York Stock Exchange, NASDAQ or
the London Stock Exchange, have been modelled in the ontology as subclasses
of the Stock_market class.
• The Financial Intermediary class represents the entities that typically invest
in the financial markets. Examples of such entities are banks, insurance
companies, brokers and financial advisers.
• The Asset class represents everything of value in which an Intermediary can
invest, such as stock market indexes, commodities, companies and currencies, to
mention a few. For instance, enterprises such as Apple Inc., General Electric or
Microsoft belong to the Company concept, and currencies such as the US dollar
or the Euro are included as individuals of the Currency concept.
</p>
      </sec>
      <sec id="sec-3-2a">
        <title>3.2 Natural Language Processing and Sentiment Analysis</title>
        <p>
          Sentiment annotation can be seen as the task of assigning positive, negative or
neutral sentiment values to texts, sentences and other linguistic units [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. In this work, the
values positive, negative and neutral have been assigned to general terms that
express some kind of sentiment (e.g. ‘benefit’, ‘positive’, ‘danger’) and to financial
terms (e.g. ‘risk capital’, ‘rising stock’, ‘bankruptcy’). Moreover, terms pertaining to
the financial domain, such as ‘risk premium’, ‘capital market’ or ‘Ibex35’, have been
semantically annotated.
        </p>
        <p>The open source software GATE (http://gate.ac.uk/) carries out sentiment and
semantic annotation by means of gazetteer lists. GATE is an infrastructure for
developing and deploying software components that process human language. One of
GATE’s key components is the gazetteer list. A gazetteer list is a plain text file with
one entry per line (a term, a number, a name, etc.), which allows these entries to be
identified in a text. In this work, the lists have been developed using the BWP
Gazetteer (http://gate.ac.uk/gate/doc/plugins.html#bwp). This plugin provides an
approximate gazetteer for GATE, based on Levenshtein's edit distance for strings. Its
goal is to handle texts with noise and errors, in which GATE's default gazetteers may
have difficulties. The implemented lists are based on the linguistic particularities of
the financial domain.</p>
        <p>
          Grishman and Kittredge [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] define a sublanguage as the specialized form of a
natural language that is used within a particular domain or subject matter. A
sublanguage is characterized by a specialized vocabulary, semantic relationships, and
in many cases specialized syntax [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. The boundaries of the financial news domain are
not very sharply defined [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. For example, “Euribor rates rise after ECB interest
warnings” and “Portugal needs the luck of Irish” are both headlines of financial news,
although the second one contains neither a financial term nor a particular syntactic
structure. Nevertheless, it is possible to define a wide set of specialized financial
vocabulary (e.g. ‘Euribor’, ‘Ibex35’, ‘investors’), which coexists with frequently used
non-specialized terms (e.g. ‘to rise’, ‘unemployed’, ‘construction’).
        </p>
        <p>In this work, the semantic and sentiment gazetteers developed are employed to
mark up all sentiment words and the associated entities of our ontology. Six different
kinds of gazetteers have been developed on the basis of the common characteristics
and vocabulary of the financial domain. The lists are used by the system to create
three different types of annotations: semantic annotations, sentiment annotations and
modifier annotations. Semantic annotation refers to financial terms that are present in
the financial ontology. Sentiment annotation indicates the polarity of selected terms.
Modifier annotation refers to elements that can invert or increase the polarity of the
previously annotated terms. For each kind of annotation, a gazetteer category has been
created. Thus, semantic, sentiment and modifier gazetteers have been developed. Each
gazetteer category consists of one or more gazetteer lists, as explained below.</p>
        <p>i. Semantic gazetteer
a. Financial domain vocabulary gazetteer. This gazetteer contains the most
relevant domain terms and entities. It has been directly mapped onto the
ontology classes and individuals and their corresponding labels, including
synonyms. Examples in this category are ‘Annual Percentage Rate’ (APR),
‘Compound Interest’, ‘Dividend’, ‘Income Tax’, ‘Apple’ and ‘BBVA’. This
list is used for semantic annotation and does not contain any information
related to opinions.</p>
        <p>ii. Sentiment gazetteer
a. Positive sentiment gazetteer. It contains general terms that imply a positive
opinion, such as ‘growth’, ‘trust’, ‘positive’ or ‘rising’.
b. Negative sentiment gazetteer. It contains general terms that imply a negative
opinion, such as ‘danger’, ‘doubts’ or ‘to cut’.
c. Financial positive sentiment gazetteer. It contains terms related to the
financial domain that imply a positive opinion, for example ‘earning’,
‘profitability’ or ‘appreciating asset’.
d. Financial negative sentiment gazetteer. It contains terms related to the
financial domain that imply a negative opinion, for example ‘depreciation’,
‘Insufficient Funds’ or ‘creditor’.</p>
        <p>iii. Modifier gazetteer
a. Intensifier gazetteer. It contains terms that are used to change the degree to
which a term is positive or negative, such as ‘very’, ‘most’ or ‘extremely’.
b. Negation gazetteer. It contains negation expressions such as ‘no’, ‘never’ or
‘deny’.
c. Temporal sentiment gazetteers. They contain temporal expressions that
modify the sentiment of the whole news item. These expressions appear in
conjunction with positive or negative linguistic expressions and modify their
meaning, usually increasing or decreasing the negative or positive sentiment.
There are two temporal gazetteers, one with long-term expressions and the
other with short-term expressions. “Last year”, “trimester” or “several
weeks” are examples of the first type, while “this morning”, “today” or “this
week” are examples of the second type. The following sentences illustrate
how temporal terms can modify sentiment in the financial domain:
(1) Apple shares have risen around 17% in the last month.
(2) Apple shares have fallen 4.5% this morning.</p>
        <p>Here, “last month” and “this morning” qualify the weight of each statement
within the overall meaning. In general, long-term positive or negative opinions are
more reliable than short-term opinions. That is, if the user searches for the general
status of Apple shares and the system retrieves these two entries, then the general
opinion should be positive.</p>
        <p>The architecture of the platform is shown in figure 2. It is composed of four main
components: the financial news extraction module, the semantic annotation module,
the opinion mining module and the search engine. Next, these components are
described in detail.</p>
        <p>[Figure 2. Architecture of the platform. Components: financial news RSS feeds (RSS Feed 1 … RSS Feed n); semantic annotation module, with an NLP phase (stemmer, POS taggers, term extraction tools, syntactic parsers) and a semantic annotation phase; financial ontology; annotated financial news; opinion mining module (sentiment analysis using the sentiment gazetteer lists), producing positive (+) and negative (−) financial news; search engine, which receives the user query and returns positive and negative results.]</p>
        <p>
          The financial news extraction module manages the list of RSS feeds. RSS is a
family of Web feed formats used for syndicating content from blogs or Web pages, and
it is commonly used by newspapers. An RSS feed is an XML file that summarizes
information items and links to the information sources [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Once the resources have been selected, this module
generates a set of abstracts, which are used as input to the system. An example list of
financial news-related RSS feeds is shown in table 1.
Table 1. Example financial news-related RSS feeds
http://www.economist.com/feeds/print-sections/75/europe.xml
http://feeds.reuters.com/reuters/USpersonalfinanceNews
http://feeds.nytimes.com/nyt/rss/Business
http://feeds.bbci.co.uk/news/business/rss.xml
        </p>
        <p>For each RSS source, the latest news items are obtained and stored in a database.
The information retrieved for each news item is the date of publication, the
information source, the URL and the abstract. Abstracts constitute the corpus from
which the system extracts the information. Only the abstract and the headline are
considered, because they usually condense the polarity of the news. Indeed, analyzing
the whole text can lead to errors, since the sentiment polarity of an entire document
is not necessarily the sum of its parts.</p>
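        <p>The extraction step can be sketched with Python's standard library. The sketch below assumes RSS 2.0 feeds, whose item elements carry title, link, pubDate and description children; it takes the description as the abstract and the title as the headline, and stores the fields in an SQLite table. The table name, its schema and the de-duplication by URL are illustrative assumptions, not the system's actual database design; fetching the feed over HTTP (for instance with urllib.request) is left out.</p>
        <preformat><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml: str, source: str):
    """Extract publication date, source, URL, headline and abstract
    from every item of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "date": item.findtext("pubDate", default=""),
            "source": source,
            "url": item.findtext("link", default=""),
            "headline": item.findtext("title", default=""),
            "abstract": item.findtext("description", default=""),
        })
    return items


def store_items(conn: sqlite3.Connection, items):
    """Insert items into a hypothetical `news` table; URLs already stored
    are skipped thanks to the primary key and INSERT OR IGNORE."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "url TEXT PRIMARY KEY, date TEXT, source TEXT, headline TEXT, abstract TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO news VALUES (:url, :date, :source, :headline, :abstract)",
        items,
    )
    conn.commit()


if __name__ == "__main__":
    sample = (
        '<rss version="2.0"><channel><title>Example</title><item>'
        "<title>Apple shares rise</title><link>http://example.com/a</link>"
        "<pubDate>Mon, 02 Jan 2012 10:00:00 GMT</pubDate>"
        "<description>Apple shares have risen in the last month.</description>"
        "</item></channel></rss>"
    )
    db = sqlite3.connect(":memory:")
    store_items(db, parse_rss_items(sample, "example"))
    print(db.execute("SELECT headline, url FROM news").fetchall())
```
]]></preformat>
        <p>Running the feeds periodically through these two functions keeps only one row per news URL, so repeated crawls of the same feed do not duplicate items.</p>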
      </sec>
      <sec id="sec-3-3">
        <title>4.2 Semantic annotation module</title>
        <p>This module identifies the most important linguistic expressions in the financial
domain using the previously described semantic gazetteer. For each linguistic
expression, the system tries to determine whether the expression in question is an
individual of any of the classes of the domain ontology. Next, the system retrieves all
the annotated knowledge situated next to the current linguistic expression in the
text and tries to create fully-filled annotations with this knowledge.</p>
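        <p>A minimal sketch of this mapping step, under strong simplifying assumptions: the ontology is reduced to Python dictionaries (individual to class, class to superclass, and rdfs:label-style surface forms to individuals), and the annotated knowledge next to a term is approximated as the other ontology individuals matched in the same sentence. All identifiers, including the class names drawn from Section 3.1.1 and figure 3, are hypothetical.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field

# Hypothetical, heavily reduced fragment of the financial ontology.
INDIVIDUAL_CLASS = {"Apple_Inc": "ICT_company", "BBVA": "Bank", "Euro": "Currency"}
SUPERCLASS = {
    "ICT_company": "Company", "Company": "Asset", "Currency": "Asset",
    "Bank": "Financial_Intermediary", "Asset": None, "Financial_Intermediary": None,
}
# rdfs:label-style surface forms (lowercased, including synonyms) -> individual.
LABELS = {"apple": "Apple_Inc", "apple inc.": "Apple_Inc", "bbva": "BBVA", "euro": "Euro"}


@dataclass
class Annotation:
    text: str
    individual: str
    classes: list = field(default_factory=list)  # most specific class first
    context: list = field(default_factory=list)  # co-occurring individuals


def class_chain(cls):
    """Follow superclass links from `cls` up to a root class."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS[cls]
    return chain


def annotate_sentence(terms):
    """Map gazetteer matches of one sentence onto ontology individuals and
    attach the other individuals of the same sentence as context."""
    found = [(t, LABELS[t.lower()]) for t in terms if t.lower() in LABELS]
    return [
        Annotation(text, ind, class_chain(INDIVIDUAL_CLASS[ind]),
                   [other for _, other in found if other != ind])
        for text, ind in found
    ]
```
]]></preformat>
        <p>Terms without a label in the ontology, such as ‘rising’, are simply ignored at this stage; they are handled later by the sentiment gazetteers.</p>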
        <p>Each class in the ontology is defined by means of a set of relations and datatype
properties. Then, when an annotated term is mapped onto an ontological individual,
its datatype properties and relationships constitute the information that can
potentially be obtained for that individual. For example, a company has associated
relationships such as ‘Moody’sRate’, ‘tradeMarket’ or ‘isLegalRepresentativeFor’. In
figure 3, an example of the annotation process of financial news using GATE is
depicted.</p>
        <p>[Figure 3. Taxonomy fragment: Energy company (GE Energy, Texaco, Shell) and ICT company (Microsoft, Google, Apple, Nokia).]</p>
        <p>The main objective of the opinion mining module is to classify the set of news
items obtained by the previous module according to their polarity: positive, negative
or neutral. For every retrieved news item that has been annotated, the sentiment
orientation, or sentiment polarity value, is computed. For this, the module makes use
of the previously described gazetteer lists.</p>
        <p>The sentiment polarity (SP) value of each news item is calculated by summing the
polarity values of all the annotated terms in the news item. In this process, the system
must consider both the polarity of the terms included in the positive and negative
gazetteers and the contextual valence shifters included in the negation, intensifier and
temporal gazetteers.</p>
        <p>For any annotated term at in a sentence s ∈ S, its SP value, SP(at), is computed
as follows:
1. If at ∈ GeneralPositive, SP(at) = Positive1.
2. If at ∈ DomainPositive, SP(at) = Positive2.
3. If at ∈ GeneralNegative, SP(at) = Negative1.
4. If at ∈ DomainNegative, SP(at) = Negative2.
5. If, within the relevant cotext of at, there is a term at′ ∈ Negation,
SP(at) = −SP(at).
6. If, within the relevant cotext of at, there is a term at′ ∈ Intensifier,
SP(at) = 2 × SP(at).
7. If, within the relevant cotext of at, there is a term at′ ∈ Temporal, then:
7.1. if at′ ∈ LongTerm, SP(at) = 2 × SP(at);
7.2. if at′ ∈ ShortTerm and SP(at) is negative, SP(at) = 2 × SP(at);
7.3. if at′ ∈ ShortTerm and SP(at) is positive, SP(at) = 1 × SP(at).</p>
        <p>Then the polarity of each news item n_k is represented as the sum of the SP values
of all the annotated terms present in it:
<disp-formula><tex-math><![CDATA[SP(n_k) = \sum_{at \in n_k} SP(at)]]></tex-math></disp-formula></p>
        <p>In the above algorithm, the term ‘cotext’ refers to the linguistic context
surrounding an annotated term within the limits of its sentence, i.e. the other
annotated terms that appear before and after it in the same sentence. ‘Positive1’ and
‘Positive2’ refer to the degree of positivity of an annotated term, while ‘Negative1’
and ‘Negative2’ refer to its degree of negativity.</p>
        <p>When a long-term temporal expression is found, its value is calculated taking into
account the annotated terms (at) in its cotext. If a positive at is found, its value is 2;
conversely, if a negative at is found, its value is -2. Short-term temporal expressions
are calculated in the same way for negative values, i.e. adding -2. However, for
positive values the system only adds 1. This is because we consider that short-term
positive values in finance change too frequently to be weighted at the same level as
long-term values.</p>
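        <p>The per-term rules and the news-level sum can be sketched in Python as follows. The paper does not give numeric values for Positive1, Positive2, Negative1 and Negative2, so the sketch assumes 1, 2, -1 and -2; it also assumes that rules 5 to 7 are applied in that order, each at most once per term, and that a sentence arrives as a list of (term, gazetteer category) pairs produced by the annotation step. The category names are hypothetical labels for the gazetteers of Section 3.2.</p>
        <preformat><![CDATA[
```python
# Assumed base values for rules 1-4 (not specified in the paper).
BASE = {
    "GeneralPositive": 1,   # Positive1
    "DomainPositive": 2,    # Positive2
    "GeneralNegative": -1,  # Negative1
    "DomainNegative": -2,   # Negative2
}


def term_sp(category, cotext):
    """SP(at) for one sentiment-bearing term, given the gazetteer categories
    of the other annotated terms in the same sentence (its cotext)."""
    sp = BASE[category]                     # rules 1-4
    if "Negation" in cotext:                # rule 5: invert polarity
        sp = -sp
    if "Intensifier" in cotext:             # rule 6: double
        sp = 2 * sp
    if "LongTerm" in cotext:                # rule 7.1: double
        sp = 2 * sp
    elif "ShortTerm" in cotext and sp < 0:  # rule 7.2: short-term negatives doubled
        sp = 2 * sp
    # rule 7.3: a short-term expression leaves a positive SP unchanged (1 x SP)
    return sp


def news_sp(sentences):
    """SP(n): sum of SP(at) over every sentiment term of every sentence."""
    total = 0
    for sentence in sentences:
        categories = [cat for _, cat in sentence]
        for i, (_, cat) in enumerate(sentence):
            if cat in BASE:
                cotext = categories[:i] + categories[i + 1:]
                total += term_sp(cat, cotext)
    return total
```
]]></preformat>
        <p>Under these assumptions, a general positive term next to a long-term expression contributes 2, a domain-positive term with both a negation and an intensifier contributes -4, and a general negative term next to a short-term expression contributes -2.</p>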
        <p>Next, if the sentiment polarity value of a news item is less than 0, the news item is
labelled as negative. In contrast, if the value is greater than 0, the news item is labelled
as positive. Finally, if the sum of all values is 0, the news item is labelled as neutral.
An example of how the algorithm works is shown in figure 4.</p>
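        <p>The labelling rule, together with the ordering of results by degree of positivity and negativity, can be sketched as follows; the function names and the choice to leave neutral items out of both ranked lists are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def label(sp_value):
    """Map SP(n) to a label: above 0 positive, below 0 negative, else neutral."""
    if sp_value > 0:
        return "positive"
    if sp_value < 0:
        return "negative"
    return "neutral"


def rank_by_polarity(scores):
    """Split {news_id: SP(n)} into positive and negative lists, each sorted
    from the strongest to the weakest polarity; neutral items are omitted."""
    positive = sorted((n for n, s in scores.items() if s > 0), key=lambda n: -scores[n])
    negative = sorted((n for n, s in scores.items() if s < 0), key=lambda n: scores[n])
    return positive, negative
```
]]></preformat>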
        <p>[Figure 4. Example of the sentiment algorithm applied to four news items (1–4), with annotations marked as +, ++, −, −−, N, T and I; the polarity values shown are +15, +8, +2 and −9.]</p>
        <p>Let us suppose that a user searches for the company ‘Adidas’. In the example
depicted in figure 4, four different news items are retrieved. In the figure, semantic
annotations are the elements surrounded by a rectangle, which have been mapped
onto ontology instances. GeneralPositive terms are indicated with one ‘+’ sign and
DomainPositive terms with two, ‘++’. On the other hand, GeneralNegative terms are
indicated with one ‘−’ sign and DomainNegative terms with two, ‘−−’. The Negation,
Temporal and Intensifier modifiers are indicated with ‘N’, ‘T’ and ‘I’ respectively,
together with the corresponding positive or negative symbol.</p>
        <p>The outcome of the process is three positive news items and one negative news
item. In this particular example, the presence of long-term temporal expressions, such
as ‘2012’ or ‘year’, in conjunction with positively annotated terms gives the news a
high positive value. The user can organize the final results according to their degree
of positivity and negativity.</p>
      </sec>
      <sec id="sec-3-4">
        <title>4.4 Semantic search engine</title>
        <p>In OWL-based ontologies, ‘rdfs:label’ is an instance of ‘rdf:Property’ that may be
used to provide a human-readable version of a resource name. In this work, all the
resources in the ontology have been annotated with the ‘rdfs:label’ descriptor.
Building on these labels, the main objective of this module is to identify the financial
news items that are related to the query issued by a user. In addition, this module is
responsible for classifying and sorting the results according to the sentiment
classification described in the previous section.</p>
        <p>The system constantly crawls news from the RSS feeds and creates semantic
annotations for the news items. If no annotations are created for a news item, that
item is not stored in the database. On the other hand, the news items that have been
successfully annotated are processed to obtain their sentiment classification, which is
also stored in the database. For example, let us suppose that the ontology contains the
taxonomy presented in figure 3. There are two kinds of companies, namely, “Energy
company” and “ICT company”. Each of these classes contains a set of individuals,
such as “GE Energy” and “Microsoft”, respectively. If the user searches for news
about “Microsoft”, the system returns all the news annotated with the individual
Microsoft. Moreover, since news related to other ICT companies could be relevant to
the user, the system also shows news about companies such as Google, Apple and
Nokia. If the user queries the system for “Energy companies”, then the result includes
all the news annotated with the concept “Energy company”, and therefore the news
related to the “GE Energy”, “Texaco” and “Shell” companies is retrieved.
Furthermore, if the query is a general term such as “Company”, the user is given the
possibility of filtering the results according to the subclasses of “Company”, namely,
“Energy company” and “ICT company”.</p>
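        <p>The retrieval behaviour described in this example can be sketched with the taxonomy of figure 3 encoded as plain Python dictionaries. This is only an illustration under assumed data structures (a class-to-subclass map, a class-to-individual map and an index from individuals to annotated news items); the actual system works on the OWL ontology and its rdfs:label annotations.</p>
        <preformat><![CDATA[
```python
# Hypothetical encoding of the figure 3 taxonomy.
SUBCLASSES = {"Company": ["Energy_company", "ICT_company"]}
INDIVIDUALS = {
    "Energy_company": ["GE Energy", "Texaco", "Shell"],
    "ICT_company": ["Microsoft", "Google", "Apple", "Nokia"],
}


def individuals_of(cls):
    """All individuals of a class, including those of its subclasses."""
    result = list(INDIVIDUALS.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        result.extend(individuals_of(sub))
    return result


def expand_query(query):
    """Return (direct individuals, related individuals) for a query naming
    either an individual or a class."""
    for members in INDIVIDUALS.values():
        if query in members:
            # An individual: related news comes from its sibling individuals.
            return [query], [m for m in members if m != query]
    # A class: every individual of the class (and its subclasses) is direct.
    return individuals_of(query), []


def search(query, news_index):
    """news_index maps an individual to its list of (news_id, polarity label)."""
    direct, related = expand_query(query)
    hits = [item for ind in direct for item in news_index.get(ind, [])]
    extra = [item for ind in related for item in news_index.get(ind, [])]
    return hits, extra
```
]]></preformat>
        <p>Grouping the hits of a general query such as “Company” by the subclass of each individual would provide the filtering facet described above.</p>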
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Evaluation</title>
      <p>In this section, the experimental results obtained by the proposed method in the
financial news domain are presented. The corpus of the experiment contains 57,210
words and comprises 900 abstracts of financial news (512 negative and 388 positive).
This corpus has been extracted from the RSS feeds shown in table 1, and each news
item has been manually labelled as either positive or negative by two different
annotators. These manual labels constitute the reference for the evaluation, which
works as follows: if the result displayed by the system matches the manual
annotation of the news item, the result is considered correct; otherwise, it is
considered incorrect. In the sentiment analysis field, it is generally accepted that
human annotators agree in around 70-80% of cases (i.e. two different humans may
disagree in 20-30% of cases). However, for the purposes of this experiment, the news
items on which the annotators disagreed have been removed.</p>
      <p>In the experiment, five queries were issued to the system to find information in
the financial domain. The results of the experiment are shown in table 2. The
sentiment analysis accuracy results are very promising, with a mean aggregate
accuracy of 87%. These results take into account the system’s final decision (positive
or negative) and not the process that the system carries out to produce such a
decision.</p>
      <p>[Table 2. Results of the experiment for queries 1–5: positive (Pos) and negative (Neg) news items per query, totals and accuracy.]</p>
    </sec>
    <sec id="sec-5">
      <title>6 Conclusions</title>
      <p>This paper proposes an algorithm for opinion extraction in financial news. Different
gazetteer lists have been created as specialized lexicons for financial sentiment. The
sentiment algorithm assigns different degrees of positivity or negativity to relevant
annotated terms and calculates the polarity of the news.</p>
      <p>This approach contributes to research on financial sentiment annotation and to the
development of decision support systems (1) by proposing a novel approach for
financial sentiment determination in news, which combines ontological resources with
natural language processing resources, (2) by describing an algorithm for assigning
different degrees of positivity or negativity to classifier results on the different
categories identified by the classifier, and (3) by proposing a set of resources, i.e.
gazetteer lists and an ontology, for sentiment annotation.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work has been supported by the Spanish Government through project SeCloud
(TIN2010-18650).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation><string-name><surname>Chen</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Zimbra</surname>, <given-names>D.</given-names></string-name>: <article-title>AI and opinion mining</article-title>. <source>IEEE Intelligent Systems</source> <volume>25</volume>(<issue>3</issue>), pp. <fpage>74</fpage>-<lpage>80</lpage> (<year>2010</year>)</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation><string-name><surname>Popescu</surname>, <given-names>A.M.</given-names></string-name>, <string-name><surname>Etzioni</surname>, <given-names>O.</given-names></string-name>: <article-title>Extracting product features and opinions from reviews</article-title>. In: <source>Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing</source>, Association for Computational Linguistics, pp. <fpage>339</fpage>-<lpage>346</lpage> (<year>2005</year>)</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation><string-name><surname>Ding</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>B.</given-names></string-name>: <article-title>The utility of linguistic rules in opinion mining</article-title>. In: <source>Proceedings of the 30th Annual International ACM Special Interest Group on Information Retrieval Conference (SIGIR'07)</source>, Amsterdam, The Netherlands (<year>2007</year>)</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation><string-name><surname>Balahur</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Montoyo</surname>, <given-names>A.</given-names></string-name>: <article-title>Determining the semantic orientation of opinions of products: a comparative analysis</article-title>. <source>Procesamiento del Lenguaje Natural</source> <volume>41</volume>, pp. <fpage>201</fpage>-<lpage>208</lpage> (<year>2008</year>)</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation><string-name><surname>Ahmad</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Cheng</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Almas</surname>, <given-names>Y.</given-names></string-name>: <article-title>Multi-lingual sentiment analysis of financial news streams</article-title>. In: <source>Proceedings of the Second Workshop on Computational Approaches to Arabic Script-based Languages</source>, Linguistic Society of America, Linguistic Institute, Stanford University, pp. <fpage>1</fpage>-<lpage>12</lpage> (<year>2007</year>)</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation><string-name><surname>Devitt</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Ahmad</surname>, <given-names>K.</given-names></string-name>: <article-title>Sentiment analysis in financial news: A cohesion-based approach</article-title>. In: <source>Proceedings of the Association for Computational Linguistics (ACL)</source>, pp. <fpage>984</fpage>-<lpage>991</lpage> (<year>2007</year>)</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation><string-name><surname>Schumaker</surname>, <given-names>R.P.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>H.</given-names></string-name>: <article-title>Textual analysis of stock market prediction using breaking financial news: The AZFin text system</article-title>. <source>ACM Transactions on Information Systems</source> <volume>27</volume>, pp. <fpage>1</fpage>-<lpage>19</lpage> (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation><string-name><surname>Studer</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Benjamins</surname>, <given-names>V.R.</given-names></string-name>, <string-name><surname>Fensel</surname>, <given-names>D.</given-names></string-name>: <article-title>Knowledge engineering: Principles and methods</article-title>. <source>Data &amp; Knowledge Engineering</source> <volume>25</volume>(<issue>1-2</issue>), pp. <fpage>161</fpage>-<lpage>197</lpage> (<year>1998</year>)</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation><string-name><surname>Godbole</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Srinivasaiah</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Skiena</surname>, <given-names>S.</given-names></string-name>: <article-title>Large-scale sentiment analysis for news and blogs</article-title>. In: <source>Proceedings of the International Conference on Weblogs and Social Media (ICWSM)</source> (<year>2007</year>)</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation><string-name><surname>Klein</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Häusser</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Altuntas</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Grcar</surname>, <given-names>M.</given-names></string-name>: <article-title>Large scale information extraction and integration infrastructure for supporting financial decision making</article-title>. <source>Deliverable D4.1: First semantic information extraction prototype</source>, <ext-link ext-link-type="uri" xlink:href="http://project-first.eu/content/d41-first-semanticinformation-extraction-prototype">http://project-first.eu/content/d41-first-semanticinformation-extraction-prototype</ext-link> (<year>2012</year>)</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>11</label>
        <mixed-citation><string-name><surname>Gruber</surname>, <given-names>T.R.</given-names></string-name>: <article-title>A translation approach to portable ontology specifications</article-title>. <source>Knowledge Acquisition</source> <volume>5</volume>(<issue>2</issue>), pp. <fpage>199</fpage>-<lpage>220</lpage> (<year>1993</year>)</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>12</label>
        <mixed-citation><string-name><surname>Valencia-García</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Fernández-Breis</surname>, <given-names>J.T.</given-names></string-name>, <string-name><surname>Ruiz-Martínez</surname>, <given-names>J.M.</given-names></string-name>, <string-name><surname>García-Sánchez</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Martínez-Béjar</surname>, <given-names>R.</given-names></string-name>: <article-title>A knowledge acquisition methodology to ontology construction for information retrieval from medical documents</article-title>. <source>Expert Systems: The Knowledge Engineering Journal</source> <volume>25</volume>(<issue>3</issue>), pp. <fpage>314</fpage>-<lpage>334</lpage> (<year>2008</year>)</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>13</label>
        <mixed-citation><string-name><surname>Lupiani-Ruiz</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>García-Manotas</surname>, <given-names>I.</given-names></string-name>, <string-name><surname>Valencia-García</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>García-Sánchez</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Castellanos-Nieves</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Fernández-Breis</surname>, <given-names>J.T.</given-names></string-name>, <string-name><surname>Camón-Herrero</surname>, <given-names>J.B.</given-names></string-name>: <article-title>Financial news semantic search engine</article-title>. <source>Expert Systems with Applications</source> <volume>38</volume>(<issue>12</issue>), pp. <fpage>15565</fpage>-<lpage>15572</lpage> (<year>2011</year>)</mixed-citation>
      </ref>
      <ref id="ref14">
        <label>14</label>
        <mixed-citation><string-name><surname>García-Sánchez</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Valencia-García</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Martínez-Béjar</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Fernández-Breis</surname>, <given-names>J.T.</given-names></string-name>: <article-title>An ontology, intelligent agent-based framework for the provision of semantic web services</article-title>. <source>Expert Systems with Applications</source> <volume>36</volume>(<issue>2, Part 2</issue>), pp. <fpage>3167</fpage>-<lpage>3187</lpage> (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref15">
        <label>15</label>
        <mixed-citation><string-name><surname>Valencia-García</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>García-Sánchez</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Castellanos-Nieves</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Fernández-Breis</surname>, <given-names>J.T.</given-names></string-name>: <article-title>OWLPath: an OWL ontology-guided query editor</article-title>. <source>IEEE Transactions on Systems, Man, and Cybernetics, Part A</source> <volume>41</volume>(<issue>1</issue>), pp. <fpage>121</fpage>-<lpage>136</lpage> (<year>2011</year>)</mixed-citation>
      </ref>
      <ref id="ref16">
        <label>16</label>
        <mixed-citation><string-name><surname>Partridge</surname>, <given-names>C.</given-names></string-name>: <article-title>The role of ontology in integrating semantically heterogeneous databases</article-title>. <source>LADSEB-CNR Technical Report 05/2002</source> (<year>2002</year>)</mixed-citation>
      </ref>
      <ref id="ref17">
        <label>17</label>
        <mixed-citation><string-name><surname>Fox</surname>, <given-names>M.S.</given-names></string-name>, <string-name><surname>Gruninger</surname>, <given-names>M.</given-names></string-name>: <article-title>Enterprise modeling</article-title>. <source>AI Magazine</source> <volume>19</volume>(<issue>3</issue>), p. <fpage>109</fpage> (<year>1998</year>)</mixed-citation>
      </ref>
      <ref id="ref18">
        <label>18</label>
        <mixed-citation><string-name><surname>Corcho</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Losada</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Martínez Montes</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Bas</surname>, <given-names>J.L.</given-names></string-name>, <string-name><surname>Bellido</surname>, <given-names>S.</given-names></string-name>: <article-title>Financial ontology</article-title>. <source>DIP deliverable D10.3</source> (<year>2004</year>)</mixed-citation>
      </ref>
      <ref id="ref19">
        <label>19</label>
        <mixed-citation><string-name><surname>Bonsón</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Cortijo</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Escobar</surname>, <given-names>T.</given-names></string-name>: <article-title>Towards the global adoption of XBRL using international financial reporting standards (IFRS)</article-title>. <source>International Journal of Accounting Information Systems</source> <volume>10</volume>(<issue>1</issue>), pp. <fpage>46</fpage>-<lpage>60</lpage> (<year>2009</year>)</mixed-citation>
      </ref>
      <ref id="ref20">
        <label>20</label>
        <mixed-citation><string-name><surname>Andreevskaia</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Bergler</surname>, <given-names>S.</given-names></string-name>: <article-title>When specialists and generalists work together: Overcoming domain dependence in sentiment tagging</article-title>. In: <source>Proceedings of ACL-08: HLT</source>, pp. <fpage>290</fpage>-<lpage>298</lpage> (<year>2008</year>)</mixed-citation>
      </ref>
      <ref id="ref21">
        <label>21</label>
        <mixed-citation><string-name><surname>Grishman</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Kittredge</surname>, <given-names>R.</given-names></string-name>: <source>Analyzing language in restricted domains: Sublanguage description and processing</source>. <publisher-name>Lawrence Erlbaum</publisher-name> (<year>1986</year>)</mixed-citation>
      </ref>
      <ref id="ref22">
        <label>22</label>
        <mixed-citation><string-name><surname>Grishman</surname>, <given-names>R.</given-names></string-name>: <article-title>Adaptive information extraction and sublanguage analysis</article-title>. In: Kushmerick, N. (ed.) <source>Proceedings of the Workshop on Adaptive Text Extraction and Mining at the Seventeenth International Joint Conference on Artificial Intelligence</source>, Seattle, WA, <ext-link ext-link-type="uri" xlink:href="http://nlp.cs.nyu.edu/pubs/papers/grishman-ijcai01.pdf">http://nlp.cs.nyu.edu/pubs/papers/grishman-ijcai01.pdf</ext-link> (<year>2001</year>)</mixed-citation>
      </ref>
      <ref id="ref23">
        <label>23</label>
        <mixed-citation><string-name><surname>Murugesan</surname>, <given-names>S.</given-names></string-name>: <article-title>Understanding Web 2.0</article-title>. <source>IT Professional</source> <volume>9</volume>(<issue>4</issue>), pp. <fpage>34</fpage>-<lpage>41</lpage> (<year>2007</year>)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>