<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identifying Disputed Topics in the News</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Orphée De Clercq</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Hertling</string-name>
          <email>hertling@ke.tu-darmstadt.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Véronique Hoste</string-name>
          <email>veronique.hoste@ugent.be</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Paolo Ponzetto</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heiko Paulheim</string-name>
          <email>heiko@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Knowledge Engineering Group, Technische Universität Darmstadt</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LT3, Language and Translation Technology Team, Ghent University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Research Group Data and Web Science, University of Mannheim</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>News articles often reflect an opinion or point of view, with certain topics evoking more diverse opinions than others. For analyzing and better understanding public discourse, identifying such contested topics constitutes an interesting research question. In this paper, we describe an approach that combines NLP techniques and background knowledge from DBpedia for finding disputed topics in news sites. To identify these topics, we annotate each article with DBpedia concepts, extract their categories, and compute a sentiment score in order to identify those categories revealing significant deviations in polarity across different media. We illustrate our approach in a qualitative evaluation on a sample of six popular British and American news sites.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Open Data</kwd>
        <kwd>DBpedia</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Online News</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The internet has changed the landscape of journalism, as well as the way readers
consume news. With many newspapers offering news for free on their websites,
many people are no longer local readers subscribed to one
particular newspaper, but receive news from many sources, covering a wide range
of opinions. At the same time, the availability of online news sites allows for
in-depth analysis of topics, their coverage, and the opinions about them. In this
paper, we explore the possibilities of current basic Semantic Web and Natural
Language Processing (NLP) technologies to identify topics carrying disputed
opinions.</p>
      <p>There are different scenarios in which identifying those disputed opinions is
interesting. For example, media studies are concerned with analyzing the
political polarity of media. Here, means for automatically identifying conflicting topics
can help in understanding the political bias of those sources. Furthermore,
campaigns of paid journalism may be uncovered, e.g. if certain media have significant
positive or negative deviations in articles mentioning certain politicians.</p>
      <p>In this paper, we start with the assumption that DBpedia categories help
us identify specific topics. Next, we look at how the semantic orientation of
news articles, based on a lexicon-based sentiment analysis, helps us find disputed
news. Finally, we apply our methodology to a web crawl of six popular news sites,
which were analyzed for both topics and sentiment. To this end, we first annotate
articles with DBpedia concepts, and then use the concepts' categories to assign
topics to the articles. Disputed topics are located by first identifying significant
deviations of a topic's average sentiment per news site from the news site's
overall average sentiment, and then selecting those topics which have both significant
positive and negative deviations.</p>
      <p>This work contributes an interesting application of combining Semantic Web
and NLP techniques for a high-end task. The remainder of this paper is
structured as follows: in the next section we describe related work (Section 2). Next,
we present how we collected and processed the data used for our system (Section
3). We continue by describing some interesting findings of our approach together
with some of its limitations (Section 4). We finish with some concluding remarks
and prospects for future research (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>Background and Related Work</title>
      <p>
        Text and data mining approaches are increasingly used in the social science field
of media or content analysis. Using statistical learning algorithms, Fortuna et
al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] focused on finding differences in American and Arab news reporting and
revealed a bias in the choice of topics different newspapers report on, as well as a
different choice of terms when reporting on a given topic. The work by Segev and
Miesch [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which aimed to detect biases when reporting on Israel, likewise found
that news reports are largely critical and negative towards Israel. More
qualitative studies have also been performed, such as the discourse analysis by Pollak et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
which revealed contrast patterns that provide evidence for ideological differences
between local and international press coverage. These studies either focus on a
particular event or topic [
        <xref ref-type="bibr" rid="ref14 ref17">14,17</xref>
        ] or use text classification in order to define
topics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and most often require an upfront definition of topics and/or manually
annotated training data. In this work, instead, we use Semantic Web technologies
to semantically annotate newswire text, and develop a fully automatic pipeline
to find disputed topics by employing sentiment analysis techniques.
      </p>
      <p>
        Semantic annotation deals with enriching texts with pointers to knowledge
bases and ontologies [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Previous work mostly focused on linking mentions of
concepts and instances to either semantic lexicons like WordNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or
Wikipedia-based knowledge bases [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] like DBpedia [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. DBpedia was for example used by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
to automatically extract topic labels by linking the inherent topics of a text to
concepts found in DBpedia and mining the resulting semantic topic graphs. They
found that this is a better approach than using text-based methods. Sentiment
analysis, on the other hand, deals with finding opinions in text. Most research
has been performed on clearly opinionated texts such as product or movie
reviews [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], instead of newspaper texts which are believed to be less opinionated.
      </p>
      <p>[Figure 1: Processing pipeline. A web crawler collects news texts from news sources #1 to #n (data collection); sentiment analysis, using sentiment lexicons, adds polarity to the news texts; topic extraction adds semantic categories (e.g. Television series, LGBT history, Liberal parties); disputed category identification then selects categories such as LGBT history and Liberal parties.]</p>
      <p>
        An exception is the work performed by [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in the framework of the European
Media Monitor project [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        While the combination of sentiment analysis and semantic annotation for the
purpose discussed in this paper is relatively new, some applications have been
produced in the past. The DiversiNews tool [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], for example, enables the
analysis of text in a web-based environment for diversified topic extraction. Closely
related are DisputeFinder [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and OpinioNetIt [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The former is a browser
extension which highlights known disputed claims and presents the user with a
list of articles supporting a different point of view; the latter aims to
automatically derive a map of the opinions-people network from news and other
web documents.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>Our process comprises four steps, as depicted in Fig. 1. First, data is collected
from online news sites. Next, the collected texts are augmented with sentiment
scores and semantic categories, which are then used to identify disputed
categories.</p>
      <sec id="sec-3-1">
        <title>Data Collection</title>
        <p>We have collected data from six online news sites. First, we looked at those
having a high circulation and online presence. Another criterion for selection
was the ability to crawl the website, since, e.g., dynamically loaded content is
hard to crawl.</p>
        <p>
          The six selected news sites fulfilling these requirements are shown in Table 1.
We work with three UK and three US news sites. As far as the British news
sites are concerned, we selected one rather conservative news site, the Daily
Telegraph, which is traditionally right-wing; one news site, the Guardian, which
can be situated more in the middle of the political spectrum, though its main
points of view are quite liberal; and finally one tabloid news site, the Mirror,
which can be regarded as a very populist, left-wing news site.<sup>1</sup> For the American
news sites, both the Las Vegas Review-Journal and the Huffington Post can be
perceived as more libertarian news sites<sup>2</sup>, with the latter being the most
progressive [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], whereas the NY Daily News, which is also a tabloid, is still
liberal but can be situated more in the center and is even conservative when it
comes to matters such as immigration and crime.
        </p>
        <p>The news site articles were collected with the Python web crawling framework
Scrapy<sup>3</sup>. This open-source software focuses on extracting items, in our case news
site articles. Each item has a title, an abstract, a full article text, a date, and a
URL. We only crawled articles published in the period from September 2013 to March
2014. Duplicates are detected and removed based on the article headlines.<sup>4</sup></p>
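        <p>The headline-based duplicate removal can be sketched as follows. This is a minimal illustration and not the crawler code used for the dataset: the Article fields mirror the item attributes listed above, while the normalization rule (case folding and whitespace collapsing) and all function names are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    abstract: str
    text: str
    date: str
    url: str


def normalize_headline(title: str) -> str:
    # Assumed normalization: lower-case and collapse whitespace runs.
    return " ".join(title.lower().split())


def drop_duplicates(articles):
    """Keep the first article seen for each normalized headline."""
    seen = set()
    unique = []
    for article in articles:
        key = normalize_headline(article.title)
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```
]]></preformat>
        <p>Keeping the first occurrence is a design choice of this sketch; a real pipeline might instead prefer the most recent or the longest version of an article.</p>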
      </sec>
      <sec id="sec-3-2">
        <title>Sentiment Analysis</title>
        <p>We consider the full article text as the context to determine the document's
semantic orientation. Our approach to defining sentiment relies on
word lists which are used to determine positive and negative words or phrases.</p>
        <p>
          We employ three well-known sentiment lexicons. The first one is the Harvard
General Inquirer lexicon, GenInq [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], which contains 4,206 words with either
a positive or negative polarity. The second one is the Multi-Perspective Question
Answering Subjectivity lexicon, MPQA [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], which contains 8,222 words rated
between strong and weak positive or negative subjectivity and where
morphosyntactic categories (PoS) are also represented. The last one is the AFINN
lexicon [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which includes 2,477 words rated between -5 and 5 for polarity.
        </p>
        <p>1 Cf. results of 2005 MORI research: http://www.theguardian.com/news/datablog/2009/oct/05/sun-labour-newspapers-support-elections.</p>
        <p>2 http://articles.latimes.com/2006/mar/08/entertainment/et-vegas8</p>
        <p>3 http://scrapy.org/</p>
        <p>4 The dataset and all other resources (e.g. RapidMiner processes) are made freely
available to the research community at http://dws.informatik.uni-mannheim.de/en/research/identifying-disputed-topics-in-the-news.</p>
        <p>[Figure 2: Example of the generalization process. The headlines "U.S. closes Syrian embassy in Washington, D.C.", "Senate panel approves huge sale of Apache helicopters to Iraq" and "Israel announces construction of Jewish settlements in the West Bank" are annotated with the concepts dbpedia:Israel, dbpedia:Syria, dbpedia:Iraq and dbpedia:West_Bank, which are linked via dcterms:subject to category:Levant and category:Fertile_Crescent, which in turn are linked via skos:broader to category:Near_East.]</p>
        <p>
          Before defining a news article's polarity, all texts were sentence-split,
tokenized and part-of-speech tagged using the LeTs preprocessing toolkit [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. In
a next step, various sentiment scores were calculated on the document level by
performing a list look-up. For each document, we calculated the fraction of
positive and negative words by normalizing over text length, using each lexicon
separately. Then, in a final step we calculated the sum of the values of identified
sentiment words, which resulted in an overall value for each document. That is,
for each document d, our approach takes into consideration an overall lexicon
score defined as:
          <disp-formula><label>(1)</label><tex-math><![CDATA[\mathrm{lexscore}(d) = \sum_{i=1}^{n} v_{w_i}]]></tex-math></disp-formula>
where w_i is the i-th word from d matched in the lexicon at hand, and v_{w_i} its
positive or negative sentiment value.
        </p>
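        <p>The document-level scores described above can be sketched as follows. The sketch assumes that a lexicon is available as a plain mapping from word to signed sentiment value and that n is the number of matched words; the function name and the toy lexicon entries are illustrative assumptions, and PoS-sensitive matching as needed for MPQA is left out.</p>
        <preformat><![CDATA[
```python
def lexicon_scores(tokens, lexicon):
    """Return (lexscore, positive fraction, negative fraction) for one document.

    tokens  -- list of lower-cased word tokens of the full article text
    lexicon -- dict mapping a word to its signed sentiment value
    """
    values = [lexicon[t] for t in tokens if t in lexicon]
    lexscore = sum(values)  # sum of v_{w_i} over all matched words
    n_tokens = len(tokens)
    if n_tokens == 0:
        return 0, 0.0, 0.0
    # Fractions are normalized over text length, as described above.
    pos_fraction = sum(1 for v in values if v > 0) / n_tokens
    neg_fraction = sum(1 for v in values if v < 0) / n_tokens
    return lexscore, pos_fraction, neg_fraction


# Toy AFINN-style lexicon; the values are illustrative, not taken from AFINN.
toy_lexicon = {"great": 3, "tremendous": 2, "difficult": -1, "stark": -2}
tokens = "a great but difficult and stark decision".split()
score, pos, neg = lexicon_scores(tokens, toy_lexicon)
```
]]></preformat>
        <p>In this toy example one positive and two negative words are matched among seven tokens, and their values cancel out to an overall score of 0, which illustrates why the fractions and the summed score are kept as separate signals.</p>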
      </sec>
      <sec id="sec-3-3">
        <title>Topic Extraction</title>
        <p>
          We automatically identify the topics of our news articles on the basis of a
two-step process. First, we identify concepts in DBpedia [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. To that end, each article's
headline and abstract are processed with DBpedia Spotlight [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Next, categories
for each concept are created, corresponding to the categories in Wikipedia: we
extract all direct categories for each concept, and add the more general categories
two levels up in the hierarchy.
        </p>
        <p>These two phases comprise a number of generalizations to assign topics to a
text. First, processing with DBpedia Spotlight generalizes different surface forms
of a concept to a general representation of that concept, e.g. Lebanon, Liban, etc.,
as well as their inflected forms, are generalized to the concept dbpedia:Lebanon.
Second, different DBpedia concepts (such as dbpedia:Lebanon, dbpedia:Syria)
are generalized to a common category (e.g. category:Levant). Third, categories
(e.g. category:Levant, category:Fertile_Crescent) are generalized to super
categories (e.g. category:Near_East). We provide an illustration of this
generalization process in Fig. 2.</p>
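        <p>The second and third generalization steps can be sketched on a small in-memory graph. The sketch below does not call DBpedia or DBpedia Spotlight: the two dictionaries stand in for the dcterms:subject and skos:broader relations, their contents are illustrative and not guaranteed to match DBpedia, and category:Regions_of_Asia as well as the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
# Toy excerpt of the two relations used for generalization (illustrative only).
SUBJECT = {  # concept -> direct categories (dcterms:subject)
    "dbpedia:Lebanon": {"category:Levant"},
    "dbpedia:Syria": {"category:Levant", "category:Fertile_Crescent"},
}
BROADER = {  # category -> more general categories (skos:broader)
    "category:Levant": {"category:Near_East"},
    "category:Fertile_Crescent": {"category:Near_East"},
    "category:Near_East": {"category:Regions_of_Asia"},  # hypothetical parent
}


def categories_for(concept, levels_up=2):
    """Direct categories of a concept plus ancestors up to `levels_up` levels."""
    found = set(SUBJECT.get(concept, ()))
    frontier = set(found)
    for _ in range(levels_up):
        # Move one level up and keep only categories not seen before.
        frontier = {b for c in frontier for b in BROADER.get(c, ())} - found
        found |= frontier
    return found


def topics_for(concepts, levels_up=2):
    """Union of the categories of all concepts annotated in one article."""
    topics = set()
    for concept in concepts:
        topics |= categories_for(concept, levels_up)
    return topics
```
]]></preformat>
        <p>With this toy graph, two articles annotated with dbpedia:Lebanon and dbpedia:Syria respectively share category:Levant and category:Near_East, although they share no concept.</p>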
        <p>
          The whole process of topic extraction, comprising the annotation with
DBpedia Spotlight and the extraction of categories, is performed in the RapidMiner
Linked Open Data Extension [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Table 2 depicts the number of concepts and
categories extracted per source. It can be observed that the number of categories
is about a factor of 10 larger than the number of concepts found by DBpedia
Spotlight alone. This shows that it is more likely that two related articles are
found by a common category, rather than a common concept.
(3)
        </p>
        <p>If the z score is positive, articles in the category c are more positive than the
average of the news source, and vice versa. By looking up that z
score in a Gaussian distribution table, we can discard those deviations that
are statistically insignificant. For instance, the Mirror contains three articles
annotated with the category Church of Scotland, with an average AFINN
sentiment score of 20.667, which is significant at a z-value of 2.270.</p>
        <p>3. In the last step, we select those categories for which there is at least one
significant positive and one significant negative deviation. If two disputed
categories share the same extension of articles (i.e. the same set of articles is
annotated with both categories), we merge them into a cluster of disputed
categories.</p>
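        <p>The selection of disputed categories can be sketched as follows. The exact formula for the z score is not preserved in this text, so the sketch assumes one common form: the difference between the category mean and the source mean, divided by the source's standard deviation over the square root of the number of articles in the category. The significance threshold of 1.96 (two-sided 5% level), the input format and all function names are likewise assumptions.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def z_scores(articles):
    """articles: list of (source, score, categories) tuples.

    Returns {(source, category): z}, where z compares the category mean with
    the source mean, scaled by the assumed standard error sigma / sqrt(n).
    """
    by_source = defaultdict(list)
    by_pair = defaultdict(list)
    for source, score, categories in articles:
        by_source[source].append(score)
        for category in categories:
            by_pair[(source, category)].append(score)
    result = {}
    for (source, category), scores in by_pair.items():
        sigma = pstdev(by_source[source])
        if sigma == 0:
            continue  # no variation within this source, z is undefined
        mu = mean(by_source[source])
        result[(source, category)] = (mean(scores) - mu) / (sigma / sqrt(len(scores)))
    return result


def disputed_categories(z, threshold=1.96):
    """Categories with a significant positive AND a significant negative z."""
    positive = {c for (_, c), v in z.items() if v >= threshold}
    negative = {c for (_, c), v in z.items() if v <= -threshold}
    return positive & negative


def merge_by_extension(categories, articles):
    """Cluster categories annotated on exactly the same set of articles."""
    clusters = defaultdict(set)
    for category in categories:
        extension = frozenset(
            i for i, (_, _, cats) in enumerate(articles) if category in cats
        )
        clusters[extension].add(category)
    return sorted(sorted(cluster) for cluster in clusters.values())
```
]]></preformat>
        <p>Comparing extensions as frozen sets of article indices makes the merge criterion an exact match, as in the description above; a looser overlap measure would be a different design.</p>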
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Analysis</title>
      <p>The output of our system is presented in Table 3, showing that up to 19 disputed
topics can be identified in our sample. In what follows we present some interesting
findings based on a manual analysis of the output and we also draw attention
to some limitations of our current approach. In general, we opt in this work for
a validation study of the system output, as opposed, for instance, to a
gold-standard-based evaluation. This is because, due to the very specific nature of
our problem domain, any ground truth would be temporally bound to a set of
disputed topics for a specific time span.</p>
      <sec id="sec-4-1">
        <title>Findings</title>
        <p>If we look at the different percentages indicating the number of articles found
with a significant positive or negative sentiment, we see that these numbers differ
among the lexicons. The Daily Mirror seems to contain the most subjective articles
when using the GenInq lexicon, a role played by the Guardian and the NY Daily
News when using the MPQA lexicon and the AFINN lexicon, respectively.
The largest proportions are found within the Daily Mirror and the NY Daily
News, which is not surprising since these are the two tabloid news sites in our
dataset. Though the Daily Telegraph and the Daily Mirror seem to have no
significant deviations using the MPQA lexicon<sup>5</sup>, we nevertheless find disputed
topics among the other four news sites. Consequently, MPQA yields the fewest
disputed topics (11), followed by AFINN (17) and GenInq (19).</p>
        <p>Initially, we manually went through the output list of disputed topics and
selected two topics per lexicon that intuitively represent interesting news articles
(Table 5). What draws attention when looking at these categories is that
they are all rather broad. However, if we take a closer look at the disputed
articles, we clearly notice that these actually do represent contested news items.
Within the category Alternative medicine, for example, we find that three
articles focus on medical marijuana legalization. To illustrate, we present these
articles with their headlines, the number of subjective words with some examples,
and the overall GI lexicon value<sup>6</sup>.</p>
        <list list-type="bullet">
          <list-item><p>NY Daily News: "Gov. Cuomo to allow limited use of medical marijuana
in New York" → 7 positive (e.g. great, tremendous) and 5 negative (e.g.
difficult, stark) words; GI value of 2.00.</p></list-item>
          <list-item><p>NY Daily News: "Gov. Cuomo says he won't legalize marijuana
Colorado-style in New York" → 5 positive (e.g. allow, comfortable) and 8 negative
(e.g. violation, controversial) words; GI value of -3.</p></list-item>
          <list-item><p>Las Vegas Review: "Unincorporated Clark County could house Southern
Nevada medical marijuana dispensaries" → 26 positive (e.g. ensure,
accommodate) and 10 negative (e.g. pessimism, prohibit) words; GI value of 16.</p></list-item>
        </list>
        <p>5 This might be due to MPQA's specific nature: it has different gradations of sentiment,
and PoS tags also need to be assigned in order to use it.</p>
        <p>6 However, as previously mentioned in Section 3, for the actual sentiment analysis we
only considered the actual news article and not its headline or abstract.</p>
        <p>Though the last article is clearly about a difficult issue within this whole
discussion, we see that the Las Vegas Review-Journal reports mostly positively on
this subject, which could be explained by its libertarian background, whereas
the NY Daily News, which is more conservative regarding such topics, reports
on this positive evolution using less outspokenly positive and even negative
language. A similar trend is reflected in the same two news sites when reporting
on another contested topic, i.e. gay marriage, which turns up using the MPQA
lexicon in the category LGBT history. We again present some examples.</p>
        <list list-type="bullet">
          <list-item><p>Las Vegas Review: "Nevada AG candidates split on gay marriage" → 25
positive words, of which 16 are weakly subjective (allow, defense) and 9 strongly
subjective (clearly, opportunity), and 13 negative words, of which 10 are weakly
subjective (against, absence) and 3 strongly subjective (heavily, violate); MPQA value of 19.</p></list-item>
          <list-item><p>NY Daily News: "Michigan gov. says state won't recognize same-sex
marriages" → 7 positive words, of which 5 are weakly subjective (reasonable, successfully)
and 2 strongly subjective (extraordinary, hopeful), and 9 negative words, of which 4 are
weakly subjective (little, least) and 5 strongly subjective (naive, furious); MPQA value of -5.</p></list-item>
        </list>
        <p>Another interesting finding is that for four out of six categories,
the articles are quite evenly distributed between UK and US news sites, and that
two categories stand out: Death seems to be more British and Liberal parties
more American. If we take a closer look at the actual articles representing these
categories, we see that 9 out of the 11 Death articles actually deal with murder and
were written for the Daily Mirror, which is a tabloid news site focusing more
on sensation. As far as the 34 American articles regarding liberal parties are
concerned, we notice that all but six were published by the Las Vegas
Review-Journal, which is known for its libertarian editorial stance.</p>
        <p>These findings reveal that using a basic approach based on DBpedia
category linking and lexicon-based sentiment analysis already allows us to find some
interesting, contested news articles. Of course, we are aware that our samples
are too small to make generalizing assumptions, which brings us to a discussion
of some of the limitations of our current approach.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Limitations</title>
        <p>In order to critically evaluate the limitations of our approach, we first had a
look at the actual "topic representation". Since we use the lexicons as a basis
to find disputed topics, we randomly selected 20 news articles that show up
under a specific category per lexicon and assessed their representativeness. We found
that, because of errors in the semantic annotation process, out of these 60
examples, only 34 were actually representative of the topic or category in which they
were represented. If we look at the exact numbers per lexicon, this amounts
to an accuracy of 55% for GenInq, 70% for MPQA and 40% for
AFINN. Examples of mismatches, i.e. where a DBpedia Spotlight concept was
misleadingly or erroneously tagged, are presented next:</p>
        <list list-type="bullet">
          <list-item><p>AFINN, category:Television series by studio, tagged concepts:
United States Department of Veterans Affairs, Nevada, ER TV series →
article is about a poor emergency room, not about the TV series ER.</p></list-item>
          <list-item><p>GenInq, category:Film actresses by award, tagged concepts: Prince Harry
of Wales, Angelina Jolie → article is about charity fraud; Angelina Jolie is
just a patron of the organization.</p></list-item>
        </list>
        <p>We performed the same analysis on our manually selected interesting topics
(cf. Table 4) and found that actually 74 out of the 83 articles were representative.</p>
        <p>When trying to evaluate the sentiment analysis, we found that this is a difficult
task when no gold standard annotations or clear guidelines are available. Various
questions immediately come to mind: does the sentiment actually represent a
journalist's or newspaper's belief, or does it just tell something more about the
topic at hand? For example, considering the news articles in the Guardian dealing
with murder, it might be that words such as "murder", "kill", etc. are actually
included as subjective words within the lexicon. However, at the moment this
latter question is overruled by our disputed topic filtering step, which discards
topics that are negative across all news sites.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this paper, we have discussed an approach which finds disputed topics in
news media. By assigning sentiment scores and semantic categories to a
number of news articles, we can isolate those semantic categories whose sentiment
scores deviate significantly across different news media. Our approach is entirely
unsupervised, requiring neither an upfront definition of possible topics nor
annotated training data. An experiment with articles from six UK and US news
sites has shown that such deviations can be found for different topics, ranging
from political parties to issues such as drug legislation and gay marriage.</p>
      <p>
        There is room for improvement and further investigation in quite a few
directions. Crucially, we have observed that the assignment of topics is not
always perfect. There are different reasons for that. First, we annotate the whole
abstract of an article and extract categories. Apart from the annotation tool
(DBpedia Spotlight) not working 100% accurately, this means that categories
extracted for minor entities have the same weight as those extracted for major
ones. Performing keyphrase extraction in a preprocessing step (e.g. as proposed
by Mihalcea and Csomai [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) might help overcome this problem.
      </p>
      <p>In our approach, we only assign a global sentiment score to each article. A
more fine-grained approach would assign different scores to individual entities
found in the article. This would help, e.g., in handling cases such as articles which
mention politicians from different political parties. In that case, having a polarity
value per entity would be more helpful than a global sentiment score.
Furthermore, more sophisticated sentiment analysis combining the lexicon approach
with machine learning techniques may improve the accuracy.</p>
      <p>Our approach identifies many topics, some of which overlap and refer to a
similar set of articles. To condense these sets of topics, we use categories'
extensions, i.e. the sets of articles annotated with a category. Here, an approach
exploiting both the extension as well as the subsumption hierarchy of categories
might deliver better results. Another helpful clue for identifying media polarity
is analyzing the coverage of certain topics. For example, campaigns of paid
journalism can be detected by a news site having a few articles on products from a
brand, which are not covered by other sites.</p>
      <p>Although many issues remain open, we believe this provides a first seminal
contribution that shows the substantial benefits of bringing together NLP and
Semantic Web techniques for high-level, real-world applications focused on a
better, semantically-driven understanding of Web resources such as online media.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The work presented in this paper has been partly funded by the PARIS project
(IWT-SBO-Nr. 110067) and the German Science Foundation (DFG) project
Mine@LOD (grant number PA 2373/1-1). Furthermore, Orphee De Clercq is
supported by an exchange grant from the German Academic Exchange Service
(DAAD STIBET scholarship program).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Rawia</given-names>
            <surname>Awadallah</surname>
          </string-name>
          , Maya Ramanath, and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>OpinioNetIt: Understanding the opinions-people network for politically controversial topics</article-title>
          .
          <source>In Proceedings of CIKM '11</source>
          , pages
          <fpage>2481</fpage>
          –
          <fpage>2484</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Balahur</surname>
          </string-name>
          , Ralf Steinberger, Mijail Kabadjov, Vanni Zavarella, Erik van der Goot, Matina Halkia, Bruno Pouliquen, and
          <string-name>
            <given-names>Jenya</given-names>
            <surname>Belyaeva</surname>
          </string-name>
          .
          <article-title>Sentiment analysis in the news</article-title>
          .
          <source>In Proc. of LREC'10</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Jon</given-names>
            <surname>Bekken</surname>
          </string-name>
          .
          <article-title>Advocacy newspapers</article-title>
          . In Christopher H. Sterling, editor,
          <source>Encyclopedia of Journalism. SAGE Publications</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Rob</given-names>
            <surname>Ennals</surname>
          </string-name>
          , Beth Trushkowsky, John Mark Agosta, Tye Rattenbury, and
          <string-name>
            <given-names>Tad</given-names>
            <surname>Hirsch</surname>
          </string-name>
          .
          <article-title>Highlighting disputed claims on the web</article-title>
          .
          <source>In ACM International WWW Conference</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Christiane Fellbaum, editor.
          <source>WordNet: An Electronic Lexical Database</source>
          . MIT Press, Cambridge, Mass.,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Blaz</given-names>
            <surname>Fortuna</surname>
          </string-name>
          , Carolina Galleguillos, and
          <string-name>
            <given-names>Nello</given-names>
            <surname>Cristianini</surname>
          </string-name>
          .
          <article-title>Detecting the bias in media with statistical learning methods</article-title>
          .
          <source>In Text Mining: Theory and Applications</source>
          . Taylor and Francis Publisher,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Eduard</given-names>
            <surname>Hovy</surname>
          </string-name>
          , Roberto Navigli, and Simone Paolo Ponzetto.
          <article-title>Collaboratively built semi-structured content and Artificial Intelligence: The story so far</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>194</volume>
          :
          <fpage>2</fpage>
          –
          <lpage>27</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Ioana</given-names>
            <surname>Hulpus</surname>
          </string-name>
          , Conor Hayes, Marcel Karnstedt, and
          <string-name>
            <given-names>Derek</given-names>
            <surname>Greene</surname>
          </string-name>
          .
          <article-title>Unsupervised graph-based topic labelling using DBpedia</article-title>
          . In
          <source>Proc. of WSDM '13</source>
          , pages
          <fpage>465</fpage>
          –
          <lpage>474</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas,
          <string-name>
            <given-names>Pablo N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          , Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef,
          <string-name>
            <given-names>Sören</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Pablo N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          , Max Jakob,
          <string-name>
            <given-names>Andrés</given-names>
            <surname>García-Silva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>DBpedia Spotlight: Shedding light on the web of documents</article-title>
          . In
          <source>Proc. of the 7th International Conference on Semantic Systems (I-Semantics)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andras</given-names>
            <surname>Csomai</surname>
          </string-name>
          .
          <article-title>Wikify!: Linking documents to encyclopedic knowledge</article-title>
          . In
          <source>Proc. of CIKM '07</source>
          , pages
          <fpage>233</fpage>
          –
          <lpage>242</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Finn</given-names>
            <surname>Nielsen</surname>
          </string-name>
          .
          <article-title>A new ANEW: Evaluation of a word list for sentiment analysis in microblogs</article-title>
          . In
          <source>Proc. of the ESWC2011 Workshop on Making Sense of Microposts: Big things come in small packages</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          , Petar Ristoski, Evgeny Mitichkin, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Data mining with background knowledge from the web</article-title>
          . In RapidMiner World,
          <year>2014</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Senja</given-names>
            <surname>Pollak</surname>
          </string-name>
          , Roel Coesemans, Walter Daelemans, and
          <string-name>
            <given-names>Nada</given-names>
            <surname>Lavrac</surname>
          </string-name>
          .
          <article-title>Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining</article-title>
          .
          <source>Pragmatics</source>
          ,
          <volume>5</volume>
          :
          <fpage>1947</fpage>
          –
          <lpage>1966</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Ana-Maria</given-names>
            <surname>Popescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Oren</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Extracting product features and opinions from reviews</article-title>
          . In Anne Kao and Stephen R. Poteet, editors,
          <source>Natural Language Processing and Text Mining</source>
          , pages
          <fpage>9</fpage>
          –
          <lpage>28</lpage>
          . Springer London,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Lawrence Reeve and Hyoil Han.
          <article-title>Survey of semantic annotation platforms</article-title>
          . In
          <source>Proc. of the 2005 ACM Symposium on Applied Computing</source>
          , pages
          <fpage>1634</fpage>
          –
          <lpage>1638</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Elad</given-names>
            <surname>Segev</surname>
          </string-name>
          and
          <string-name>
            <given-names>Regula</given-names>
            <surname>Miesch</surname>
          </string-name>
          .
          <article-title>A systematic procedure for detecting news biases: The case of Israel in European news sites</article-title>
          .
          <source>International Journal of Communication</source>
          ,
          <volume>5</volume>
          :
          <fpage>1947</fpage>
          –
          <lpage>1966</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Steinberger</surname>
          </string-name>
          , Bruno Pouliquen, and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Van der Goot</surname>
          </string-name>
          .
          <article-title>An introduction to the Europe Media Monitor family of applications</article-title>
          . CoRR, abs/1309.5290,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Philip J.</given-names>
            <surname>Stone</surname>
          </string-name>
          , Dexter C. Dunphy,
          <string-name>
            <given-names>Marshall S.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Daniel M.</given-names>
            <surname>Ogilvie</surname>
          </string-name>
          .
          <article-title>The General Inquirer: A Computer Approach to Content Analysis</article-title>
          . MIT Press, Cambridge, MA,
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Mitja</given-names>
            <surname>Trampus</surname>
          </string-name>
          , Flavio Fuart, Jan Bercic, Delia Rusu, Luka Stopar, and
          <string-name>
            <given-names>Tadej</given-names>
            <surname>Stajner</surname>
          </string-name>
          .
          <article-title>DiversiNews – a stream-based, on-line service for diversified news</article-title>
          . In
          <source>SiKDD 2013</source>
          , pages
          <fpage>184</fpage>
          –
          <lpage>188</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Marjan</given-names>
            <surname>Van de Kauter</surname>
          </string-name>
          , Geert Coorman, Els Lefever, Bart Desmet, Lieve Macken, and
          <string-name>
            <given-names>Veronique</given-names>
            <surname>Hoste</surname>
          </string-name>
          .
          <article-title>LeTs Preprocess: The multilingual LT3 linguistic preprocessing toolkit</article-title>
          .
          <source>Computational Linguistics in the Netherlands Journal</source>
          ,
          <volume>3</volume>
          :
          <fpage>103</fpage>
          –
          <lpage>120</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Theresa</given-names>
            <surname>Wilson</surname>
          </string-name>
          , Janyce Wiebe, and Paul Hoffmann.
          <article-title>Recognizing contextual polarity in phrase-level sentiment analysis</article-title>
          . In
          <source>Proc. of HLT05</source>
          , pages
          <fpage>347</fpage>
          –
          <lpage>354</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>