<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Missing Mr. Brown and buying an Abraham Lincoln - Dark Entities and DBpedia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marieke van Erp</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Ilievski</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Rospocher</string-name>
          <email>rospocher@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Piek Vossen</string-name>
          <email>piek.vosseng@vu.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>VU University Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>We argue that the community needs to address the issue of “dark entities”, those domain entities for which a knowledge base holds no usable information, in the context of the entity linking task for building Event-Centric Knowledge Graphs. Through an analysis of a large (1.2 million article) automotive newswire corpus against DBpedia, we identify six classes of errors that lead to dark entities. Finally, we outline further steps that can be taken to tackle this issue.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>domain-specific entity linking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Knowledge graphs are becoming ever more important for companies to organise and
access their data [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. A key starting point for most knowledge graphs is data extracted
from Wikipedia. Hence, DBpedia [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and DBpedia Spotlight [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] often serve as the basis for
knowledge graph construction and entity grounding efforts [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. However, due to the
breadth and general nature of Wikipedia, these resources are often insufficient for domain-specific
use cases such as biomedical entity linking [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Further, there are bias and coverage
issues with Wikipedia and its derivatives as evidenced by the work of Kittur [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and
Tiddi [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] as well as the lack of location information shown in the Wikidata World Maps.3
      </p>
      <p>Another dimension that is not sufficiently covered by DBpedia is events. As we aim
to build Event-Centric Knowledge Graphs (ECKGs), we need to know more about an
entity than its type. If the entity is, for example, a person, we need to know things such as what
his or her role is, when he or she was appointed, what he or she did before, etc. For organisations,
we need to know what deals they made with whom, what products they offer, etc.</p>
      <p>Given the importance of DBpedia for entity grounding efforts and its apparent skew,
we assert that more attention needs to be paid to the issue of dark entities in our natural
language processing pipelines. Dark entities are those entities for which no relevant
information was identified in a given knowledge base / entity repository. In many cases,
this means that there is no resource present in the knowledge base. There are also cases
where a resource is present but contains too little relevant information to support
further reasoning about the entity. For example, http://dbpedia.org/page/Maurice_Lippens
contains no information other than redirect links.
3 https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en</p>
      <p>
        To gain insight into the scale of the problem, we analysed the dark
entities phenomenon with respect to a newswire corpus and DBpedia. For news, the
performance of NLP pipelines is typically good and the coverage of DBpedia should
be satisfactory. Nevertheless, we find that for over 40,000 entities there was no additional
information available in an external resource. Based on this analysis, we catalog six classes of
errors that lead to dark entities. We also discuss true dark entities, those for which the
failure stems from the absence of any resource in DBpedia. True dark entities are
those domain entities that, while not “famous” enough to deserve a page in Wikipedia
(and hence an entry in DBpedia), play some role in the given domain. We note that
the community has been concerned with NIL entities [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (entities that may not have
high rates of occurrence within a corpus or are domain-specific). Dark entities subsume
this problem and expand it to consider problems in the entity grounding itself as well
as deriving ECKGs involving these entities. For example, the event “crash” can mean
different things. Knowing that its participants are entities is not enough: we need to know
what they do and how they are related. If the subject and object are the CEO and CFO of
the same company, this is valuable information for judging the implications of their
crash, whilst a company that crashes means something entirely different. Dark entities are
also related to the notion of emerging entities [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], but that work focuses on identifying
potentially ambiguous entities that become highly popular in the news within a short time.
      </p>
      <p>This paper is structured as follows. In Section 2, we describe the dataset used for
our analysis, followed by a discussion of the entity grounding pipeline used (Section 3).
In Section 4, we present our analyses. We conclude with thoughts on ways forward.
</p>
    </sec>
    <sec id="sec-2">
      <title>2 The Global Automotive Industry Dataset</title>
      <p>The Global Automotive Industry Dataset was compiled in the context of the
NewsReader EU project,4 to investigate the impact of the 2008-2009 financial crisis on the
automotive industry.</p>
      <p>
        The dataset comprises 1,259,748 English news articles covering the period 2003
- 2013. It was extracted from the LexisNexis database5 through a series of complex
queries containing major car makers. The dataset has been used as a test bed to scale
NLP and Semantic Web technologies to extract entities, events and information about
them from natural language text, resulting in Event-Centric Knowledge Graphs that
formalize long-term developments and changes. The system is described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and its
modules are openly available separately or as a virtual machine from http://www.
newsreader-project.eu/results/software/. The 1.2M articles were first
processed using NewsReader’s natural language processing pipeline that performs
information extraction at the document level. Then, a cross-document coreference module
aggregated the different event and entity mentions in the text to unique instances, with
links to the locations in the text documents where the events and entities were
mentioned.6 This resulted in 25,156,574 events and 1,967,150 entities. But for only less
4 http://www.newsreader-project.eu
5 http://www.lexisnexis.com
6 The generated RDF TRiG files are available from
http://www.newsreaderproject.eu/results/data/. The sources cannot be made available due to copyright
issues but can be accessed to LexisNexis subscribers via the document IDs in the RDF.
than 15% of the entity instances (277,425 out of 1,967,150) that the pipeline detected,
the system was able to link the entity to a DBpedia resource.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Entity Grounding in Detail</title>
      <p>
        To recognise entities in text, ixa-pipe-nerc [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]7 is employed, a perceptron-based NERC
system trained on the AIDA/CoNLL 2003 dataset,8 enriched with unsupervised clusters.
This NERC system outperforms state-of-the-art systems on both the
AIDA/CoNLL 2003 benchmark and a set of 120 articles from the WikiNews corpus9 annotated
for this project ( [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], tables 6 and 7).
      </p>
      <p>
        The results from the NERC module in the pipeline were used as input for the DBpedia
Spotlight statistical models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to ground the entity in the DBpedia knowledge base. On
the benchmark datasets TAC 2011 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and AIDA/CoNLL 2003,10 scores of 79.77
precision and 60.68 recall, and of 79.67 precision and 75.94 recall, respectively, were obtained.
We chose not to rely on DBpedia Spotlight’s internal entity spotter, as it links far more
entities than the named entities we are currently investigating. An analysis of
the concepts spotted and linked by Spotlight is a natural extension of this work.
      </p>
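      <p>For illustration, entity grounding with a Spotlight-style annotate response can be sketched as follows. The JSON below is a constructed example following Spotlight’s response format, not actual output of our pipeline, and the confidence threshold is an assumption:</p>

```python
# Sketch: grounding mentions from a DBpedia Spotlight-style /annotate
# response. The canned response is a constructed example for illustration,
# not actual Spotlight output; the threshold is an assumption.
import json

def extract_links(spotlight_json, min_confidence=0.5):
    """Map surface forms to DBpedia URIs from an annotate-style response."""
    resources = json.loads(spotlight_json).get("Resources", [])
    return {
        r["@surfaceForm"]: r["@URI"]
        for r in resources
        if float(r["@similarityScore"]) >= min_confidence
    }

canned = json.dumps({
    "Resources": [
        {"@surfaceForm": "Volkswagen",
         "@URI": "http://dbpedia.org/resource/Volkswagen",
         "@similarityScore": "0.97"},
        {"@surfaceForm": "Lincoln",
         "@URI": "http://dbpedia.org/resource/Abraham_Lincoln",
         "@similarityScore": "0.41"},
    ]
})
# Low-scoring candidates, such as the spurious 'Abraham Lincoln' link
# discussed in Section 4, fall below the threshold and are dropped.
print(extract_links(canned))
```

      <p>Thresholding alone does not resolve dark entities, of course: a dropped candidate still needs a correct target, which is exactly what DBpedia may lack.</p>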
    </sec>
    <sec id="sec-4">
      <title>4 Dark Entities: An Analysis</title>
      <p>An analysis of dark entity candidates shows four main classes of errors related to
recognising the correct candidates to link:
Real-world data is dirty Character encodings are still a hard problem, resulting in
recognised entities such as JonBenaˆ©t and Bill %26 Melinda Gates Foundation.</p>
      <p>This is despite the fact that all data comes from a standardised digital repository.
Incorrect Named Entity Recognition Boundaries Named entity recognisers have
difficulties with longer entities where parts are left out (‘Ferdinand’ instead of ‘Rio
Ferdinand’), or erroneously included (‘Copyright Targeted News Services’ instead
of ‘Targeted News Services’).</p>
      <p>Conjunctions The named entity recogniser may correctly delimit the entity as it
is referred to in the text, since it also relies on syntactic constituent information, but this
chunk may be made up of two or more entities, such as ‘Peugeot and Citroen’.
Both are valid entities and have resources in DBpedia, but the conjunction proves
difficult to analyse for the named entity disambiguator.</p>
      <p>
        Coreference The state of the art in in-document coreference resolution hovers around an F1
score of 60 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This means that the system might be able to link the first mention
of ‘Gordon Brown’ in the text, but fail to connect this to a subsequent mention of
‘Mr. Brown’.
These errors may not result in true dark entities, but they make it difficult for
the named entity linker to identify the correct context to link the entity candidate to.
Data cleaning can alleviate some of the instances arising from the first two causes, but it
is not trivial. Another possible remedy for the entity boundary problem is to perform
the tasks of entity recognition and disambiguation simultaneously [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. As for conjunctions, one could at first sight think of splitting entities that contain
‘and’ or ‘&amp;’, but this would incorrectly split ‘Standard &amp; Poor’s Financial Services
LLC’. And what does one do with names that may contain commas?
7 https://github.com/ixa-ehu/ixa-pipe-nerc
8 http://www.cnts.ua.ac.be/conll2003/ner/
9 http://en.wikinews.org/wiki/Main_Page
10 https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yagonaga/aida/downloads/
      </p>
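      <p>A naive conjunction-splitting heuristic with an exception list can be sketched as follows; both lists are illustrative assumptions, and, as the ‘Standard &amp; Poor’s’ case shows, such a heuristic remains brittle:</p>

```python
# Sketch: split conjunction chunks like 'Peugeot and Citroen' into separate
# entity candidates, guarded by an exception list for names that legitimately
# contain a conjunction. Both lists here are illustrative assumptions.
import re

CONJUNCTION_EXCEPTIONS = {
    "Standard & Poor's Financial Services LLC",
    "Bill & Melinda Gates Foundation",
}

def split_conjuncts(mention):
    """Return candidate entities, splitting on ' and ' / ' & ' unless the
    full mention is a known single-entity name."""
    if mention in CONJUNCTION_EXCEPTIONS:
        return [mention]
    parts = re.split(r"\s+(?:and|&)\s+", mention)
    return [p.strip() for p in parts if p.strip()]

print(split_conjuncts("Peugeot and Citroen"))
# ['Peugeot', 'Citroen']
print(split_conjuncts("Standard & Poor's Financial Services LLC"))
# ["Standard & Poor's Financial Services LLC"]
```

      <p>The exception list would have to be learned or drawn from a gazetteer; maintaining it by hand does not scale, which is why we treat this as an open issue rather than a solution.</p>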
      <p>Resolving errors from the entity recogniser still leaves two more causes of error
related to dark entities:
Subdivisions The granularity level of an entity in text may not match its granularity in
a knowledge base. For instance, only the ‘bigger’ organisation will have an entry
in the knowledge base, making it difficult to link more specific entity mentions
such as ‘Volkswagen Middle East’. Linking to the generic ‘Volkswagen’ resource
may not be optimal, for example in a potential case where Volkswagen closes the
Volkswagen Middle East branch. When reasoning over such a statement, one would
falsely state that dbpedia:Volkswagen11 closes dbpedia:Volkswagen.</p>
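      <p>The intended behaviour can be sketched as follows: rather than backing off to the parent organisation, a linker could flag subdivision mentions as dark-entity candidates. The toy knowledge-base index and the prefix-matching rule are illustrative assumptions:</p>

```python
# Sketch: avoid collapsing 'Volkswagen Middle East' onto dbpedia:Volkswagen.
# Rather than linking a subdivision mention to its parent organisation, flag
# it as a dark-entity candidate. The tiny KB index is an illustrative assumption.
KB = {"Volkswagen", "Peugeot", "Citroen"}  # names that have a KB resource

def link_or_flag(mention):
    """Exact match: link; strict superstring of a KB name: dark candidate."""
    if mention in KB:
        return ("link", mention)
    for name in KB:
        if mention.startswith(name + " "):
            # Linking to the parent would license false statements such as
            # "Volkswagen closes Volkswagen", so keep the mention distinct.
            return ("dark", mention)
    return ("unknown", mention)

print(link_or_flag("Volkswagen"))              # ('link', 'Volkswagen')
print(link_or_flag("Volkswagen Middle East"))  # ('dark', 'Volkswagen Middle East')
```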
      <p>Domain Mismatch / Ambiguity In Figure 1, the distribution of selected DBpedia
classes is charted against the class distribution of the entities recognised in the Global
Automotive Dataset v2 by DBpedia Spotlight and the class distribution of the
entities in 120 manually annotated Wikinews articles for this domain. The figure shows
that organisations are much more important in the NewsReader domain than in
DBpedia. It should be noted that there is a mismatch between the DBpedia class ‘Work’
and the ‘Work’ class in the Gold Standard dataset, as in the DBpedia classification
‘Work’ creative works such as artwork and software, whereas the NewsReader
annotation also includes electronics and car models. In the DBpedia classification,
these are covered by different classes such as ‘Device’ and
‘MeanOfTransportation’. The DBpedia ontology also shows that there is much attention for sports and
celebrities, as is demonstrated by the detailed subclass hierarchy under the ‘Person’
class. In the NewsReader domain, most persons are businesspeople, of different
levels of fame (CEOs, politicians and journalists). The domain mismatches between
DBpedia and the automotive dataset are often cause for linking errors, most
apparent when an entity is consistently linked to the most popular DBpedia candidate,
despite its low domain relevance or context fit. In the NewsReader data this happens,
for example, with ‘Lincoln’, which should refer to the carmaker (or in some
instances to ‘Lincoln Matresses’) but is consistently linked to the entry for ‘Abraham
Lincoln’ (or ‘Lincoln City F.C.’, ‘Lincoln Park’, ‘City of Lincoln’) instead.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>Our analyses showed that although newswire is considered a standard domain in NLP
evaluations, with many tools tuned to it, recognising and linking the entities mentioned is
nontrivial. Like the phenomenon of emerging entities, the case of dark entities
has been treated as a side effect in contemporary linking systems. While the analysis is
preliminary, it, combined with anecdotal evidence from discussions with other community
members, emphasizes the need to address the problem of dark entities. What are some
avenues for addressing our six classes of errors? Possibilities include:
11 PREFIX dbpedia: &lt;http://dbpedia.org/resource/&gt;
– Dynamic set of knowledge bases. The LOD cloud contains a vast number of sources,
diverse with respect to their topical domain and level of detail. Linking entities to
domain-relevant knowledge bases should increase recall on dark entities
and reduce the mismatch in granularity. Unfortunately, a few practicalities need to
be resolved. Firstly, it is currently hard to include additional sources into DBpedia
Spotlight. Secondly, it may be nontrivial to decide which sources to consider.
– Expand knowledge bases. LOD resources are used as input for NLP applications.</p>
      <p>
        Recent work [
        <xref ref-type="bibr" rid="ref17 ref8">17, 8</xref>
        ] has explored the reverse direction: feeding
the NLP results back into the LOD cloud. Although these approaches are not yet perfect, they
may be useful in extending the coverage of LOD. Moreover, the LOD cloud resources
themselves are not error-free [
        <xref ref-type="bibr" rid="ref16 ref3">16, 3</xref>
        ]. New information in the cloud could in
turn lead to a different outcome for the NLP application, making their relationship
somewhat circular. We propose to examine the applicability of this circularity to
the dark entities use case, through a three-step strategy: 1) Identification of
unresolved entities; 2) Analysis of the domain in order to normalise these entities and
determine their semantic type and relations to other entities. Results are then fed
into a domain knowledge base, which is an add-on to DBpedia; 3) Re-processing of
the domain texts to obtain better interpretations of the entities and their activities.
– Leverage latent semantics. Techniques such as word2vec [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that mine latent
semantics may be effective for providing guidance on how to propagate information
from one known entity to another. Such techniques are a promising mechanism to
enrich the representation of an entity; however, the errors introduced by the chosen
latent semantics algorithm should be taken into account.
      </p>
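      <p>The propagation idea can be sketched with toy vectors standing in for learned embeddings; the vectors, entities, and type labels below are illustrative assumptions, not output of an actual word2vec model:</p>

```python
# Sketch: propagate information from a known entity to a dark one via
# latent-semantic similarity. The toy 3-d vectors stand in for embeddings
# learned by a model such as word2vec; all values are illustrative.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

embeddings = {
    "Volkswagen": [0.9, 0.1, 0.2],             # known: dbo:Company
    "Abraham Lincoln": [0.1, 0.9, 0.3],        # known: dbo:Person
    "Lincoln Motor Company": [0.8, 0.2, 0.1],  # dark entity
}
types = {"Volkswagen": "dbo:Company", "Abraham Lincoln": "dbo:Person"}

def propagate_type(dark):
    """Guess a type for a dark entity from its nearest typed neighbour."""
    best = max(types, key=lambda e: cosine(embeddings[dark], embeddings[e]))
    return types[best]

print(propagate_type("Lincoln Motor Company"))  # dbo:Company
```

      <p>As noted above, errors of the similarity model propagate too: a carmaker mention embedded near ‘Abraham Lincoln’ would inherit the wrong type, so such guesses should carry a confidence estimate.</p>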
      <p>Going forward, we hope that by tackling dark entities we can build natural
language processing systems that are more accurate and have higher coverage.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The research for this paper was supported by the European Union’s 7th Framework
Programme via the NewsReader Project (ICT-316404).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aldabe</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beloki</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laparra</surname>
            , E., de Lacalle,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokkens</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Izquierdo</surname>
            , R., van Erp,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girardi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          :
          <article-title>Event detection, version 2 deliverable 4.2.2</article-title>
          . Deliverable, NewsReader Project
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bermudez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
          </string-name>
          , G.:
          <article-title>IXA Pipeline: Efficient and ready to use multilingual NLP tools</article-title>
          .
          <source>In: Proceedings of the 9th Language Resources and Evaluation Conference (LREC2014)</source>
          , Reykjavik, Iceland (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Beek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rietveld</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bazoobandi</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wielemaker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>LOD Laundromat: a uniform way of publishing other people's dirty data</article-title>
          .
          <source>In: The Semantic Web-ISWC</source>
          <year>2014</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>228</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>DBpedia - a crystallization point for the web of data</article-title>
          .
          <source>Journal of Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.:
          <article-title>Entity-centric coreference resolution with model stacking</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          . pp.
          <fpage>1405</fpage>
          -
          <lpage>1415</lpage>
          . Beijing, China (26-31
          <year>July 2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Daiber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hokamp</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          :
          <article-title>Improving efficiency and accuracy in multilingual entity extraction</article-title>
          .
          <source>In: Proceedings of the 9th International Conference on Semantic Systems</source>
          . pp.
          <fpage>121</fpage>
          -
          <lpage>124</lpage>
          . ACM (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. van Erp,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Vossen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Agerri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Minard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.L.</given-names>
            ,
            <surname>Speranza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Urizar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Laparra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Aldabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Rigau</surname>
          </string-name>
          , G.:
          <article-title>Annotated data, version 2</article-title>
          .
          <source>deliverable d3.3.2. Tech. rep., NewsReader Project</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hoffart</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          , G.:
          <article-title>Discovering emerging entities with ambiguous names</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on World wide web</source>
          . pp.
          <fpage>385</fpage>
          -
          <lpage>396</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grishman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Knowledge base population: Successful approaches and challenges</article-title>
          .
          <source>In: Proceedings of ACL</source>
          <year>2011</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kittur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>What's in Wikipedia?: mapping topics and conflict using socially annotated category structure</article-title>
          .
          <source>In: CHI'09</source>
          . pp.
          <fpage>1509</fpage>
          -
          <lpage>1512</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mikolov</surname>
          </string-name>
          , T.,
          <string-name>
            <surname>tau Yih</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweig</surname>
          </string-name>
          , G.:
          <article-title>Linguistic regularities in continuous space word representations</article-title>
          .
          <source>In: Proceedings of NAACL HLT</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raganato</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
          </string-name>
          , R.:
          <article-title>Entity linking meets word sense disambiguation: a unified approach</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>2</volume>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nickel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabrilovich</surname>
          </string-name>
          , E.:
          <article-title>A review of relational machine learning for knowledge graphs - from multi-relational link prediction to automated knowledge graph construction</article-title>
          . http://arxiv.org/pdf/1503.00759.pdf
          (
          <year>Mar 2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rizzo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troncy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruemmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>NERD meets NIF: Lifting NLP extraction results to the linked data cloud</article-title>
          .
          <source>In: LDOW'12</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Tiddi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Aquin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Quantifying the bias in data links</article-title>
          .
          <source>In: Knowledge Engineering and Knowledge Management</source>
          , pp.
          <fpage>531</fpage>
          -
          <lpage>546</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wienand</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Detecting incorrect numerical data in DBpedia</article-title>
          .
          <source>In: The Semantic Web: Trends and Challenges</source>
          , pp.
          <fpage>504</fpage>
          -
          <lpage>518</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Probase: A probabilistic taxonomy for text understanding</article-title>
          .
          <source>In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>481</fpage>
          -
          <lpage>492</lpage>
          . ACM (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howsmon</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hahn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Entity linking for biomedical literature</article-title>
          .
          <source>extraction 24</source>
          ,
          <issue>19</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>