<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Disambiguation of Named Entities Using Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ricardo Usbeck}</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <email>{usbeck|ngonga}@informatik.uni-leipzig.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wencan Luo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars Wesemann</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>One key step towards extracting structured data from unstructured data sources is the disambiguation of entities. With AGDISTIS, we provide a time-efficient, state-of-the-art, knowledge-base-agnostic and multilingual framework for the disambiguation of RDF resources. The aim of this demo is to present the English, German and Chinese versions of our framework, all based on DBpedia. We show the results of the framework on texts from a variety of domains, including news, sports, automobiles and e-commerce. We also summarize the results of the evaluation of AGDISTIS on several languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A significant portion of the information on the Web is still only available in
textual format. Bridging this gap between the Document Web
and the Data Web requires, among other things, the extraction of entities and
of the relations between these entities from text. One key step in this process is
the disambiguation of entities (also known as entity linking). The AGDISTIS
framework [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (which will also be presented at this conference) addresses two
of the major drawbacks of current entity linking frameworks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1,2,3</xref>
        ]: time
complexity and accuracy. With AGDISTIS, we have developed a framework that
achieves polynomial time complexity and outperforms the state of the art w.r.t.
accuracy. The framework is knowledge-base-agnostic (i.e., it can be deployed on
any knowledge base) and language-independent. In this demo, we will
present AGDISTIS deployed on three different languages (English, German and
Chinese) and three different knowledge bases (DBpedia, the German DBpedia
and the Chinese DBpedia). To the best of our knowledge, we thereby provide
the first Chinese instantiation of entity linking to DBpedia. We will also
demonstrate the AGDISTIS web service endpoints for German, English and Chinese
disambiguation and show how data can be sent to the endpoints. Moreover, the
output format of AGDISTIS will be explained. An online version of the demo is
available at http://agdistis.aksw.org/demo.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Demonstration</title>
      <p>In our demonstration, we aim to show how AGDISTIS can be used by
both non-expert and expert users. For non-experts, we provide a graphical user
interface (GUI). Experts can use the REST interfaces provided by
the tool directly or call them from a Java snippet. All of this
functionality, which is described in more detail in the following sections,
will also be demonstrated at the conference.</p>
      <sec id="sec-2-1">
        <title>AGDISTIS for non-expert users</title>
        <p>A screenshot of the AGDISTIS GUI is shown in Figure 1. This GUI supports
the following workflow.</p>
        <p>
          Entity Recognition. After typing or pasting text into the input field, users can
either annotate the entities manually or have the entities
detected automatically. In the first case, the labels of the entities are
marked using square brackets (see central panel of Figure 1). In the case of
automatic annotation, we send the text to the FOX framework, which has
been shown to outperform the state of the art in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. We will demonstrate this
feature using both manually pre-annotated text and text without annotations
in our examples (see upper bar of Figure 1). Moreover, we will allow the audience
to enter arbitrary texts that pertain to their domain of interest.
        </p>
        <p>
          Automatic Language Detection. Once the user has set which entities are to
be disambiguated, the marked-up text is sent to the language detection module
based on [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We chose this library because it is both precise (&gt; 99% precision)
and time-efficient. If the input is detected to belong to one of the languages
we support (i.e., German, Chinese, English), we forward the input to a
dedicated AGDISTIS instance for that language. In all other cases, an error
message is shown to the user, stating that the detected language is not
supported. The main advantage of this approach is that the user does not need
to select the language of the text manually, which leads to
an improved user experience. We will demonstrate this feature by entering text
in different languages (German, English, French, Chinese, etc.) and presenting
the output of the framework for each of these test cases.
        </p>
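        <p>The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the code of the AGDISTIS GUI: the function names, the language codes and the endpoint URLs are hypothetical placeholders, and the language code is assumed to come from an external language detector such as the library cited in [5].</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical mapping from a detected language code to an AGDISTIS instance.
# The URLs are placeholders, not the addresses of the real deployments.
ENDPOINTS = {
    "en": "http://example.org/agdistis/en",
    "de": "http://example.org/agdistis/de",
    "zh": "http://example.org/agdistis/zh",
}

# Matches one manually annotated entity such as "[Barack Obama]".
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def brackets_to_entity_tags(text):
    """Rewrite '[label]' annotations as '<entity>label</entity>'."""
    return BRACKETED.sub(r"<entity>\1</entity>", text)


def select_endpoint(language_code):
    """Return the endpoint for a supported language, or raise ValueError."""
    # Detectors may report regional variants such as 'zh-cn'; keep the prefix.
    base = language_code.lower().split("-")[0]
    if base not in ENDPOINTS:
        raise ValueError("Language '%s' is not supported" % language_code)
    return ENDPOINTS[base]
```
]]></preformat>
        <p>Raising an error for an unsupported language mirrors the error message described above; a French input, for instance, would be rejected rather than sent to an arbitrary instance.</p>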
        <p>
          Entity Linking. This is the most important step of the whole workflow. The
annotated text is forwarded to the corresponding language-specific deployment
of AGDISTIS, each of which relies on a language-specific version of DBpedia 3.9.
The approach underlying AGDISTIS [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is language-independent and combines
breadth-first search and the well-known HITS algorithm. In addition, string
similarity measures and label expansion heuristics are used to account for typos
and morphological variations in naming. Moreover, Wikipedia-specific surface
forms for resources can be used.
        </p>
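        <p>To illustrate the HITS step only, the following sketch computes hub and authority scores on a small directed graph. It is a textbook formulation under our own assumptions (a fixed number of iterations, Euclidean normalization, made-up node names) and should not be read as the AGDISTIS implementation, which is described in [7].</p>
        <preformat><![CDATA[
```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (unchanged if all are zero)."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm == 0:
        return scores
    return {node: value / norm for node, value in scores.items()}


def hits(edges, iterations=20):
    """Return (hub, authority) score dicts for a graph of (source, target) pairs."""
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all nodes linking to a node.
        authority = {node: 0.0 for node in nodes}
        for source, target in edges:
            authority[target] += hub[source]
        authority = _normalize(authority)
        # Hub update: sum the authority scores of all nodes a node links to.
        hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            hub[source] += authority[target]
        hub = _normalize(hub)
    return hub, authority


# Toy graph with made-up node names; in practice the nodes would be resources
# reached by the breadth-first search around the candidate entities.
toy_edges = [("a", "b"), ("c", "b"), ("a", "d")]
hub_scores, authority_scores = hits(toy_edges)
```
]]></preformat>
        <p>In this toy graph, node "b" receives a higher authority score than node "d" because two nodes link to it instead of one.</p>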
        <p>Output. Within the demo, the annotated text is shown below the input field,
where disambiguated entities are colored to highlight them. When the user hovers over a
highlighted entity, the disambiguated URI is shown. We will demonstrate the
output of the entity linking by using the examples shown in the upper part of
Figure 1. The output of the system will be shown in an HTML version
and made available for download as JSON. Moreover, we will allow interested
participants to enter their own examples and view the output of the tool.</p>
      </sec>
      <sec id="sec-2-2">
        <title>AGDISTIS for expert users</title>
        <p>To support different languages, we set up a REST URI for each of the language
versions. Each of these endpoints understands two mandatory parameters: (1)
text, a UTF-8, URL-encoded string in which entities are annotated with the
XML tag &lt;entity&gt;, and (2) type='agdistis', which selects disambiguation with the
AGDISTIS algorithm. In the future, several wrappers will be implemented so that
different entity linking algorithms can be used for comparison. The following cURL
(http://curl.haxx.se/) snippet shows
how to address the web service; see also http://agdistis.aksw.org:
<preformat>curl --data-urlencode "text='&lt;entity&gt;Barack Obama&lt;/entity&gt; arrives in &lt;entity&gt;Washington, D.C.&lt;/entity&gt;.'" \
     -d type='agdistis' \
     {AGDISTIS URL}/AGDISTIS</preformat></p>
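        <p>The same call can be sketched in Python with the standard library only. The function names and the example URL used for testing are hypothetical; the parameter names text and type and the value agdistis are taken from the description above. The shape of the response is not assumed here, so the raw response body is returned unchanged.</p>
        <preformat><![CDATA[
```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(endpoint_url, annotated_text):
    """Build a POST request carrying the two mandatory parameters."""
    # urlencode percent-encodes the <entity> markup, like curl's --data-urlencode.
    # The curl example additionally wraps the text in single quotes; include them
    # in annotated_text if the service turns out to expect them.
    body = urlencode({"text": annotated_text, "type": "agdistis"}).encode("utf-8")
    return Request(endpoint_url, data=body, method="POST")


def disambiguate(endpoint_url, annotated_text, timeout=30):
    """Send the request and return the raw response body as a string."""
    request = build_request(endpoint_url, annotated_text)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```
]]></preformat>
        <p>Separating build_request from disambiguate makes it possible to inspect the encoded payload without contacting a server.</p>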
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>
        English and German Evaluation. AGDISTIS has been evaluated on eight
different datasets from diverse domains such as news, sports and business reports.
On the English datasets, AGDISTIS outperforms the currently best
disambiguation framework, TagMe 2, on three out of four datasets by up to 29.5%
F-measure. On the only German dataset available for named entity
disambiguation, i.e., news.de [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we outperform the only competitor,
DBpedia Spotlight, by 3% F-measure.
      </p>
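      <p>For readers unfamiliar with the measure, the following sketch shows the usual way to compute precision, recall and the balanced F-measure from counts of correct, wrong and missed entity links. Whether the reported figures are micro- or macro-averaged is not stated in this section, so the function below only covers a single set of counts; its name and parameters are illustrative.</p>
      <preformat><![CDATA[
```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) for one set of entity-linking decisions."""
    predicted = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```
]]></preformat>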
      <p>Chinese Evaluation. We evaluated the Chinese version of AGDISTIS within
a question answering setting. To this end, we used the multilingual benchmark
provided in QALD-4. Since this benchmark does not support Chinese, we extended
it by translating the English questions to Chinese and
inserting the named entity links manually. The accuracies achieved by AGDISTIS
on the training and test datasets are 65% and 70%, respectively.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We presented the demo of AGDISTIS for three different languages on three
different DBpedia-based knowledge bases. In future work, we aim to create a
single-server multilingual version of the framework that will intrinsically support
several languages at the same time. To this end, we will use a graph merging
algorithm to combine the different versions of DBpedia into a single graph. The
disambiguation steps will then be carried out on this single graph.</p>
      <p>Acknowledgments. This work has been supported by the
ESF and the Free State of Saxony and the FP7 project
GeoKnow (GA No. 318159).</p>
      <p>QALD benchmark: http://greententacle.techfak.uni-bielefeld.de/~cunger/qald</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Ferragina</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ugo</given-names>
            <surname>Scaiella</surname>
          </string-name>
          .
          <article-title>Fast and accurate annotation of short texts with Wikipedia pages</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Pablo N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          , Max Jakob, Andres Garcia-Silva, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>DBpedia Spotlight: Shedding light on the web of documents</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Semantic Systems (I-Semantics)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Moro</surname>
          </string-name>
          , Alessandro Raganato, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>Entity linking meets word sense disambiguation: a unified approach</article-title>
          .
          <source>TACL</source>
          ,
          <volume>2</volume>
          :
          <fpage>231</fpage>
          -
          <lpage>244</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Michael Roder, Ricardo Usbeck, Sebastian Hellmann, Daniel Gerber, and Andreas Both.
          <article-title>N3 - a collection of datasets for named entity recognition and disambiguation in the nlp interchange format</article-title>
          .
          <source>In LREC</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Shuyo</given-names>
            <surname>Nakatani</surname>
          </string-name>
          .
          <article-title>Language detection library for Java</article-title>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>René</given-names>
            <surname>Speck</surname>
          </string-name>
          and
          <string-name>
            <given-names>Axel-Cyrille</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          .
          <article-title>Ensemble learning for named entity recognition</article-title>
          .
          <source>In International Semantic Web Conference</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Axel-Cyrille</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          , Sören Auer, Daniel Gerber, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Both</surname>
          </string-name>
          .
          <article-title>AGDISTIS - agnostic disambiguation of named entities using linked open data</article-title>
          .
          <source>In International Semantic Web Conference</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>