<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Word Sense Disambiguation and Entity Linking for Everybody</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Moro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Cecconi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Navigli</string-name>
          <email>navigli@di.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sapienza University of Rome, Viale Regina Elena 295</institution>
          ,
          <addr-line>00198</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present a Web interface and a RESTful API for our state-of-the-art multilingual word sense disambiguation and entity linking system. The Web interface has been developed, on the one hand, to be user-friendly for non-specialized users, who can thus easily obtain a first grasp of complex linguistic problems such as the ambiguity of words and entity mentions and, on the other hand, to provide a showcase for researchers from other fields interested in the multilingual disambiguation task. Moreover, our RESTful API enables easy integration, within a Java framework, of state-of-the-art language technologies. Both the Web interface and the RESTful API are available at http://babelfy.org.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilinguality</kwd>
        <kwd>Word Sense Disambiguation</kwd>
        <kwd>Entity Linking</kwd>
        <kwd>Web interface</kwd>
        <kwd>RESTful API</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The tasks of Word Sense Disambiguation (WSD) and Entity Linking (EL) are
well known in the computational linguistics community. WSD [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] is a
historical task aimed at assigning meanings to single-word and multi-word occurrences
within text, while the aim of EL [
        <xref ref-type="bibr" rid="ref12 ref3">3, 12</xref>
        ] is to discover mentions of entities within
a text and to link them to the most suitable entry in the considered
knowledge base. These two tasks are key to many problems in Artificial Intelligence
and especially to Machine Reading (MR) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], i.e., the problem of automatic,
unsupervised understanding of text. Moreover, the recent upsurge of interest in
the use of semi-structured resources to create novel repositories of knowledge
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has opened up new opportunities for wide-coverage, general-purpose Natural
Language Understanding techniques. The next logical step, from the point of
view of Machine Reading, is to link natural language text to the aforementioned
resources.
      </p>
      <p>
        In this paper, we present a Web interface and a Java RESTful API for our
state-of-the-art approach to WSD and EL in arbitrary languages: Babelfy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Babelfy is the first approach which explicitly aims at performing both
multilingual WSD and EL at the same time. The approach is knowledge-based and
exploits semantic relations between word meanings and named entities from
BabelNet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a multilingual semantic network which provides lexicalizations and
glosses for more than 9 million concepts and named entities in 50 languages.
      </p>
    </sec>
    <sec id="sec-2">
      <title>BabelNet</title>
      <p>
        In our work we use the BabelNet 2.5<sup>1</sup> semantic network [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] since it is the largest
available multilingual knowledge base and is obtained from the automatic
seamless integration of Wikipedia<sup>2</sup>, WikiData<sup>3</sup>, OmegaWiki<sup>4</sup>, WordNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Open
Multilingual WordNet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Wiktionary<sup>5</sup>. It is available in different formats,
such as via its Java API, a SPARQL endpoint and a linked data interface [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It
contains more than 9 million concepts and named entities, 50 million
lexicalizations and around 250 million semantic relations (see http://babelnet.org/stats
for more detailed statistics). Moreover, by using this resource we can leverage the
multilingual lexicalizations of the concepts and entities it contains to perform
disambiguation in any of the 50 languages covered in BabelNet.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The Babelfy System</title>
      <p>
        Our state-of-the-art approach, Babelfy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], is based on a loose identification of
candidate meanings (substring matching instead of exact matching) coupled with
a densest subgraph heuristic which selects high-coherence semantic
interpretations. Here we briefly describe its three main steps:
      </p>
      <list list-type="order">
        <list-item>
          <p>Each vertex, i.e., either a concept or a named entity, is automatically associated
with a semantic signature, that is, a set of related vertices, by means of
random walks with restart on the BabelNet network.</p>
        </list-item>
        <list-item>
          <p>Then, given an input text, all the linkable fragments, i.e., pieces of text that are
equal to, or a substring of, at least one lexicalization contained in BabelNet, are
selected and, for each of them, the possible meanings are listed according to
the semantic network.</p>
        </list-item>
        <list-item>
          <p>A graph-based semantic interpretation of the whole text is produced by
linking the candidate meanings of the selected fragments using the
previously computed semantic signatures. Then a densest subgraph heuristic is used
to extract the most coherent interpretation and, finally, the fragments are
disambiguated by using a centrality measure within this graph.</p>
        </list-item>
      </list>
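      <p>To make step 1 more concrete, the following is a minimal sketch of a random walk with restart that collects a semantic signature on a toy graph. It is not the Babelfy implementation: the class and method names are hypothetical, the graph is unweighted, and the restart probability, number of steps and visit threshold are illustrative assumptions rather than values taken from the system.</p>
      <preformat><![CDATA[
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: semantic signatures via random walk with restart (RWR).
class SemanticSignatureSketch {

    /**
     * Walks the graph starting from 'start'. At every step the walk either
     * restarts (with probability restartProb, or when the current vertex has
     * no outgoing edge) or moves to a uniformly chosen neighbour.
     * Returns the vertices, other than 'start', visited at least minHits times.
     */
    static Set<String> signature(Map<String, List<String>> graph, String start,
                                 double restartProb, int steps, int minHits, long seed) {
        Random rnd = new Random(seed);
        Map<String, Integer> hits = new HashMap<>();
        String current = start;
        for (int i = 0; i < steps; i++) {
            List<String> next = graph.getOrDefault(current, List.of());
            if (next.isEmpty() || rnd.nextDouble() < restartProb) {
                current = start; // restart the walk
            } else {
                current = next.get(rnd.nextInt(next.size()));
                hits.merge(current, 1, Integer::sum); // count the visit
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            if (!e.getKey().equals(start) && e.getValue() >= minHits) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy graph (assumption): two disconnected groups of vertices.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("striker", List.of("football", "forward"));
        graph.put("football", List.of("striker", "forward"));
        graph.put("forward", List.of("football", "striker"));
        graph.put("bank", List.of("money"));
        graph.put("money", List.of("bank"));
        System.out.println(signature(graph, "striker", 0.15, 10_000, 100, 42L));
    }
}
```
]]></preformat>
      <p>In this sketch, vertices that cannot be reached from the start vertex never enter its signature, which is the property that step 3 relies on when it connects candidate meanings through their signatures.</p>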
      <p>
        A detailed description and evaluations of the approach are given in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
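      <p>The difference between exact and partial candidate identification in step 2 can be illustrated with a small sketch. The lexicon below is a hypothetical three-entry stand-in for the BabelNet lexicalizations, the names are invented for illustration, and matching is done on lower-cased character strings, which is a simplifying assumption.</p>
      <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: exact versus partial (substring) matching of a fragment.
class FragmentMatchingSketch {

    /** Returns the lexicalizations that the given text fragment can be linked to. */
    static List<String> candidates(Collection<String> lexicalizations,
                                   String fragment, boolean partial) {
        String f = fragment.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String lex : lexicalizations) {
            String l = lex.toLowerCase(Locale.ROOT);
            // exact: the fragment equals the lexicalization;
            // partial: the fragment may also be a substring of it
            if (l.equals(f) || (partial && l.contains(f))) {
                out.add(lex);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lexicon = List.of("Major League Soccer", "major", "league");
        System.out.println(candidates(lexicon, "Major League", false)); // exact matching
        System.out.println(candidates(lexicon, "Major League", true));  // partial matching
    }
}
```
]]></preformat>
      <p>With exact matching the fragment "Major League" has no candidate in this toy lexicon, whereas partial matching links it to "Major League Soccer".</p>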
    </sec>
    <sec id="sec-4">
      <title>Web Interface and RESTful API</title>
      <p>We developed a Web interface and a RESTful API by following the KISS
principle, i.e., "keep it simple, stupid". As can be seen from the screenshot in Figure
1, the Web interface asks for the input text, its language and whether the
partial matching heuristic should be used instead of the exact string matching one.
After clicking on "Babelfy!" the user is presented with the annotated text, in which
green circles denote concepts and yellow circles denote named
entities. As for the Java RESTful API, users can exploit our approach by writing
fewer than 10 lines of code. Here we show a complete example:</p>
      <preformat>// get an instance of the Babelfy RESTful API manager
Babelfy bfy = Babelfy.getInstance(AccessType.ONLINE);
// the string to be disambiguated
String inputText = "hello world, I'm a computer scientist";
// the actual disambiguation call
Annotation annotations = bfy.babelfy("", inputText,
    Matching.EXACT, Language.EN);
// printing the result
for (BabelSynsetAnchor annotation : annotations.getAnnotations())
    System.out.println(annotation.getAnchorText() + "\t" +
        annotation.getBabelSynset().getId() + "\t" +
        annotation.getBabelSynset());</preformat>
      <p><sup>1</sup> http://babelnet.org; <sup>2</sup> http://www.wikipedia.org; <sup>3</sup> http://wikidata.org; <sup>4</sup> http://omegawiki.org; <sup>5</sup> http://wiktionary.org</p>
      <p><bold>Documentation for the RESTful API</bold></p>
      <preformat>Annotation babelfy(String key, String inputText,
                   Matching candidateSelectionMode, Language language)</preformat>
      <p>The first parameter is the access key. A random or empty key grants 100
requests per day (a less restrictive key can be requested). The second
parameter is a string representing the input text (sentences or whole documents
can be input, up to a maximum of 3500 characters). The third parameter is an
enum with two possible values, EXACT or PARTIAL, which enable, respectively,
the exact or partial matching heuristic for the selection of fragment candidates
found in the input text. The fourth parameter is the language of the input text
(one of 50 languages, denoted by their uppercase ISO 639-1 codes).</p>
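      <p>Since a single call accepts at most 3500 characters, longer documents need to be split before they are submitted. The helper below is one possible way of doing so and is not part of the Babelfy API: the class and method names are hypothetical, and splitting at whitespace is an assumption made to avoid cutting words in half.</p>
      <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a long text into pieces that respect the input limit.
class InputChunkerSketch {

    // Per-request limit stated in the documentation above.
    static final int MAX_CHARS = 3500;

    /**
     * Splits text into consecutive chunks of at most maxChars characters,
     * preferring to cut right after a whitespace character. Concatenating
     * the chunks yields the original text again.
     */
    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                // back up to the last whitespace inside the window, if there is one
                int ws = end;
                while (ws > start && !Character.isWhitespace(text.charAt(ws - 1))) {
                    ws--;
                }
                if (ws > start) {
                    end = ws;
                }
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String longText = "Babelfy links words and entity mentions to BabelNet. ".repeat(200);
        for (String c : chunk(longText, MAX_CHARS)) {
            // each chunk could be passed as inputText to a separate babelfy(...) call
            System.out.println(c.length());
        }
    }
}
```
]]></preformat>
      <p>Each chunk counts as one request, so with an empty key the daily quota of 100 requests also bounds the amount of text that can be processed in this way.</p>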
      <p>Annotation is the object that contains the output of our system. A user
can access the POS-tagged input text with getText(), which returns a list of
WordLemmaTag objects with the respective getters. With getAnnotations() a
user obtains a list of BabelSynsetAnchor objects, i.e., the actual annotations.
For each annotation, getAnchorText() returns the disambiguated fragment of text and
getBabelSynset() returns the selected Babel synset. Moreover, if a user wants
to anchor the disambiguated entry to the input text, the start and end indices
of the tagged text can be obtained with getStart() and getEnd().</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we presented and described the typical use of the Web interface and
Java RESTful API of our state-of-the-art system for multilingual Word Sense
Disambiguation and Entity Linking, i.e., Babelfy, available at http://babelfy.org</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors gratefully acknowledge the support of the
ERC Starting Grant MultiJEDI No. 259234.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bond</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foster</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Linking and extending an open multilingual wordnet</article-title>
          .
          <source>In: Proc. of ACL</source>
          . pp.
          <fpage>1352</fpage>
          –
          <lpage>1362</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ehrmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cecconi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vannella</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0</article-title>
          .
          <source>In: Proc. of LREC</source>
          . pp.
          <fpage>401</fpage>
          –
          <lpage>408</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Erbs</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zesch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Link Discovery: A Comprehensive Analysis</article-title>
          .
          <source>In: Proc. of ICSC</source>
          . pp.
          <fpage>83</fpage>
          –
          <lpage>86</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>WordNet: An Electronic Lexical Database</article-title>
          . MIT Press (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>Collaboratively built semi-structured content and Artificial Intelligence: The story so far</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>194</volume>
          ,
          <fpage>2</fpage>
          –
          <lpage>27</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          :
          <article-title>Reading the Web: A Breakthrough Goal for AI</article-title>
          .
          <source>AI Magazine</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tucci</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passonneau</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Annotating the MASC Corpus with BabelNet</article-title>
          .
          <source>In: Proc. of LREC</source>
          . pp.
          <fpage>4214</fpage>
          –
          <lpage>4219</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raganato</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Entity Linking meets Word Sense Disambiguation: A Unified Approach</article-title>
          .
          <source>TACL</source>
          <volume>2</volume>
          ,
          <fpage>231</fpage>
          –
          <lpage>244</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Word sense disambiguation: A survey</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>41</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>69</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A Quick Tour of Word Sense Disambiguation, Induction and Related Approaches</article-title>
          .
          <source>In: Proc. of SOFSEM</source>
          . pp.
          <fpage>115</fpage>
          –
          <lpage>129</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>193</volume>
          ,
          <fpage>217</fpage>
          –
          <lpage>250</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNamee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Entity linking: Finding extracted entities in a knowledge base</article-title>
          .
          <source>In: Multi-source, Multilingual Information Extraction and Summarization</source>
          , pp.
          <fpage>93</fpage>
          –
          <lpage>115</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>