<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can Predicate Lexicalizations Help in Named Entity Disambiguation?</article-title>
        <subtitle>Position Paper</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Heiko Paulheim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christina Unger</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim, Germany</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Most named entity disambiguation approaches use various resources, such as surface form catalogues and relations of entities in the target knowledge base. In contrast, predicates that describe relations between the entity mentions in text are rarely exploited. In this position paper, we argue that predicates, i.e., surface forms for relations in the target knowledge base, can help to improve named entity disambiguation results.</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Disambiguation</kwd>
        <kwd>Ontology Lexicon</kwd>
        <kwd>Knowledge Base Lexicalization</kwd>
        <kwd>DBpedia</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The identification of entities in text usually comprises two steps. First, mentions of
entities are recognized, which often involves a considerable amount of ambiguity. For example,
the expression Heidi could refer to the model Heidi Klum, the Swiss children's book,
and so on. Therefore, the recognized mentions need to be disambiguated. This second
step is often called named entity disambiguation (NED) or entity linking, as it involves
linking mentions to unique identifiers in a knowledge base. For example, the entity
mention Heidi in a sentence such as Heidi and her husband Seal live in Vegas would be
linked to the DBpedia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] resource dbr:Heidi_Klum, while the same mention in a
sentence such as Heidi was written by Swiss author Johanna Spyri would be linked to
the DBpedia resource dbr:Heidi, representing the children's book.
      </p>
      <p>Named entity disambiguation often uses dictionaries that collect textual surface
forms of entities, e.g. mapping the forms New York, NYC, and Big Apple to the
DBpedia entity dbr:New_York_City. In many cases, co-occurrences of and relations
between entities are also taken into account for disambiguation. For example, in the sentence
Cairo was the code name for a project at Microsoft from 1991 to 1996, the co-occurrence
of Cairo with Microsoft allows us to link it to the operating system instead of the Egyptian
city. However, co-occurring entities are not always sufficient for disambiguation. For
example, in the sentence While Apple is an electronics company, Mango is a clothing one,
the co-occurrence of Apple and Mango does not provide enough context to distinguish
between companies and fruits.</p>
      <p>
        To the best of our knowledge, NED approaches usually do not exploit predicates
occurring in texts along with entities, such as husband or written by in the Heidi
example, or company in the Apple and Mango example. In this paper, we argue that such
predicates are actually a helpful source of knowledge to improve NED, especially when
little other context is available for disambiguation. We demonstrate this using examples
from the KORE50 benchmark [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], exploiting property lexicalizations. For example, for
the property spouse, typical lexicalizations are married to, husband of, and wife of.
      </p>
      <p>Such lexicalizations can help named entity disambiguation in two respects. First,
properties in knowledge bases such as DBpedia often specify domain and range
information in their ontologies, i.e., valid classes of entities that can appear in the subject
and object position of a statement using that property. This domain and range
information can be used to discard NED candidates that are inconsistent with the ontology. For
example, consider the following KORE50 sentence:</p>
      <p>David and Victoria named their children Brooklyn, Romeo, Cruz, and Harper
Seven.</p>
      <p>Here, Brooklyn is easily confused with the New York City borough Brooklyn by NED
tools. However, taking the predicate children into account, which is a lexicalization of
the property child, we can discard this misleading option because the domain and
range of child are persons, while Brooklyn is a place.</p>
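      <p>The type-based filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation used in this paper: the type assignments and the candidate list are hand-written toy data rather than values retrieved from DBpedia, the function name filter_candidates is hypothetical, and subclass reasoning over the ontology is deliberately omitted.</p>
      <preformat>
```python
# Toy data (assumption): types and signatures are written by hand for
# illustration and are not queried from DBpedia.
ENTITY_TYPES = {
    "dbr:Brooklyn": {"dbo:Place"},
    "dbr:Brooklyn_Beckham": {"dbo:Person"},
}
PROPERTY_SIGNATURES = {
    "dbo:child": {"domain": "dbo:Person", "range": "dbo:Person"},
}


def filter_candidates(candidates, prop, position):
    """Keep candidates whose types contain the class that the property
    requires in the given position ("domain" or "range")."""
    required = PROPERTY_SIGNATURES[prop][position]
    return [c for c in candidates if required in ENTITY_TYPES.get(c, set())]


candidates = ["dbr:Brooklyn", "dbr:Brooklyn_Beckham"]
remaining = filter_candidates(candidates, "dbo:child", "range")
print(remaining)
```
      </preformat>
      <p>In this toy setting, the borough is discarded because its only type is a place, while the person candidate is kept.</p>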
      <p>The second possible use of property lexicalizations is that we can explicitly search
for relations between entities in the knowledge base. For example, in the above case,
we would already have learned that the mentioned entities (Brooklyn, Romeo, etc.) are
persons. Given that we have already correctly disambiguated one of the entities, we can use
this information to search for entities that stand in the child relation to it. For example,
if we have already linked David to David Beckham, we can use the DBpedia triple
dbr:David_Beckham dbo:child dbr:Brooklyn_Beckham .
to link Brooklyn to dbr:Brooklyn_Beckham, instead of any other person.</p>
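      <p>The relation-based lookup can be sketched in a similar way. The snippet below is only a sketch under assumptions: the triple store is a small in-memory set holding the single triple quoted above, the candidate list is illustrative, and the helper related_candidates is a hypothetical name. A real system would query the knowledge base instead.</p>
      <preformat>
```python
# In-memory stand-in for the knowledge base (assumption); it contains only
# the triple quoted in the text.
TRIPLES = {
    ("dbr:David_Beckham", "dbo:child", "dbr:Brooklyn_Beckham"),
}


def related_candidates(anchor, prop, candidates, triples):
    """Return the candidates that are connected to the already linked
    anchor entity via prop, checking both directions of the triple."""
    return [
        c for c in candidates
        if (anchor, prop, c) in triples or (c, prop, anchor) in triples
    ]


# Illustrative candidate list for the mention "Brooklyn" (assumption).
candidates = ["dbr:Brooklyn", "dbr:Brooklyn_Beckham", "dbr:Brooklyn_Decker"]
linked = related_candidates("dbr:David_Beckham", "dbo:child", candidates, TRIPLES)
print(linked)
```
      </preformat>
      <p>Checking both directions is a design choice of this sketch, since a lexicalization such as father may express the inverse direction of the property it is attached to.</p>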
      <p>In order to exploit such information, lexicalizations of properties are required. One
such collection is DBlexipedia.</p>
    </sec>
    <sec id="sec-2">
      <title>Predicate Lexicalizations in DBlexipedia</title>
      <p>
        DBlexipedia [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is an ontology lexicon that connects properties in the DBpedia ontology
to common surface forms that express them in a particular natural language, together
with linguistic information about their morpho-syntactic properties.
      </p>
      <p>
        The lexicon published on http://dblexipedia.org is the result of applying
the automatic ontology lexicon induction method M-ATOLL [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], which creates
ontology lexica in lemon [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] format as follows. It takes as input an ontology and dataset
(here, DBpedia) and a dependency-parsed text corpus in the target language (here,
English Wikipedia). As a first step, M-ATOLL retrieves all triples for a given property
from the dataset. For example, the results for the property spouse include the triple
&lt;Lulu, spouse, Maurice Gibb&gt;. Then, it retrieves all sentences from the parsed
text corpus that contain mentions of the subject and object of the extracted triples, e.g.
In 1969 the singer Lulu married Maurice Gibb. It then searches for predefined patterns in
those sentences in order to extract candidate lexicalizations of the property, such as to
marry.
      </p>
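      <p>The induction steps described above can be illustrated with a deliberately simplified sketch. It is not the M-ATOLL method itself: the predefined patterns over dependency parses are replaced here by a plain regular expression over the raw sentence text, and no lemmatization is performed, so the sketch yields the surface form married rather than the lemma to marry. The function name candidate_lexicalizations and the second sentence are assumptions made for illustration.</p>
      <preformat>
```python
import re

# Step 1: triples retrieved for a property (the example from the text).
TRIPLES = [("Lulu", "spouse", "Maurice Gibb")]

# Step 2: corpus sentences; the second one is an illustrative assumption.
SENTENCES = [
    "In 1969 the singer Lulu married Maurice Gibb.",
    "Lulu was born in Scotland.",
]


def candidate_lexicalizations(triples, sentences):
    """Collect, per property, the text found between a subject mention
    and an object mention that occur in the same sentence."""
    found = {}
    for subj, prop, obj in triples:
        pattern = re.compile(re.escape(subj) + r"\s+(.+?)\s+" + re.escape(obj))
        for sentence in sentences:
            match = pattern.search(sentence)
            if match:
                found.setdefault(prop, []).append(match.group(1))
    return found


# Step 3: extract candidate lexicalizations from the matching sentences.
lexicalizations = candidate_lexicalizations(TRIPLES, SENTENCES)
print(lexicalizations)
```
      </preformat>
      <p>Only the first sentence mentions both the subject and the object of the triple, so only that sentence contributes a candidate.</p>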
      <p>So far, M-ATOLL covers entries that describe transitive verbs (e.g. to cross),
intransitive verbs with a prepositional object (e.g. to live in), relational nouns with a
prepositional object (e.g. capital of), and relational adjectives (e.g. similar to).</p>
    </sec>
    <sec id="sec-3">
      <title>Preliminary Experiment</title>
      <p>
        To assess the potential value of property lexicalizations for the NED task, we analyzed
the 50 sentences of the KORE50 corpus. We processed each of those sentences using
DBpedia Spotlight [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with its standard settings. For 37 of the 50 sentences, DBpedia Spotlight
disambiguated at least one entity incorrectly.
      </p>
      <p>Next, we looked at the errors made and analyzed whether each error could potentially
be resolved by using information on predicates occurring in the sentence. To that end, we
looked up the predicate in DBlexipedia. If we found it as a lexicalization of a DBpedia
property, we marked the error as potentially solvable if
– a wrongly disambiguated entity had a type which was inconsistent with the
respective property’s domain or range, or
– a wrongly disambiguated entity had a direct connection through the found property
to the correct entity.</p>
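      <p>The marking procedure can be summarized as a small decision function. This is a sketch of one possible reading of the two criteria, not the procedure as it was actually carried out (the analysis was done by inspecting the errors): the second criterion is read here as the correct entity being directly connected, through the found property, to another entity in the sentence. The lexicon entries follow examples from this paper, while the identifier ex:wrong_Jon, the direction of the stored triple, and the function name potentially_solvable are assumptions.</p>
      <preformat>
```python
# Toy stand-ins (assumptions) for DBlexipedia and DBpedia.
LEXICON = {"children": "dbo:child", "father": "dbo:child", "direct": "dbo:director"}
SIGNATURES = {"dbo:child": ("dbo:Person", "dbo:Person")}  # (domain, range)
TYPES = {"dbr:Brooklyn": {"dbo:Place"}}
TRIPLES = {("dbr:Jon_Voight", "dbo:child", "dbr:Angelina_Jolie")}


def potentially_solvable(predicate, wrong_entity, correct_entity, other_entity):
    """Apply the two criteria to one wrongly disambiguated mention."""
    prop = LEXICON.get(predicate)
    if prop is None:
        return False  # the predicate is not a known lexicalization
    allowed = set(SIGNATURES.get(prop, ()))
    wrong_types = TYPES.get(wrong_entity)
    # Criterion 1: the wrong entity's type clashes with domain and range.
    type_clash = (
        bool(allowed)
        and wrong_types is not None
        and allowed.isdisjoint(wrong_types)
    )
    # Criterion 2: the correct entity is linked to another entity via prop.
    connected = (
        (correct_entity, prop, other_entity) in TRIPLES
        or (other_entity, prop, correct_entity) in TRIPLES
    )
    return type_clash or connected


by_type = potentially_solvable(
    "children", "dbr:Brooklyn", "dbr:Brooklyn_Beckham", "dbr:David_Beckham")
by_relation = potentially_solvable(
    "father", "ex:wrong_Jon", "dbr:Jon_Voight", "dbr:Angelina_Jolie")
not_in_lexicon = potentially_solvable(
    "join", "ex:wrong_Steve", "dbr:Steve_Jobs", "dbr:Microsoft")
print(by_type, by_relation, not_in_lexicon)
```
      </preformat>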
      <p>For example, in the following KORE50 sentence, DBpedia Spotlight correctly links
Angelina to Angelina Jolie, but fails to link Jon and Brad to the correct entities.</p>
      <p>Angelina, her father Jon, and her partner Brad never played together in the
same movie.</p>
      <p>The predicate father, however, can be found in DBlexipedia as a lexicalization of the
property child, which links Angelina Jolie to Jon Voight, the correct linking for
Jon.</p>
      <p>Similarly, in the following sentence, both Hurricane and Desire are incorrectly
linked by DBpedia Spotlight.</p>
      <p>Dylan performed Hurricane about the black fighter Carter, from his album
Desire.</p>
      <p>Here, the predicate perform is found as a lexicalization of the property musicalArtist
with domain Single and range MusicalArtist, which helps to disambiguate
Hurricane to the Bob Dylan single. Furthermore, album is found as a lexicalization of the
property artist, which relates Bob Dylan to his album Desire.</p>
      <p>The mention John in the following sentence is also incorrectly linked.</p>
      <p>Pixar produced Cars, and John directed it.</p>
      <p>However, it could be correctly linked using the lexical knowledge from DBlexipedia that
direct expresses the property director, and the factual knowledge from DBpedia
that the correctly identified movie Cars stands in the director relation to the entity
John Lasseter.</p>
      <p>In total, in 17 out of the 37 cases where DBpedia Spotlight performed a wrong
disambiguation, the error could have been identified with either of the two strategies.<sup>1</sup>
These 17 cases are distributed fairly evenly across domains, with the exception of business:
– CEL (Celebrities): 4/8
– MUS (Music): 4/8
– BUS (Business): 1/8
– SPO (Sports): 4/7
– POL (Politics): 4/6</p>
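      <p>As a quick consistency check, the per-domain counts can be summed and compared with the totals reported above. The snippet reads each fraction as the number of potentially solvable cases over the number of erroneous sentences in that domain, which is an interpretation that matches the reported totals; the variable names are illustrative.</p>
      <preformat>
```python
# (potentially solvable cases, erroneous sentences) per domain, as listed above.
COUNTS = {
    "CEL": (4, 8),
    "MUS": (4, 8),
    "BUS": (1, 8),
    "SPO": (4, 7),
    "POL": (4, 6),
}

total_solvable = sum(solved for solved, _ in COUNTS.values())
total_errors = sum(errors for _, errors in COUNTS.values())
share = total_solvable / total_errors

print(total_solvable, total_errors, round(share, 3))
```
      </preformat>
      <p>The sums reproduce the 17 and 37 cases stated in the text, i.e. slightly less than half of the erroneous sentences.</p>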
      <p>In addition, we can identify cases where the proposed approach cannot help. First,
it can happen that a predicate is not contained in DBlexipedia. For example, neither
drop out nor join are listed as lexicalizations of any property, so they cannot be used for
disambiguating Steve in Steve dropped out of Stanford to join Microsoft.</p>
      <p>Second, it can happen that a lexicalization is found but either does not point to the
correct property, or the corresponding triple in DBpedia is missing. For example, in the
phrase Steve, the former CEO of Apple, DBlexipedia does list CEO of as a lexicalization
of the property keyPerson, but in DBpedia Steve Jobs is related to Apple Inc.
by means of board and occupation.</p>
      <p>Third, there are sentences without an explicit predicate between entity mentions, as
the following one:</p>
      <p>Steve, Bill, Sergey, and Larry have drawn a great deal of admiration these days
for their pioneering successes that changed the world we live in.</p>
      <p>Analogously, there are sentences that contain predicates but the expressed relation is
not modeled in DBpedia. For example, the sentence Müller scored a hattrick against
England contains the predicate score against, which does not correspond to any
property in DBpedia. Similar cases affect predicates that are modeled through more complex
constructs, such as property paths or reifications.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This preliminary experiment shows that predicates, i.e. natural language lexicalizations
of properties in the knowledge base, are a valuable source of knowledge when trying
to improve the results of NED in cases where only little context is available for
disambiguation. Although a formal evaluation of an actual implementation is still missing,
the findings from this experiment are quite promising.</p>
      <p><sup>1</sup> However, there may be more than one error in a sentence, and in some cases we would not be able to address all of
them. Hence, this should not be misread as “half of the errors can be identified.”</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Hoffart</surname>
          </string-name>
          , Stephan Seufert, Dat Ba Nguyen,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Theobald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>KORE: keyphrase overlap relatedness for entity disambiguation</article-title>
          .
          <source>In Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          , pages
          <fpage>545</fpage>
          -
          <lpage>554</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas,
          <string-name>
            <given-names>Pablo N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          , Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer.
          <article-title>DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>John</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dennis</given-names>
            <surname>Spohr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>Linking lexical resources and ontologies on the semantic web with lemon</article-title>
          .
          <source>In The Semantic Web: Research and Applications</source>
          , pages
          <fpage>245</fpage>
          -
          <lpage>259</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Pablo N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          , Max Jakob, Andrés García-Silva, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>DBpedia Spotlight: shedding light on the web of documents</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Semantic Systems</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Walter</surname>
          </string-name>
          , Christina Unger, and
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>ATOLL - A framework for the automatic induction of ontology lexica</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>94</volume>
          , Part B(0):
          <fpage>148</fpage>
          -
          <lpage>162</lpage>
          ,
          <year>2014</year>
          . Special Issue following the 18th
          <source>International Conference on Applications of Natural Language Processing to Information Systems (NLDB'13).</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Walter</surname>
          </string-name>
          , Christina Unger, and
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>M-ATOLL: A framework for the lexicalization of ontologies in multiple languages</article-title>
          .
          <source>In The Semantic Web - ISWC 2014</source>
          , volume
          <volume>8796</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>472</fpage>
          -
          <lpage>486</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Walter</surname>
          </string-name>
          , Christina Unger, and
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>DBlexipedia: A nucleus for a multilingual lexical semantic web</article-title>
          .
          <source>In 3rd International Workshop on NLP&amp;DBpedia</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>