<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating Referring Expressions from Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Armita Khajeh Nassiri</string-name>
          <email>armita.khajeh-nassiri@polytechnique.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nathalie Pernelle</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatiha Sa¨ıs</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>E ́cole Polytechnique</institution>
          ,
          <addr-line>Palaiseau F-91128</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LRI, Paris Sud University, CNRS 8623, Paris Saclay University</institution>
          ,
          <addr-line>Orsay F-91405</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A referring expression (RE) is a description in natural language or a logical formula that can uniquely identify an entity. For instance, the 44th president of the United States unambiguously characterizes Barack Obama. Referring expressions find applications in disambiguation, data anonymization, query answering, or data linking. There may potentially exist many logical expressions for uniquely identifying an entity. Generation of referring expressions is a well-studied task in natural language generation [1]. Hence, various algorithms with different objectives have been proposed to automatically discover REs. These approaches vary depending on the expressivity of the logical formulas they can generate. For instance in [1, 2], REs that are created are conjunctions of atoms. While in [3], more complex REs represented in description logics are discovered that can involve the universal quantifier. In this work our focus lies on automatically discovering REs for each entity within a class of a knowledge graph. Keys of a class are sets of properties whose values can uniquely identify one entity of that class. Hence, if the properties for the keys are instantiated, they can each be considered as a referring expression. What interests us in this work, is to efficiently discover REs by focusing on the ones that cannot be found by instantiating the keys. It should be noted that the quality of REs we discover is very dependent on the dataset. The completeness, correctness and lack of noise in the knowledge graph plays a pivotal role in how good and interpretable REs are.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this work, we discover minimal REs existing in a class. By minimality, we mean
that there is no other RE that we discover and that can be logically entailed by the
minimal one. The REs we mine always consist of conjunctions that specify the classes
the entities belong to.</p>
      <p>
        To generate REs for a given class C, we start by creating the maximal non-keys
of C (the set of properties such that addition of a property will make it a key for that
class) using SAKey [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The algorithm first generates candidate expressions containing
one instantiated property (i.e. p(x; v)). Whenever an expression E only describes one
instance i of C, E is output as a referring expression. Adding more properties to the
      </p>
      <p>Copyright c 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Khajeh Nassiri et al.
description E will still uniquely identify the instance i, just making it more complex.
Hence, we remove the REs (e.g. p(i; v)) found at the end of this step and reduce the
search space. Then, the remaining candidate expressions are taken into account with
one more property at each step, until either the search space is empty or there is no more
set of non-keys to consider. To increase the depth of subgraph, we have to consider the
class of the new individual and obtain its corresponding set of maximal non-keys so
that the process can be reiterated. Some pruning techniques can be applied to limit the
size and the complexity of the REs discovered by our approach. For instance the depth
of the graph pattern and number of allowed variables can be limited.
3</p>
    </sec>
    <sec id="sec-2">
      <title>Experimental Evaluation</title>
      <p>
        We chose YAGO as the knowledge graph on which we discover the REs and used 10
different classes such as Actor, City and Book (same data used in VICKEY [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). We
mined REs of depth one and for example, for the class City (with 1.1M triples) we
found 1.2M REs in less than 2 minutes. On average, our approach can detect from 1.5
to 14.3 RE per individual depending on the class.
      </p>
      <p>This approach can discover RE such as: made in heaven is the album created by Queen
in the year 1991. Among the actors, only George Clooney has been born in
LexingtonKentucky in the year 1961. When we ran the algorithm with depth 2, we obtained REs
like Alfred Werner is a scientist who has won the Nobel Prize in Chemistry and has
graduated from a university located in Zurich.
4</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion References</title>
      <p>In this paper, we proposed an approach that can efficiently discover REs by reducing
the search space thanks to maximal non-keys. Due to the incompleteness of knowledge
graphs, entity linking using keys may be insufficient to link all individuals. We expect
that using REs will increase the recall of rule-based data linking methods.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Dale</surname>
          </string-name>
          .
          <article-title>Generating referring expressions - constructing descriptions in a domain of objects and processes. ACL-MIT press series in natural language processing</article-title>
          . MIT Press,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>E.</given-names>
            <surname>Krahmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. v.</given-names>
            <surname>Erk</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Verleg</surname>
          </string-name>
          .
          <article-title>Graph-based generation of referring expressions</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <fpage>53</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Towards soundness preserving approximation for abox reasoning of OWL2</article-title>
          .
          <source>In Proceedings of the 23rd International Workshop on Description Logics (DL</source>
          <year>2010</year>
          ), Waterloo, Ontario, Canada, May 4-
          <issue>7</issue>
          ,
          <year>2010</year>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Symeonidou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Armant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pernelle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Sa</surname>
          </string-name>
          <article-title>¨ıs. SAKey: Scalable Almost Key Discovery in RDF Data</article-title>
          . In S. Verlag, editor,
          <source>In proceedings of the 13th International Semantic Web Conference, ISWC 2014, volume Lecture Notes in Computer Science</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>49</lpage>
          ,
          <source>Riva del Garda</source>
          , Italy, Oct.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Symeonidou</surname>
          </string-name>
          , L. Gala`rraga,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pernelle</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <article-title>Sa¨ıs, and</article-title>
          <string-name>
            <given-names>F.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          . VICKEY:
          <article-title>Mining Conditional Keys on Knowledge Bases</article-title>
          .
          <source>In International Semantic Web Conference (ISWC)</source>
          , volume
          <volume>10587</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>661</fpage>
          -
          <lpage>677</lpage>
          , Austria, Oct 2017. Springer.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>