<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Discovering Expressive Rules for Complex Ontology Matching and Data Interlinking</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manuel Atencia</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jérôme David</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jérôme Euzenat</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liliana Ibanescu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nathalie Pernelle</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatiha Saïs</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Élodie Thiéblin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cassia Trojahn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIT, UMR 5505</institution>
          ,
          <addr-line>1118 Route de Narbonne, F-31062 Toulouse</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LRI, Paris Sud University, CNRS 8623, Paris Saclay University</institution>
          ,
          <addr-line>Orsay F-91405</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay</institution>
          ,
          <addr-line>75005 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Univ. Grenoble Alpes</institution>
          ,
          <addr-line>Inria, CNRS, Grenoble INP, LIG, F-38000 Grenoble</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Ontology matching and data interlinking are distinct tasks that both aim at facilitating
interoperability between different knowledge bases. Although the field has matured
in recent years, most ontology matching work still focuses on generating simple
correspondences (e.g., Author ≡ Writer). Such correspondences are, however, insufficient
to fully cover the different types of heterogeneity between knowledge bases, and
complex correspondences are required (e.g., LRIMember ≡ Researcher ⊓ ∃belongsToLab.{LRI}).
Few approaches have been proposed for generating complex alignments; they
focus on correspondence patterns or exploit common instances between the
ontologies. Similarly, unsupervised data interlinking approaches (which do not require
labelled samples) have recently been developed. One approach consists in
discovering linking rules on unlabelled data, such as simple keys [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] (e.g., {lastName, lab})
or conditional keys [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (e.g., {lastName} under the condition c = Researcher ⊓
∃lab.{LRI}). Results have shown that the more expressive the rules are, the higher
the recall is. However, naive approaches cannot be applied to large datasets. Existing
approaches presuppose either that the data conform to the same ontology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or that
all possible pairs of properties be examined [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Complementarily, a link key is a set
of pairs of properties that identifies matching instances of two classes from two RDF datasets
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (e.g., {⟨creator, auteur⟩, ⟨title, titre⟩} linkkey ⟨Book, Livre⟩ expresses that
an instance of the Book class which has the same values for properties creator and title
as an instance of the Livre class has for auteur and titre denotes the same entity). Such link
keys may be directly extracted without the need for an alignment.
      </p>
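      <p>As a minimal sketch of how a link key such as the Book/Livre example could be applied, the Python fragment below links two instances when their value sets agree on every property pair. The data layout (dictionaries mapping instance identifiers to property values), the function name apply_link_key, and the sample instances are assumptions made for illustration; they are not taken from LinkEx or any of the cited tools. The sketch requires equal, non-empty value sets for each pair, which is only one possible reading of "the same values".</p>
      <preformat>
```python
from itertools import product


def apply_link_key(link_key, source, target):
    """Return the pairs (s, t) whose value sets agree on every property pair.

    link_key: list of (source_property, target_property) pairs.
    source, target: dict mapping an instance id to {property: set of values}.
    """
    links = set()
    for (s, s_props), (t, t_props) in product(source.items(), target.items()):
        agree = True
        for p, q in link_key:
            s_vals = s_props.get(p, set())
            t_vals = t_props.get(q, set())
            # Assumption: a pair matches only on equal, non-empty value sets.
            if not s_vals or s_vals != t_vals:
                agree = False
                break
        if agree:
            links.add((s, t))
    return links


# Hypothetical instances of the Book and Livre classes.
books = {
    "b1": {"creator": {"Zola"}, "title": {"Germinal"}},
    "b2": {"creator": {"Camus"}, "title": {"La Peste"}},
}
livres = {
    "l1": {"auteur": {"Zola"}, "titre": {"Germinal"}},
    "l2": {"auteur": {"Camus"}, "titre": {"La Chute"}},
}
link_key = [("creator", "auteur"), ("title", "titre")]

print(apply_link_key(link_key, books, livres))  # {('b1', 'l1')}
```
      </preformat>
      <p>The pairwise comparison is quadratic in the number of instances, so this form is only suitable for small illustrative datasets.</p>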
    </sec>
    <sec id="sec-2">
      <title>2 Proposed approach</title>
      <p>
        We introduce an approach for evaluating the impact of complex
correspondences on data interlinking performed by applying keys (Figure 1).
Given two populated ontologies O1 and O2, we first apply the CANARD system [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
to establish complex correspondences (1). Then, the key discovery tools VICKEY
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and LinkEx are applied to discover simple keys, conditional keys, and link
keys from the instances of O1 and O2, taking the complex correspondences as input
(as a way of reducing the key search space) (2). The keys are then applied in the data
interlinking task, which can also benefit from the complex correspondences (as a way
of extending the sets of instances to be compared) (3). Finally, since CANARD relies on
shared instances, the matching step is iterated, taking the detected identity links into account.
      </p>
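      <p>To make the notions of simple and conditional keys used in step (2) more concrete, the following Python sketch checks whether a set of properties behaves as a key, optionally restricted to the instances that satisfy a condition. It is an illustration under simplifying assumptions (single-valued properties, instances lacking a property are ignored) and does not reproduce the VICKEY or LinkEx algorithms; the function is_key, the predicate in_lri, and the sample data are hypothetical.</p>
      <preformat>
```python
from collections import defaultdict


def is_key(properties, instances, condition=lambda inst: True):
    """Check that no two instances satisfying `condition` share all values
    of `properties` (instances lacking one of the properties are ignored)."""
    seen = defaultdict(list)
    for name, inst in instances.items():
        if not condition(inst):
            continue
        if any(p not in inst for p in properties):
            continue
        signature = tuple(inst[p] for p in properties)
        seen[signature].append(name)
    return all(len(names) == 1 for names in seen.values())


# Hypothetical, single-valued instance descriptions.
people = {
    "p1": {"type": "Researcher", "lab": "LRI", "lastName": "Martin"},
    "p2": {"type": "Researcher", "lab": "IRIT", "lastName": "Martin"},
    "p3": {"type": "Researcher", "lab": "LRI", "lastName": "Dupont"},
}


def in_lri(inst):
    # Condition c: the instance is a Researcher whose lab is LRI.
    return inst.get("type") == "Researcher" and inst.get("lab") == "LRI"


print(is_key(["lastName"], people))          # False: p1 and p2 share a last name
print(is_key(["lastName"], people, in_lri))  # True: unique among LRI researchers
print(is_key(["lastName", "lab"], people))   # True: the pair is unique
```
      </preformat>
      <p>In this toy data, lastName alone is not a key, but it becomes one under the condition, which illustrates why more expressive rules can produce links that simple keys miss.</p>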
      <p>We plan to evaluate the approach to verify, on the one hand, whether the use of
complex correspondences improves the results of data interlinking. On the other
hand, thanks to the use of the detected identity links, it would also be reasonable to
expect improvements in ontology matching results. Experiments will be run on DBpedia
and YAGO, covering different domains such as people, organizations, and locations, as
reference entity links exist for these datasets.</p>
      <p>Acknowledgement. This work is supported by the CNRS Blanc project RegleX-LD.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Atencia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>David</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>Data interlinking through robust linkkey extraction</article-title>
          .
          <source>In ECAI</source>
          , pages
          <fpage>15</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Symeonidou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Armant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pernelle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Saïs</surname>
          </string-name>
          .
          <article-title>SAKey: Scalable almost key discovery in RDF data</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>49</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Symeonidou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Galárraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pernelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Saïs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          .
          <article-title>VICKEY: mining conditional keys on knowledge bases</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>661</fpage>
          -
          <lpage>677</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>É.</given-names>
            <surname>Thiéblin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Haemmerlé</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Trojahn</surname>
          </string-name>
          .
          <article-title>CANARD complex matching system: results of the 2018 OAEI evaluation campaign</article-title>
          .
          <source>In OM@ISWC</source>
          , pages
          <fpage>138</fpage>
          -
          <lpage>143</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>