=Paper= {{Paper |id=Vol-2536/om2019_poster6 |storemode=property |title=Discovering Expressive Rules for Complex Ontology Matching and Data Interlinking |pdfUrl=https://ceur-ws.org/Vol-2536/om2019_poster6.pdf |volume=Vol-2536 |authors=Manuel Atencia,Jérôme David,Jérôme Euzenat,Liliana Ibanescu,Nathalie Pernelle,Fatiha Saïs,Élodie Thiéblin,Cássia Trojahn |dblpUrl=https://dblp.org/rec/conf/semweb/AtenciaDEIPSTT19 }} ==Discovering Expressive Rules for Complex Ontology Matching and Data Interlinking== https://ceur-ws.org/Vol-2536/om2019_poster6.pdf
    Discovering Expressive Rules for Complex Ontology
              Matching and Data Interlinking

    Manuel Atencia1, Jérôme David1, Jérôme Euzenat1, Liliana Ibanescu2, Nathalie
           Pernelle3, Fatiha Saïs3, Élodie Thiéblin4, and Cassia Trojahn4

    1 Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG, F-38000 Grenoble, France
      firstname.lastname@inria.fr
    2 UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, 75005 Paris, France
      firstname.lastname@agroparistech.fr
    3 LRI, Paris Sud University, CNRS 8623, Paris Saclay University, Orsay F-91405, France
      firstname.lastname@lri.fr
    4 IRIT, UMR 5505, 1118 Route de Narbonne, F-31062 Toulouse, France
      firstname.lastname@irit.fr


1    Introduction

Ontology matching and data interlinking are distinct tasks that both aim at facilitating
interoperability between different knowledge bases. Although the field has matured
in recent years, most ontology matching work still focuses on generating simple
correspondences (e.g., Author ≡ Writer). Such correspondences are, however, insufficient
to cover the different types of heterogeneity between knowledge bases, and complex
correspondences are required (e.g., LRIMember ≡ Researcher ⊓ ∃belongsToLab.{LRI}).
Few approaches have been proposed for generating complex alignments; they focus
on correspondence patterns or exploit instances shared by the ontologies.
Similarly, unsupervised data interlinking approaches (which do not require
labelled samples) have recently been developed. One approach consists in discovering
linking rules on unlabelled data, such as simple keys [2] (e.g., {lastName, lab})
or conditional keys [3] (e.g., {lastName} under the condition c = Researcher ⊓
∃lab.{LRI}). Results have shown that the more expressive the rules are, the higher
the recall is. However, naive approaches cannot be applied to large datasets. Existing
approaches presuppose either that the data conform to the same ontology [2] or that
all possible pairs of properties be examined [1]. Complementarily, a link key is a set
of pairs of properties that identifies the instances of two classes of two RDF datasets
[1]. For example, {⟨creator, auteur⟩, ⟨title, titre⟩} linkkey ⟨Book, Livre⟩ expresses that
an instance of the Book class and an instance of the Livre class are the same whenever
the former has the same values for creator and title as the latter has for auteur and
titre. Such link keys may be extracted directly, without the need for an alignment.
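
To make the three kinds of rules concrete, the following minimal Python sketch checks a simple key, a conditional key, and applies a link key on toy data. It is an illustration only, not the algorithm of SAKey, VICKEY, or link key extraction: the function names (is_key, link_with_linkkey), the dictionary-based data model, and the sample instances are all hypothetical, and every property is assumed to be single-valued, whereas RDF properties may carry several values.

```python
def is_key(instances, properties, condition=None):
    """Return True if no two instances share the same values for `properties`.

    instances: dict mapping an instance id to a dict {property: value}.
    condition: optional predicate on a description; when given, only the
    instances satisfying it are considered (a conditional key).
    Assumption: properties are single-valued.
    """
    seen = set()
    for description in instances.values():
        if condition is not None and not condition(description):
            continue
        if any(p not in description for p in properties):
            continue  # instances lacking one of the properties are ignored
        signature = tuple(description[p] for p in properties)
        if signature in seen:
            return False  # two instances agree on all properties
        seen.add(signature)
    return True


def link_with_linkkey(source, target, property_pairs):
    """Return the pairs (s, t) whose values agree on every property pair."""
    links = set()
    for s_id, s_desc in source.items():
        for t_id, t_desc in target.items():
            if all(p in s_desc and s_desc[p] == t_desc.get(q)
                   for p, q in property_pairs):
                links.add((s_id, t_id))
    return links


# Hypothetical toy data, invented for this illustration.
people = {
    "p1": {"lastName": "Dupont", "lab": "LRI", "type": "Researcher"},
    "p2": {"lastName": "Dupont", "lab": "LIG", "type": "Researcher"},
    "p3": {"lastName": "Martin", "lab": "LIG", "type": "Researcher"},
    "p4": {"lastName": "Martin", "lab": "LRI", "type": "Student"},
    "p5": {"lastName": "Durand", "lab": "LRI", "type": "Researcher"},
}
books = {
    "b1": {"creator": "Victor Hugo", "title": "Les Misérables"},
    "b2": {"creator": "Victor Hugo", "title": "Notre-Dame de Paris"},
}
livres = {
    "l1": {"auteur": "Victor Hugo", "titre": "Les Misérables"},
    "l2": {"auteur": "Émile Zola", "titre": "Germinal"},
}


def is_lri_researcher(description):
    """Condition c = Researcher with lab LRI, as a Python predicate."""
    return description.get("type") == "Researcher" and description.get("lab") == "LRI"
```

On this toy data, {lastName} alone is not a key, {lastName, lab} is, and {lastName} becomes a key once restricted to LRI researchers; the link key {⟨creator, auteur⟩, ⟨title, titre⟩} links only b1 and l1. The nested loop in link_with_linkkey compares all pairs of instances, which is exactly the kind of naive strategy that does not scale to large datasets.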


2    Proposed approach

We introduce here an approach that aims at evaluating the impact of complex
correspondences on data interlinking based on the application of keys (Figure 1).
Given two populated ontologies O1 and O2, we first apply the CANARD system [4]
for establishing complex correspondences (1). Then, the key discovery tools VICKEY
[3] and LinkEx are applied for the discovery of simple keys, conditional keys, and link
keys from the instances of O1 and O2, exploiting the complex correspondences as input
(as a way of reducing the key search space) (2). The keys are then applied in the data
interlinking task, which can also benefit from the complex correspondences (as a way
of extending the sets of instances to be compared) (3). Finally, because CANARD relies
on shared instances, the matching is iterated using the detected identity links.

Copyright c 2019 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




   Fig. 1. Workflow of ontology matching and data interlinking enhanced by key discovery.
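
The iteration in this workflow can be sketched as a loop that alternates the three steps until no new identity link is found. This is a minimal sketch under stated assumptions: the function iterate_matching_and_linking, its parameters, the stopping criterion, and the bound max_rounds are hypothetical, and the three callables are placeholders for the roles played by CANARD, VICKEY/LinkEx, and the interlinking step, not their actual interfaces.

```python
def iterate_matching_and_linking(o1, o2, match, discover_keys, interlink,
                                 max_rounds=10):
    """Alternate matching (1), key discovery (2), and interlinking (3).

    match(o1, o2, links)               -> set of correspondences
    discover_keys(o1, o2, alignment)   -> collection of keys / link keys
    interlink(o1, o2, keys, alignment) -> set of identity links
    Assumption: the loop stops when a round yields no new link.
    """
    links = set()
    alignment = set()
    for _ in range(max_rounds):
        alignment = match(o1, o2, links)                 # step (1)
        keys = discover_keys(o1, o2, alignment)          # step (2)
        new_links = interlink(o1, o2, keys, alignment)   # step (3)
        if new_links <= links:
            break  # fixed point reached: nothing new to feed back
        links |= new_links
    return alignment, links


# Toy stand-ins (hypothetical): the matcher finds a correspondence only once
# some identity links exist, and that correspondence unlocks one more link.
def toy_match(o1, o2, links):
    return {("A", "B")} if links else set()


def toy_discover_keys(o1, o2, alignment):
    return ["k"]


def toy_interlink(o1, o2, keys, alignment):
    found = {("x1", "y1")}
    if alignment:
        found.add(("x2", "y2"))
    return found


toy_alignment, toy_links = iterate_matching_and_linking(
    None, None, toy_match, toy_discover_keys, toy_interlink)
```

With these stand-ins, the first round produces one link, the second round uses it to obtain a correspondence and a second link, and the third round stops because no new link appears.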

    We plan to evaluate the approach to verify, on the one hand, whether the use of
complex correspondences improves the results of data interlinking. On the other
hand, thanks to the detected identity links, it is also reasonable to expect
improvements in ontology matching results. Experiments will be run on DBpedia
and YAGO, covering different domains such as people, organizations, and locations, as
reference entity links exist for these datasets.
Acknowledgement. This work is supported by the CNRS Blanc project RegleX-LD.


References
1. M. Atencia, J. David, and J. Euzenat. Data interlinking through robust linkkey extraction. In
   ECAI, pages 15–20, 2014.
2. D. Symeonidou, V. Armant, N. Pernelle, and F. Saïs. SAKey: Scalable almost key discovery in
   RDF data. In ISWC, pages 33–49, 2014.
3. D. Symeonidou, L. Galárraga, N. Pernelle, F. Saïs, and F. M. Suchanek. VICKEY: mining
   conditional keys on knowledge bases. In ISWC, pages 661–677, 2017.
4. É. Thiéblin, O. Haemmerlé, and C. Trojahn. CANARD complex matching system: results of
   the 2018 OAEI evaluation campaign. In OM@ISWC, pages 138–143, 2018.