
EXONA Results for OAEI 2015

Syrine Damak

damaksyrine@gmail.com

Hazem Souid

swdhazem@gmail.com

Marouen Kachroudi

marouen.kachroudi@fst.rnu.tn

Sami Zghal

sami_zghal@planet.tn

Université de Tunis El Manar, Faculté des Sciences de Tunis, LIPAH Programmation Algorithmique et Heuristique, 2092, Tunis, Tunisie

This paper presents the results of EXONA in the Ontology Alignment Evaluation Initiative (OAEI) 2015. EXONA is an automatic instance-based ontology alignment system that works in three steps. In the first step, it parses the ontologies. In the second step, it indexes the instances of the first ontology; these indexed instances are then used in the querying phase. In the last step, it aligns instances by aggregating the scores of different terminological matchers. We first describe the overall framework of our matching system (EXONA), then we detail the techniques used in the framework for instance matching. Last, we give a thorough analysis of our results and discuss some future work on our system. This is our first participation in the OAEI instance matching track; the results are good in terms of recall, precision and F-measure.

have to be indexed. Correspondence consists in querying the index already built; the query contains instances of the target knowledge base, which are not indexed.

To cope with large knowledge bases, we propose an index organised by concept, which reduces the search space and the time spent searching for instances during querying.

1.1 State, purpose, general statement

This section describes the overall framework of EXONA. Our system includes three modules: transformation, indexation and correspondence.

The proposed system operates in three successive modules, each of which is divided into two phases. The system begins by transforming the knowledge bases into two independent graphs formed of OWL nodes. Once these graphs have been constructed, the instance objects are created. The construction of an instance object relies on a neighbourhood technique in which the neighbourhood is spread by similarity. This technique aims to enrich the instance object with the neighbouring instances with which it has a high similarity. We then apply a terminological normalisation to the instances. This normalisation is required both for the indexation of the instances and for the similarity computation.

Next, we index the instances of the source knowledge base. This indexation is not blind: it is done by concept, in order to provide optimal, concept-oriented search fields. Once the index has been created, it is queried. The query contains instances from the target knowledge base, which have to be normalised beforehand. Querying the index returns a candidate set, in which each candidate pair is accompanied by a score indicating its rank relative to the other pairs.

To identify the pair of instances to align, we filter this candidate set, keeping only the two best pairs. As a verification step, we compute a terminological similarity, which is combined with the score obtained in the querying phase. These similarities are then aggregated to identify the pair of instances to align, namely the one with the highest similarity score.

1.2 Specific techniques used

The process of the EXONA system consists of the following three successive modules: transformation, indexation and correspondence.

1. Transformation module

This module is divided into two phases: graph construction and instance creation.
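As a rough illustration of the instance creation phase, the following Python sketch computes the edit distance between two labels and keeps only the neighbouring instances whose similarity to the instance is high. The paper does not give EXONA's implementation: the function names, the conversion of the distance into a similarity in [0, 1], and the 0.8 threshold are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def similar_neighbours(label, neighbour_labels, threshold=0.8):
    """Return the URIs of neighbours whose label is close to `label`.

    `neighbour_labels` maps a neighbour URI to its label. The threshold
    is a hypothetical value, not one reported for EXONA.
    """
    return [uri for uri, other in neighbour_labels.items()
            if similarity(label, other) >= threshold]
```

With these assumptions, calling similar_neighbours("John Smith", {"ex:n1": "Jon Smith", "ex:n2": "Mary Jones"}) keeps only ex:n1, since a single deletion separates the two spellings.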

Graph construction: As input, our system receives two OWL files. These files are transformed into two independent OWL graphs. A graph is a more adequate representation, as it highlights the information contained in the files.

Instance creation: An instance object is formed of an identifier and a content. It is identified by a URI; its content is a set of information that makes it an autonomous entity. It contains the list of neighbouring instances spread by similarity, as well as the RDF triples of the instance. To compute similarity, we use the edit distance.

2. Indexation module

This module consists of two phases: pretreatment and indexation.

Pretreatment: In this phase, we remove special symbols such as "´¿.*-", remove stop words such as "the", and standardise case. The pretreated text is then used in the querying phase.
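The pretreatment phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a small placeholder, the regular expression treats every character that is neither a word character nor whitespace as a special symbol, and the function name normalize is hypothetical.

```python
import re
from typing import List

# Placeholder stop-word list; the list actually used by EXONA is not given.
STOP_WORDS = {"the", "a", "an", "of", "and"}


def normalize(text: str) -> List[str]:
    """Lower-case the text, strip special symbols and drop stop words."""
    lowered = text.lower()
    # Replace anything that is not a word character or whitespace by a space.
    cleaned = re.sub(r"[^\w\s]", " ", lowered)
    return [token for token in cleaned.split() if token not in STOP_WORDS]
```

Applying the same function to source instances before indexing and to target instances before querying ensures that both sides are compared in the same form.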

Indexation: This phase indexes the instances of the source knowledge base. Each document is identified by the URI of the instance and contains the content of the instance, formed only by its data properties and the data properties of the set of neighbouring instances spread by similarity.

3. Correspondence module

This module is divided into two phases: querying, and filtering and match identification.

Querying: The querying phase takes as input the instances of the target knowledge base. These instances have to be pretreated before the index is searched. This phase outputs a list of candidates accompanied by their scores.
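The two phases of this module, querying a per-concept index and then filtering the two best candidates, can be sketched with a small in-memory stand-in for the Lucene index. Everything specific here is an assumption made for illustration: the class ConceptIndex and the function best_match are hypothetical names, the search score is a Jaccard token overlap rather than Lucene's scoring, the terminological similarity is difflib's SequenceMatcher ratio, and the aggregation is a plain mean of the two values.

```python
from collections import defaultdict
from difflib import SequenceMatcher


class ConceptIndex:
    """Toy inverted index partitioned by concept (not the Lucene index)."""

    def __init__(self):
        # concept -> token -> set of instance URIs
        self._postings = defaultdict(lambda: defaultdict(set))
        # URI -> token list of the indexed document
        self._docs = {}

    def add(self, concept, uri, tokens):
        self._docs[uri] = list(tokens)
        for token in set(tokens):
            self._postings[concept][token].add(uri)

    def tokens(self, uri):
        return self._docs[uri]

    def query(self, concept, tokens):
        """Return (uri, score) pairs for one concept only, best first."""
        query_tokens = set(tokens)
        postings = self._postings.get(concept, {})
        hits = set()
        for token in query_tokens:
            hits |= postings.get(token, set())
        scored = []
        for uri in hits:
            doc_tokens = set(self._docs[uri])
            # Assumed search score: Jaccard overlap of the token sets.
            score = len(query_tokens & doc_tokens) / len(query_tokens | doc_tokens)
            scored.append((uri, score))
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def best_match(index, concept, target_tokens, top_k=2):
    """Keep the top_k candidates, then pick the best aggregated score."""
    candidates = index.query(concept, target_tokens)[:top_k]
    if not candidates:
        return None
    target_text = " ".join(target_tokens)
    best = None
    for uri, search_score in candidates:
        source_text = " ".join(index.tokens(uri))
        # Assumed terminological matcher: difflib similarity ratio.
        terminological = SequenceMatcher(None, source_text, target_text).ratio()
        aggregated = (search_score + terminological) / 2  # assumed aggregation
        if best is None or aggregated > best[1]:
            best = (uri, aggregated)
    return best
```

Because the postings are partitioned by concept, a query for a Person instance never scans documents indexed under another concept, which is the point of indexing by concept.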

Filtering and match identification: During this phase, the system takes the two candidates with the highest scores. Then, it computes the terminological similarity for these candidates. EXONA identifies the pair of instances to align by aggregating these similarities with the score given by the search process. The pair of instances to align is the one with the best score.

1.3 Adaptations made for the evaluation

We changed the version of Lucene from 4.10.2 to 3.6, since our first version needed specific adaptations to be accepted by the 2015 evaluation campaign.

1.4 Link to the set of provided alignments (in align format)

http://www.mediafire.com/download/b3vx3zio02br45y/ EXON AOAEI2015.zip

2 Results

The instance matching 2015 track contains two subtasks. Each task comprises two tests with different scales (i.e., number of instances to match): i) Sandbox (small scale), which contains two datasets called source and target as well as the set of expected mappings (i.e., the reference alignment); ii) Mainbox (medium scale).
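For reference, precision, recall and F-measure can be computed from a produced alignment and the reference alignment as in the sketch below. It assumes that an alignment is represented as a set of (source URI, target URI) pairs and that the F-measure is the usual harmonic mean of precision and recall; the function name evaluate is hypothetical and this is not the evaluation code of the campaign.

```python
def evaluate(found, reference):
    """Return (precision, recall, f_measure) for two sets of mappings."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

When precision and recall are equal, the harmonic mean returns that same value, so the F-measure coincides with both.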

Author Disambiguation Task

The goal of the Author Disambiguation Task is to link OWL instances referring to the same person (i.e., author) based on their publications. This task is run on the two datasets mentioned above.

Sandbox task

The Sandbox test aims to evaluate the behaviour of our system at a small scale. Table 1 presents the results obtained by running EXONA on the Sandbox test of this task in the instance matching track of the OAEI 2015 campaign.

Mainbox task

This task is also run on the Mainbox test. The goal of this test is to evaluate the behaviour of our system at a larger scale. Table 2 presents the results obtained by running EXONA on the Mainbox test of this task in the instance matching track of the OAEI 2015 campaign.

Author Recognition Task

The goal of the Author Recognition Task is to associate a person (i.e., author) with the corresponding publication report containing aggregated information about the publication activity of the person, such as number of publications, h-index, years of activity and number of citations. This task is also run on the two datasets mentioned above.

Sandbox task

The Sandbox test aims to evaluate the behaviour of our system at a small scale. Table 3 presents the results obtained by running EXONA on the Sandbox test of this task in the instance matching track of the OAEI 2015 campaign.

Mainbox task

This task is also run on the Mainbox test. The goal of this test is to evaluate the behaviour of our system at a larger scale. Table 4 below presents the results obtained by running EXONA on the Mainbox test of this task in the instance matching track of the OAEI 2015 campaign.

System    Precision  Recall  F-measure
EXONA     0.409      0.409   0.409
InsMT+    0.246      0.028   0.05
Lily      0.999      0.998   0.999
LogMap    0.999      1.0     0.999
RiMOM     0.999      0.999   0.999

Table 4. Results of the Author Recognition Task for the Mainbox task

3 Conclusion

EXONA's participation in OAEI 2015 is encouraging, although it is restricted to a few sub-cases, as it yields good metric values in the two cases considered. The results reflect some strengths as well as some aspects that need to be improved.
