<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OAEI 2017 results of KEPLER</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marouen KACHROUDI</string-name>
          <email>marouen.kachroudi@fst.rnu.tn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gayo DIALLO</string-name>
          <email>gayo.diallo@u-bordeaux.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadok BEN YAHIA</string-name>
          <email>sadok.benyahia@fst.rnu.tn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>BPH Center - INSERM U1219, Team ERIAS &amp; LaBRI UMR5800</institution>
          ,
          <addr-line>Univ. Bordeaux</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Tunis El Manar, Faculté des Sciences de Tunis, Informatique Programmation Algorithmique et Heuristique, LIPAH-LR 11ES14</institution>
          ,
          <addr-line>2092, Tunis, Tunisie</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents and discusses the results produced by KEPLER for the 2017 Ontology Alignment Evaluation Initiative (OAEI 2017). The method relies on three different strategy levels. KEPLER is enhanced by techniques inherited from related domains, such as Information Retrieval (IR) [1]. To scale to large ontologies, the method is equipped with a partitioning module. To handle multilingualism, KEPLER follows a well-defined strategy based on the use of a translator, which yields very encouraging results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>Besides the classic techniques, the proposed method, KEPLER, exploits an external
resource, namely a translator, to deal with multilingualism. KEPLER implements an alignment
strategy that aims to exploit the full richness of the input ontologies.</p>
    </sec>
    <sec id="sec-3">
      <title>Specific techniques used</title>
      <p>The main idea of KEPLER is to exploit the expressiveness of the OWL language to
detect and compute the similarity between the entities of two given ontologies through 6
complementary modules, as presented in figure 1.</p>
      <p>Entities are described using OWL primitives with their semantics. We can then
consider an ontology as a semantic graph in which entities are nodes connected by links that
are OWL primitives. These links carry well-defined semantics. Consequently, if
two ontologies of the same domain are similar, their semantic graphs are also similar.</p>
      <p>Parsing and pretreatment: This module extracts the ontological entities,
initially represented as plain lists. In other words, at the parsing stage,
we primarily seek to transform an OWL ontology into a well-defined structure that
preserves and highlights all the information contained in this ontology. The
resulting format has a considerable impact on the results of the subsequent similarity
computation. As output, we get pairs formed by the entity name and its
associated labels.</p>
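      <p>As an illustration of the parsing step, the following minimal sketch extracts (entity name, labels) pairs from an OWL document in RDF/XML syntax. It is not KEPLER's parser: the function name parse_entities, the sample ontology and the use of Python's standard xml.etree module are assumptions made for this example, and only named classes and properties annotated with rdfs:label are handled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def parse_entities(owl_xml: str) -> Dict[str, List[str]]:
    """Map each named class or property to its rdfs:label values."""
    root = ET.fromstring(owl_xml)
    entities: Dict[str, List[str]] = {}
    for kind in ("Class", "ObjectProperty", "DatatypeProperty"):
        for node in root.iter(f"{{{OWL}}}{kind}"):
            iri = node.get(f"{{{RDF}}}about")
            if iri is None:  # anonymous class expressions carry no name
                continue
            name = iri.rsplit("#", 1)[-1]  # local name after the '#'
            labels = [lab.text.strip()
                      for lab in node.findall(f"{{{RDFS}}}label") if lab.text]
            entities.setdefault(name, []).extend(labels)
    return entities


# Hypothetical two-entity ontology used only for this illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/conf#Paper">
    <rdfs:label>Paper</rdfs:label>
    <rdfs:label xml:lang="fr">Article</rdfs:label>
  </owl:Class>
  <owl:ObjectProperty rdf:about="http://example.org/conf#writtenBy"/>
</rdf:RDF>"""

print(parse_entities(SAMPLE))
```
]]></preformat>
      <p>An entity without labels, such as writtenBy in the sample, is kept with an empty label list so that its name alone can still be compared later.</p>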
      <p>
        Partitioning: This module splits ontologies into smaller parts to support
the alignment task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Partitioning a set B(C) consists in finding subsets B1, B2, ...,
Bn that group semantically close elements bound by a relevant set of relationships,
i.e., O = ∪{B1, B2, ..., Bn}, where Bi is an ontological block and n is the
number of extracted blocks. Hence, we can define an ontological portion as a reduced
ontology that can be extracted from a larger one by splitting up the latter
according to its two constituents: structure and semantics. One way to obtain such a
partitioning is to maximize the relationships inside a block and minimize the
relationships between the blocks themselves. The quality of the partitioning can be evaluated
using different criteria:
– The size of the generated blocks: each block must have a reasonable size, i.e., a number of
elements that can be handled by an alignment tool;
– The number of generated blocks: this number should be as small as possible to
limit the number of block pairs to be aligned;
– The compactness degree of a block: a block is said to be substantially compact if
relations (lexical and structural ones) are strong inside the block and weak outside it.
      </p>
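      <p>The compactness criterion can be made concrete with a small sketch. The measure below, the share of the relations touching a block that stay inside it, is an assumption chosen for illustration; the source does not state which formula KEPLER or Ontopart uses, and the toy relations are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def compactness(block: Set[str], edges: Iterable[Edge]) -> float:
    """Share of the relations touching `block` that stay inside it."""
    internal = 0
    touching = 0
    for source, target in edges:
        inside = (source in block) + (target in block)
        if inside == 0:
            continue  # relation unrelated to this block
        touching += 1
        if inside == 2:
            internal += 1  # both endpoints belong to the block
    return internal / touching if touching else 0.0


# Hypothetical relations between entities of one ontology.
edges = [("Heart", "Organ"), ("Lung", "Organ"),
         ("Organ", "Body"), ("Paper", "Author")]
blocks = [{"Heart", "Lung", "Organ"}, {"Body"}, {"Paper", "Author"}]

for block in blocks:
    print(sorted(block), round(compactness(block, edges), 2))
```
]]></preformat>
      <p>Under this assumed measure, a block scores 1.0 when none of its relations cross the block boundary, which matches the intuition of strong relations inside and weak relations outside.</p>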
      <p>
        Translation: An original feature of our system is that it addresses the heterogeneity problem
that stems mainly from multilingualism, given the importance of this research area [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. This
challenge leads us to choose between two alternatives: either we translate
into the language of one of the two input ontologies, or we translate
both ontologies into a chosen pivot language. At this stage, we must anticipate
the rest of our approach. Specifically, at the semantic alignment stage we
use an external resource, namely WordNet<sup>3</sup>, a lexical database for the English
language. The choice is therefore governed by the use of WordNet: we translate
the two ontologies into the pivot language, which is English. To perform
the translation phase, we chose the Microsoft Bing translator<sup>4</sup>.
      </p>
      <p>
        Indexing: Indexing is one of the novelties of our approach. It reduces
the search space by applying an effective search strategy to the built indexes, which
represent the components of the input ontologies. To enable faster searching, the driving idea,
previously used in some works [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], is to run the analysis in advance and
store its result in a format optimized for search.
      </p>
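      <p>The idea of analysing the ontologies in advance and storing the result in a search-friendly format can be sketched with a plain inverted index. This is a simplified stand-in, not the index KEPLER builds: the tokenizer, the function names and the entity identifiers are assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split camelCase and punctuation, then lowercase the tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", spaced)]


def build_index(entities: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Inverted index: token -> identifiers of the entities that use it."""
    index = defaultdict(set)
    for entity_id, labels in entities.items():
        for label in labels:
            for token in tokenize(label):
                index[token].add(entity_id)
    return dict(index)


# Hypothetical entities (identifier -> name and labels).
target = {
    "t:Paper": ["Paper", "Article"],
    "t:PaperAuthor": ["PaperAuthor", "author of a paper"],
}
index = build_index(target)
print(sorted(index["paper"]))
```
]]></preformat>
      <p>Once the index is built, looking up a token costs one dictionary access instead of a scan over every entity, which is the reduction of the search space described above.</p>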
      <p>Candidate Mappings Identification: The role of this module is to find the entities
shared by the indexes. Once the indexes are built, they are queried.
The query covers the terminological and
semantic aspects at once, since we query documents, in a vector representation, that
contain a given ontological entity and its synonyms obtained via WordNet. It is worth
mentioning that the indexes are queried in both directions.</p>
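      <p>Querying in both directions with synonym expansion can be sketched as follows. The SYNONYMS table is a toy stand-in for WordNet, exact term lookup replaces the vector representation mentioned above, and all names are hypothetical; the sketch only shows why the second direction can add candidates that the first one misses.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Set, Tuple

Ontology = Dict[str, List[str]]  # entity identifier -> labels

# Toy, deliberately asymmetric synonym table (assumption, not WordNet data).
SYNONYMS = {"paper": {"article"}, "human": {"person"}}


def expand(label: str) -> Set[str]:
    """Return the lowercased label together with its known synonyms."""
    key = label.lower()
    return {key} | SYNONYMS.get(key, set())


def build_index(ontology: Ontology) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = {}
    for entity, labels in ontology.items():
        for label in labels:
            index.setdefault(label.lower(), set()).add(entity)
    return index


def query(source: Ontology, target_index: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Look up every source label, and its synonyms, in the target index."""
    hits = set()
    for entity, labels in source.items():
        for label in labels:
            for term in expand(label):
                for match in target_index.get(term, set()):
                    hits.add((entity, match))
    return hits


def candidates(onto1: Ontology, onto2: Ontology) -> Set[Tuple[str, str]]:
    forward = query(onto1, build_index(onto2))
    backward = {(e1, e2) for e2, e1 in query(onto2, build_index(onto1))}
    return forward | backward


onto1 = {"a:Paper": ["Paper"], "a:Person": ["Person"]}
onto2 = {"b:Article": ["Article"], "b:Human": ["Human"]}
print(sorted(candidates(onto1, onto2)))
```
]]></preformat>
      <p>In this toy case the pair (a:Person, b:Human) is only found by the backward query, because the synonym link is recorded on the "human" side alone.</p>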
      <p>Filtering and Recovery: The filtering module consists of two complementary
submodules, each responsible for a specific task, in order to refine the set of
candidate mappings. Once the list of candidates is ready, the alignment
method applies the first filter. Since querying the indexes may return
redundant mappings, this filter eliminates the redundancy: it goes through the
list of candidates and, for each candidate, checks whether there are duplicates. If this is the
case, it removes the redundant element(s). At the end of this phase, we have a
list of candidates without redundancy. However, false positives remain a concern,
hence the need for a second filter. Once the redundant candidates
are deleted, the system applies the second filter, which eliminates false positives. This filter is
applied to what we call partially redundant entities. An entity is considered partially
redundant if it belongs to two different mappings. For instance, given three ontological
entities e1, e2 and e3, if e1 is aligned to e2 on the one hand and to e3 on the other hand,
the latter alignment is considered doubtful. Note that our method generates
(1 : 1) alignments. To resolve this conflict, the alignment method compares the
topology of the two suspicious entities (e3 neighbors with e1 neighbors, e2 neighbors
with e1 neighbors) with respect to the redundant entity e1, and retains the pair
with the highest topological proximity value. All candidates are subject to this filter, and
its output is the final alignment file.
        <fn id="fn3"><label>3</label><p>https://wordnet.princeton.edu/</p></fn>
        <fn id="fn4"><label>4</label><p>https://www.bing.com/translator</p></fn>
      </p>
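      <p>The two filters can be sketched as below. The source does not define the topological proximity measure, so this example assumes one: the number of neighbor pairs that are themselves candidate mappings. Function names and data are hypothetical, and only the case described above (one entity e1 mapped to several targets) is resolved.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Set, Tuple

Mapping = Tuple[str, str]
Neighbors = Dict[str, Set[str]]


def drop_duplicates(candidates: List[Mapping]) -> List[Mapping]:
    """First filter: keep the first occurrence of each mapping, in order."""
    return list(dict.fromkeys(candidates))


def proximity(e1: str, e2: str, n1: Neighbors, n2: Neighbors,
              pool: Set[Mapping]) -> int:
    """Assumed measure: neighbor pairs of (e1, e2) that are candidates too."""
    return sum(1 for a in n1.get(e1, ()) for b in n2.get(e2, ())
               if (a, b) in pool)


def resolve_conflicts(candidates: List[Mapping], n1: Neighbors,
                      n2: Neighbors) -> List[Mapping]:
    """Second filter: keep one mapping per e1, the topologically closest."""
    unique = drop_duplicates(candidates)
    pool = set(unique)
    best: Dict[str, Mapping] = {}
    for e1, e2 in unique:
        kept = best.get(e1)
        if kept is None or (proximity(e1, e2, n1, n2, pool)
                            > proximity(kept[0], kept[1], n1, n2, pool)):
            best[e1] = (e1, e2)
    return list(best.values())


candidates = [("a:Paper", "b:Article"), ("a:Paper", "b:Article"),
              ("a:Paper", "b:Review"), ("a:Author", "b:Writer")]
neighbors1 = {"a:Paper": {"a:Author"}}
neighbors2 = {"b:Article": {"b:Writer"}, "b:Review": {"b:Reviewer"}}

print(resolve_conflicts(candidates, neighbors1, neighbors2))
```
]]></preformat>
      <p>Here a:Paper is partially redundant: it is mapped to both b:Article and b:Review. The pair with b:Article is retained because their neighbors (a:Author and b:Writer) are themselves a candidate mapping.</p>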
      <p>Alignment Generation: The alignment process produces a set of
mappings, which are serialized in the RDF format.</p>
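      <p>The serialization step could look like the following sketch. The source only says that mappings are written in RDF; the element layout and namespace below are assumed to follow the Alignment format commonly used for OAEI submissions and should be checked against the track documentation. The function name to_rdf and the example IRIs are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Assumed namespace of the Alignment format; verify before real use.
ALIGN_NS = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def to_rdf(mappings: Iterable[Tuple[str, str, float]]) -> str:
    """Serialize (entity1, entity2, confidence) triples as equivalence cells."""
    ET.register_namespace("", ALIGN_NS)
    ET.register_namespace("rdf", RDF_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    alignment = ET.SubElement(root, f"{{{ALIGN_NS}}}Alignment")
    ET.SubElement(alignment, f"{{{ALIGN_NS}}}type").text = "11"  # 1:1 alignment
    for entity1, entity2, confidence in mappings:
        wrapper = ET.SubElement(alignment, f"{{{ALIGN_NS}}}map")
        cell = ET.SubElement(wrapper, f"{{{ALIGN_NS}}}Cell")
        ET.SubElement(cell, f"{{{ALIGN_NS}}}entity1",
                      {f"{{{RDF_NS}}}resource": entity1})
        ET.SubElement(cell, f"{{{ALIGN_NS}}}entity2",
                      {f"{{{RDF_NS}}}resource": entity2})
        ET.SubElement(cell, f"{{{ALIGN_NS}}}measure").text = f"{confidence:.2f}"
        ET.SubElement(cell, f"{{{ALIGN_NS}}}relation").text = "="
    return ET.tostring(root, encoding="unicode")


document = to_rdf([("http://example.org/a#Paper",
                    "http://example.org/b#Article", 1.0)])
print(document)
```
]]></preformat>
      <p>Each mapping becomes one Cell holding the two entity IRIs, a confidence value and the equivalence relation.</p>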
      <p>Results</p>
    </sec>
    <sec id="sec-4">
      <title>Anatomy</title>
      <p>
        In this section, we present the results obtained by KEPLER in the OAEI 2017.
The Anatomy track consists of two real-world ontologies to be matched: the source ontology
describes the Adult Mouse Anatomy (with 2744 classes) and the target ontology is the
NCI Thesaurus describing the Human Anatomy (with 3304 classes). For this track, and
according to figure 2, KEPLER succeeded in extracting 74% of the correct mappings with a
precision of about 95%. Figure 2 summarizes the values of the evaluation metrics for the Anatomy
track. It is important to mention that KEPLER managed to process the
ontologies of the Anatomy dataset thanks to the Ontopart module [
        <xref ref-type="bibr" rid="ref4 ref7">4, 7</xref>
        ].
      </p>
      <p>The Conference track consists of 15 ontologies from the conference organization
domain, and each ontology must be matched against every other ontology. The dataset
describes the domain of organizing conferences from different perspectives. Precision
values vary between 58% and 76%, and recall values vary between 48% and 68%. The
metrics are obtained according to several evaluation scenarios.</p>
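      <p>Precision and recall are often combined into a single F-measure when systems are compared. The sketch below applies the standard formula to the Anatomy values quoted above (precision of about 0.95, recall of 0.74); since the precision is only given approximately, the result is indicative and may differ slightly from the officially reported figure. The function name is an assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Anatomy track values quoted in the text (precision is approximate).
print(round(f_measure(0.95, 0.74), 3))
```
]]></preformat>
      <p>With these inputs the F1 value is roughly 0.83, which shows how the lower recall pulls the combined score below the precision.</p>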
    </sec>
    <sec id="sec-5">
      <title>MultiFarm</title>
      <p>
        This dataset is composed of a subset of the Conference track, translated into nine
different languages (i.e., Chinese, Czech, Dutch, French, German, Portuguese, Russian,
Spanish and Arabic). With its special focus on multilingualism, it makes it possible to evaluate
and compare the performance of alignment approaches through these test cases. Based
on several previous contributions [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref8 ref9">8–13</xref>
        ], the main goal of the MultiFarm track
is to evaluate the ability of alignment systems to deal with multilingual ontologies.
It serves the purpose of evaluating the strengths and weaknesses of a given system across
languages.
      </p>
      <p>KEPLER uses a specific technique to determine the equivalence between ontology
entities described in different natural languages. We chose English as the pivot
language. The use of a pivot language ensures greater consistency of the obtained
translations, since it starts from the same text. In the different ontologies case, the method is
ranked fourth with a recall value of 0.31, as depicted by figure 3.</p>
      <p>In the same ontologies case, by contrast, the method ranks first with a
recall value of 0.52, as shown in figure 4.</p>
      <p>Regarding scalability, the Large BioMed track consists of finding alignments between the
Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute
Thesaurus (NCI). These ontologies are semantically rich and contain tens of thousands
of classes. The track consists of three matching problems: (1) the
FMA-NCI matching problem, (2) the FMA-SNOMED matching problem and (3) the SNOMED-NCI
matching problem. KEPLER handles large ontologies in two phases: the first phase
partitions the ontologies into a set of blocks, and the second phase selects
the two suitable blocks with the highest similarity value to be aligned. KEPLER processed
Task 1 (FMA-NCI small fragments) with a precision of 0.96 and a recall of 0.83,
according to figure 5.</p>
      <p>
        As depicted by figure 6, KEPLER also processed task 3 of the LargeBio dataset
(FMA-SNOMED small fragments) with a precision of 0.82 and a recall of 0.55. In the
Phenotype track, our method succeeds in processing only the DOID-ORDO sub-case,
identifying 1824 matches for 1237 expected ones.
      </p>
      <p>
        In this paper, we briefly presented the alignment system KEPLER and commented on
the results obtained in the OAEI 2017 tracks, following the SEALS
platform evaluation modalities. Several observations regarding these results were
highlighted, in particular the impact of the elimination of any ontological resource on the
similarity values. KEPLER is ongoing work that borrows its ideas from two
previous systems, CLONA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and SERVOMAP [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It showed promising results for its first
participation. As future work, we plan to consolidate our system to better support
instance-based ontology alignment in a wider range of contexts. We have dealt with
this issue before [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], but the update of the test base raises other challenges, in terms of
the ontological languages used and the evolving semantic description formalisms.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>An effective method of large scale ontology matching</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>5</volume>
          (
          <issue>44</issue>
          ) (
          <year>2014</year>
          ).
          <pub-id pub-id-type="doi">10.1186/2041-1480-5-44</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Designing the web for an open society</article-title>
          .
          <source>In: Proceedings of the 20th International Conference on World Wide Web (WWW2011)</source>
          , Hyderabad, India (
          <year>2011</year>
          )
          <fpage>3</fpage>
          -
          <lpage>4</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varde</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nayak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senellart</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The hidden web, xml and semantic web: A scientific data management perspective</article-title>
          .
          <source>Computing Research Repository</source>
          (
          <year>2011</year>
          )
          <fpage>534</fpage>
          -
          <lpage>537</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Ontopart: at the cross-roads of ontology partitioning and scalable ontology alignment systems</article-title>
          .
          <source>International Journal of Metadata, Semantics and Ontologies</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ) (
          <year>2013</year>
          )
          <fpage>215</fpage>
          -
          <lpage>225</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Efficient building of local repository of distributed ontologies</article-title>
          .
          <source>In: Proceedings of the Seventh International Conference on Signal-Image Technology and Internet-Based Systems (SITIS 2011)</source>
          , November 28 - December 1, Dijon, France (
          <year>2011</year>
          )
          <fpage>159</fpage>
          -
          <lpage>166</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dramé</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delva</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dartigues</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mouillet</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salamon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mougin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Reuse of termino-ontological resources and text corpora for building a multilingual domain ontology: An application to Alzheimer's disease</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>48</volume>
          (
          <year>2014</year>
          )
          <fpage>171</fpage>
          -
          <lpage>182</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Large ontologies partitioning for alignment techniques scaling</article-title>
          .
          <source>In: Proceedings of the 9th International Conference on Web Information Systems and Technologies (WEBIST)</source>
          , 8-10 May, Aachen, Germany (
          <year>2013</year>
          )
          <fpage>165</fpage>
          -
          <lpage>168</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Damo - direct alignment for multilingual ontologies</article-title>
          .
          <source>In: Proceedings of the 3rd International Conference on Knowledge Engineering and Ontology Development (KEOD)</source>
          , 26-29 October, Paris, France (
          <year>2011</year>
          )
          <fpage>110</fpage>
          -
          <lpage>117</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>When external linguistic resource supports cross-lingual ontology alignment</article-title>
          .
          <source>In: Proceedings of the 5th International Conference on Web and Information Technologies (ICWIT 2013)</source>
          , 9-12 May, Hammamet, Tunisia (
          <year>2013</year>
          )
          <fpage>327</fpage>
          -
          <lpage>336</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Using linguistic resource for cross-lingual ontology alignment</article-title>
          .
          <source>International Journal of Recent Contributions from Engineering</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
          <fpage>21</fpage>
          -
          <lpage>27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Bridging the multilingualism gap in ontology alignment</article-title>
          .
          <source>International Journal of Metadata, Semantics and Ontologies</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ) (
          <year>2014</year>
          )
          <fpage>252</fpage>
          -
          <lpage>262</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>El Abdi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souid</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>CLONA results for OAEI 2015</article-title>
          .
          <source>In: Proceedings of the 12th International Workshop on Ontology Matching (OM-2015), co-located with the 14th International Semantic Web Conference (ISWC-2015). Volume 1545 of CEUR-WS</source>
          , Bethlehem (PA US) (
          <year>2015</year>
          )
          <fpage>124</fpage>
          -
          <lpage>129</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Initiating cross-lingual ontology alignment with information retrieval techniques</article-title>
          .
          <source>In: Actes de la 6ème Edition des Journées Francophones sur les Ontologies (JFO'2016)</source>
          , Bordeaux, France (
          <year>2016</year>
          )
          <fpage>57</fpage>
          -
          <lpage>68</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Damak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souid</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Exona results for OAEI 2015</article-title>
          .
          <source>In: Proceedings of the 12th International Workshop on Ontology Matching (OM-2015), co-located with the 14th International Semantic Web Conference (ISWC-2015). Volume 1545 of CEUR-WS</source>
          , Bethlehem (PA US) (
          <year>2015</year>
          )
          <fpage>145</fpage>
          -
          <lpage>149</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Damak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Alignement d'ontologies à base d'instances indexées</article-title>
          .
          <source>In: Actes de la 6ème Edition des Journées Francophones sur les Ontologies (JFO'2016)</source>
          , Bordeaux, France (
          <year>2016</year>
          )
          <fpage>69</fpage>
          -
          <lpage>74</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>