<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>EVOCROS: Results for OAEI 2019</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juliana Medeiros Destro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier A. Vargas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Cesar dos Reis</string-name>
          <email>jreisg@ic.unicamp.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo da S. Torres</string-name>
          <email>ricardo.torres@ntnu.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computing, University of Campinas</institution>
          ,
          <addr-line>Campinas-SP</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Norwegian University of Science and Technology (NTNU)</institution>
          ,
          <addr-line>Alesund</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>This paper describes the updates in EVOCROS, a cross-lingual ontology alignment system suited to creating mappings between ontologies described in different natural languages. Our tool combines syntactic and semantic similarity measures with information retrieval techniques. The semantic similarity is computed via NASARI vectors used together with BabelNet, which is a domain-neutral semantic network. In particular, we investigate the use of rank aggregation techniques in the cross-lingual ontology alignment task. The tool employs automatic translation to a pivot language to compute the similarity. EVOCROS was tested and obtained high-quality alignments on the MultiFarm dataset. We discuss the tested configurations and the results achieved in OAEI 2019. This is our second participation in OAEI.</p>
      </abstract>
      <kwd-group>
        <kwd>cross-lingual matching knowledge</kwd>
        <kwd>ranking aggregation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>EVOCROS is a cross-lingual ontology alignment tool. The newest version of the
tool leverages supervised rank aggregation techniques, which exploit
labeled information (i.e., training data) and ground-truth relevance to boost
the effectiveness of a new ranker. Our goal is to leverage rank aggregation in
cross-lingual mapping by generating ranked lists based on distinct similarity
measurements between the concepts of the source and target ontologies.</p>
    </sec>
    <sec id="sec-3">
      <title>Specific techniques used</title>
      <p>The tool is developed in Python 3 and uses learning-to-rank techniques
implemented in the well-known library RankLib. We model the mapping problem as
an information retrieval query. Figure 1 depicts the workflow of the proposed
technique. The inputs are source and target ontologies written in the Web Ontology
Language (OWL). The first step is the pre-processing of the source and target
input ontologies, which converts them into owlready2 objects. Each concept of
the source ontology is then compared to all concepts of the target ontology.</p>
      <p>RankLib: https://sourceforge.net/p/lemur/wiki/RankLib/ (As of November 16,
2019).</p>
      <p>owlready2: Python 3 library to manipulate ontologies as objects.</p>
      <p>Each entity of the source ontology is compared with all entities of the same
type found in the target ontology (i.e., classes are matched to classes and
properties are matched to properties). For each entity e<sub>i</sub> in the source
ontology O<sub>X</sub>, we calculate the similarity value with each entity e<sub>j</sub> in the target
ontology O<sub>Y</sub> (Figure 2), thus generating one ranked list per similarity measure used, i.e., the set
{rank<sub>1</sub>, rank<sub>2</sub>, rank<sub>3</sub>, rank<sub>4</sub>} (cf. Figure 3).</p>
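      <p>The per-measure ranking step can be sketched as follows. This is a minimal illustration, not the EVOCROS implementation: the Entity class and all function names are hypothetical, labels are assumed to be already expressed in a common language, and a normalised Levenshtein similarity stands in for any one of the similarity measures.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical container for an ontology entity."""
    label: str
    kind: str  # "class" or "property"


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical labels."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_candidates(source: Entity,
                    targets: List[Entity],
                    similarity: Callable[[str, str], float]
                    ) -> List[Tuple[Entity, float]]:
    """Rank the target entities of the same kind as the source entity."""
    scored = [(t, similarity(source.label, t.label))
              for t in targets if t.kind == source.kind]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```
      </preformat>
      <p>Calling rank_candidates once per similarity measure yields one ranked list per measure for a given source entity.</p>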
      <p>For similarity measures that rely on monolingual comparison (i.e., syntactic
and WordNet), the labels of entities e<sub>i</sub> ∈ O<sub>X</sub> and
e<sub>j</sub> ∈ O<sub>Y</sub> are automatically translated to a pivot language using the Google Translate API at
runtime. These similarity comparisons generate k ranks, each one based on a
different similarity measure. Because each measure only contributes a rank,
similarity measures can be added or replaced
without disrupting the technique.</p>
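      <p>A small sketch of how pivot translation and k interchangeable measures might fit together is given below. Everything in it is an assumption made for illustration: the lookup table TOY_TRANSLATIONS replaces the call to the translation service, and the two measures (a difflib sequence ratio and a token-level Jaccard score) are stand-ins rather than the measures used by the tool.</p>
      <preformat>
```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# Hypothetical stand-in for machine translation to the pivot language (English).
TOY_TRANSLATIONS = {
    ("pt", "autor"): "author",
    ("de", "autor"): "author",
    ("de", "artikel"): "paper",
    ("de", "gutachter"): "reviewer",
}


def to_pivot(label: str, language: str) -> str:
    """Translate a label to the pivot language; fall back to the label itself."""
    key = (language, label.lower())
    return TOY_TRANSLATIONS.get(key, label.lower())


def sequence_ratio(a: str, b: str) -> float:
    """Character-level similarity from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Share of word tokens that the two labels have in common."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a.union(tokens_b)
    return len(tokens_a.intersection(tokens_b)) / len(union) if union else 1.0


# Adding or removing an entry changes k without touching the ranking code.
MEASURES: Dict[str, Callable[[str, str], float]] = {
    "sequence_ratio": sequence_ratio,
    "token_jaccard": token_jaccard,
}


def build_ranks(source_label: str, source_lang: str,
                target_labels: List[str], target_lang: str,
                measures: Dict[str, Callable[[str, str], float]] = MEASURES
                ) -> Dict[str, List[str]]:
    """Return one ranked list of target labels per similarity measure."""
    pivot_source = to_pivot(source_label, source_lang)
    ranks = {}
    for name, measure in measures.items():
        scored = [(label, measure(pivot_source, to_pivot(label, target_lang)))
                  for label in target_labels]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranks[name] = [label for label, _ in scored]
    return ranks
```
      </preformat>
      <p>Because the measures live in a plain dictionary, a new measure only adds one more rank to the aggregation input.</p>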
      <p>
        The ranks are then aggregated using LambdaMART [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] because this
technique achieved the best score for the majority of languages during the execution
phase of OAEI 2019. Figure 4 shows how the multiple ranks are
aggregated into a final rank. The top-1 result of the aggregated rank, c<sub>2</sub> ∈ C<sub>O<sub>Y</sub></sub>, is
mapped to the source ontology entity c<sub>1</sub> ∈ C<sub>O<sub>X</sub></sub>, thus generating the candidate
mapping m(c<sub>1</sub>, c<sub>2</sub>) (cf. Figure 5). The mapping output follows the standard used
by the Alignment API [?].
      </p>
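      <p>The sketch below illustrates two pieces around the aggregation step, under stated assumptions. RankLib reads training data in the LETOR/SVMlight text format (relevance label, query id, numbered features, optional comment); here each feature is assumed to be the score of one similarity measure. The helper names are hypothetical, and the scores passed to top1_mapping are assumed to come from an already trained ranker; training itself is not shown.</p>
      <preformat>
```python
from typing import Dict, List, Tuple


def to_letor_line(relevance: int, query_id: int,
                  features: List[float], info: str) -> str:
    """Format one (source entity, candidate) pair as a LETOR-style line."""
    feature_text = " ".join(f"{index}:{value:.4f}"
                            for index, value in enumerate(features, start=1))
    return f"{relevance} qid:{query_id} {feature_text} # {info}"


def top1_mapping(source: str,
                 aggregated_scores: Dict[str, float]) -> Tuple[str, str, float]:
    """Map the source entity to the best-scored candidate of the final rank.

    aggregated_scores must not be empty.
    """
    best_target = max(aggregated_scores, key=aggregated_scores.get)
    return source, best_target, aggregated_scores[best_target]
```
      </preformat>
      <p>For example, to_letor_line(1, 3, [0.9, 0.5], "author") produces the line "1 qid:3 1:0.9000 2:0.5000 # author", where the query id identifies the source entity and the label marks the correct candidate.</p>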
    </sec>
    <sec id="sec-4">
      <title>Link to the set of provided alignments (in align format)</title>
      <p>
        Alignment results are available at https://github.com/jmdestro/evocros-results
(As of November 16, 2019).
      </p>
      <p>
        In this section, we describe the results obtained in the experiments conducted
in OAEI 2019.
We consider the MultiFarm dataset [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], version released in 2015. Our experiments
built cross-language ontology mappings by using English as a pivot language
for the Levenshtein [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Jaro [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and WordNet similarity measures. The semantic
similarity relying on BabelNet does not require translation, as it can retrieve
the synsets used in NASARI vectors [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] by using the concept's original language.
The application of each similarity measure in our technique generated a rank.
      </p>
      <p>A subset of the queries of all languages was used for training and validation. The queries were split into
10% for the training set, 15% for the validation set, and 75%
for the testing set. These subsets were generated per language and then combined, so
the algorithms were trained, validated, and tested using all languages at once.
The comparable gold standard (i.e., the manually curated MultiFarm mappings)
was adjusted to contain only the queries related to the testing subset. In this
sense, a lower number of entities was considered in the tests, because we removed
the set of queries used in training and validation from the reference mappings
to ensure consistency.</p>
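      <p>The split described above could be reproduced along the following lines. This is a sketch under assumptions: the function names, the fixed random seed, and the representation of queries as strings and of reference mappings as (source, target) pairs are all hypothetical choices made for the example.</p>
      <preformat>
```python
import random
from typing import Dict, List, Set, Tuple


def split_queries(queries: List[str], train: float = 0.10,
                  validation: float = 0.15, seed: int = 0
                  ) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle the queries of one language and cut them 10/15/75."""
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    cut = n_train + n_validation
    return shuffled[:n_train], shuffled[n_train:cut], shuffled[cut:]


def split_all_languages(queries_by_language: Dict[str, List[str]], seed: int = 0
                        ) -> Tuple[List[str], List[str], List[str]]:
    """Split per language first, then combine the subsets across languages."""
    train, validation, test = [], [], []
    for language in sorted(queries_by_language):
        part = split_queries(queries_by_language[language], seed=seed)
        train.extend(part[0])
        validation.extend(part[1])
        test.extend(part[2])
    return train, validation, test


def restrict_reference(reference: Set[Tuple[str, str]],
                       test_queries: List[str]) -> Set[Tuple[str, str]]:
    """Keep only the reference mappings whose source entity is a test query."""
    allowed = set(test_queries)
    return {(source, target) for source, target in reference if source in allowed}
```
      </preformat>
      <p>Restricting the reference mappings to the test queries is what keeps training and validation queries out of the evaluation.</p>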
      <p>Table 1 presents the obtained values for precision, recall, and f-measure for
each language pair tested. The precision, recall, and f-measure scores have the
same value due to the nature of the experiments. Our approach generates n:n
mappings, where n = |O<sub>X</sub>| = |O<sub>Y</sub>|, because the ontologies are translations
of each other into different natural languages; thus, every entity in the source
ontology has a correspondence in the target ontology. In this sense, both
the gold standard and the generated mappings have the same size because each
query (i.e., each entity in the source ontology) generates a mapping between the
query (source entity) and the top-1 result of the final aggregated rank.</p>
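      <p>The equality of the three scores follows from the standard definitions: precision divides the number of correct mappings by the number of generated mappings, recall divides it by the number of reference mappings, and the f-measure is their harmonic mean. When both sets have the same size n, the two denominators coincide. The following sketch, with a hypothetical evaluate function and toy mappings, illustrates this.</p>
      <preformat>
```python
from typing import Set, Tuple

Mapping = Tuple[str, str]  # (source entity, target entity)


def evaluate(generated: Set[Mapping],
             reference: Set[Mapping]) -> Tuple[float, float, float]:
    """Return (precision, recall, f-measure) of generated against reference."""
    correct = len(generated.intersection(reference))
    precision = correct / len(generated) if generated else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data: four source entities, one top-1 mapping each, two of them correct.
reference = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
generated = {("a", "x"), ("b", "y"), ("c", "w"), ("d", "z")}
precision, recall, f_measure = evaluate(generated, reference)
print(precision, recall, f_measure)  # 0.5 0.5 0.5
```
      </preformat>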
      <sec id="sec-4-1">
        <title>General comments</title>
        <p>In this section, we discuss our results and the ways to improve the system.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Comments on the results</title>
      <p>The tool had satisfactory results, with a competitive f-measure, but the execution
time was exceedingly long, even with local caches for BabelNet NASARI
vectors. This is due to the number of comparisons required during execution,
because each concept or attribute in the source ontology is compared against all
concepts and attributes of the target ontology.</p>
    </sec>
    <sec id="sec-6">
      <title>Discussions on the way to improve the proposed system</title>
      <p>
        This was the second evaluation of the system and the results are encouraging. Our
main goals for future work are as follows. (i) Reduce execution time: the tool has a long
execution time even with local caches. Our future work will explore ontology
partitioning during the pre-processing stage of the matching task to reduce
the number of comparisons needed, thus improving the execution time. (ii) Bag
of graphs: ontologies can be represented as graphs, thus allowing for
partitioning [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and comparison of sub-graphs. Bag-of-graphs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a graph matching
approach, similar to bag-of-words. It represents graphs as feature vectors, greatly
simplifying the computation of graph similarity and reducing execution time.
As future work, we propose to investigate a simple vector-based representation
of graphs for cross-lingual ontology matching.
      </p>
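      <p>To make the idea of a vector-based graph representation concrete, the toy sketch below turns a graph into a count vector and compares two graphs with cosine similarity. It is not the Bag-of-Graphs method of [6], which builds a vocabulary of local graph patterns; here the "words" are simply the edge predicates, an assumption chosen because predicates such as subClassOf do not depend on the natural language of the labels. All names are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter
from typing import Iterable, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def graph_to_vector(edges: Iterable[Edge]) -> Counter:
    """Count how often each predicate occurs, ignoring the node labels."""
    return Counter(predicate for _, predicate, _ in edges)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```
      </preformat>
      <p>Comparing two such vectors costs time proportional to the number of distinct predicates, independently of how many entities the sub-graphs contain.</p>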
    </sec>
    <sec id="sec-7">
      <title>Comments on OAEI</title>
      <p>Although we were not participating in it, our tool was executed on the Knowledge
Graph track. There were issues during the evaluation phase, preventing the
system from fully participating in both the MultiFarm and Knowledge Graph tracks.</p>
      <sec id="sec-7-1">
        <title>Conclusion</title>
        <p>The newest version of EVOCROS proposed an approach considering four
similarity measures to build ranks and used a supervised method of rank aggregation.
This is the second participation of the system in OAEI. The evaluation with
the MultiFarm dataset confirmed the quality of mappings generated by our
technique. For future work, we plan to improve our cross-lingual alignment proposal
by considering different combinations of similarity measures and different ways of
computing the syntactic and semantic similarities, taking into account additional
stages in the pre-processing of the ontology.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Acknowledgements</title>
        <p>This work was supported by the São Paulo Research Foundation (FAPESP): grant
#2017/02325-5.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Camacho-Collados</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pilehvar</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>240</volume>
          ,
          <fpage>36</fpage>
          –
          <lpage>64</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hamdi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Safar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynaud</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zargayouna</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Alignment-based partitioning of large-scale ontologies</article-title>
          .
          In:
          <source>Advances in knowledge discovery and management</source>
          , pp.
          <fpage>251</fpage>
          –
          <lpage>269</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jaro</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>84</volume>
          (
          <issue>406</issue>
          ),
          <fpage>414</fpage>
          –
          <lpage>420</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Levenshtein</surname>
            ,
            <given-names>V.I.</given-names>
          </string-name>
          :
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          .
          <source>Soviet Physics Doklady</source>
          <volume>10</volume>
          ,
          <fpage>707</fpage>
          –
          <lpage>710</lpage>
          (
          <year>1966</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García-Castro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Hage</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montiel-Ponsoda</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Azevedo</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Šváb-Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svátek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Multifarm: A benchmark for multilingual ontology matching</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>15</volume>
          ,
          <fpage>62</fpage>
          –
          <lpage>68</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>F.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de O. Werneck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldenstein</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tabbone</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>da S. Torres</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Graph-based bag-of-words for classification</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>74</volume>
          (Supplement C),
          <fpage>266</fpage>
          –
          <lpage>285</lpage>
          (Feb
          <year>2018</year>
          ). https://doi.org/10.1016/j.patcog.2017.09.018, http://www.sciencedirect.com/science/article/pii/S0031320317303680
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svore</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adapting boosting for information retrieval measures</article-title>
          .
          <source>Information Retrieval</source>
          <volume>13</volume>
          (
          <issue>3</issue>
          ),
          <fpage>254</fpage>
          –
          <lpage>270</lpage>
          (Jun
          <year>2010</year>
          ). https://doi.org/10.1007/s10791-009-9112-1
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>