<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LogMap family results for OAEI 2014</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>E. Jiménez-Ruiz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B. Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>W. Xia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Solimando</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>X. Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Cross</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Y. Gong</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Chennai-Thiagarajan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science and Software Engineering, Miami University</institution>
          ,
          <addr-line>Oxford, OH</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Oxford</institution>
          ,
          <addr-line>Oxford</addr-line>
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dipartimento di Informatica, Università di Genova</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the results obtained in the OAEI 2014 campaign by our ontology matching system LogMap and its variants: LogMap-C, LogMap-Bio and LogMapLt. The LogMap project started in January 2011 with the objective of developing a scalable and logic-based ontology matching system. This is our fifth participation in the OAEI and the experience has so far been very positive.</p>
        <p>Presentation of the system. Ontology matching systems typically rely on lexical and structural heuristics, and the integration of the input ontologies and the mappings may lead to many undesired logical consequences. In [13] three principles were proposed to minimise the number of potentially unintended consequences, namely: (i) consistency principle: the mappings should not lead to unsatisfiable classes in the integrated ontology; (ii) locality principle: the mappings should link entities that have similar neighbourhoods; (iii) conservativity principle: the mappings should not introduce alterations in the classification of the input ontologies. Violations of these principles may hinder the usefulness of ontology mappings. The practical effect of these violations is clearly evident when ontology alignments are involved in complex tasks such as query answering [17].</p>
        <p>LogMap [12, 14] is a highly scalable ontology matching system that implements the consistency and locality principles. LogMap also supports (real-time) user interaction during the matching process, which is essential for use cases requiring very accurate mappings. LogMap is one of the few ontology matching systems that (i) can efficiently match semantically rich ontologies containing tens (and even hundreds) of thousands of classes, (ii) incorporates sophisticated reasoning and repair techniques to minimise the number of logical inconsistencies, and (iii) provides support for user intervention during the matching process.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Logic-based module extraction. The practical feasibility of unsatisfiability detection
and repair critically depends on the size of the input ontologies. To reduce the size of
the problem, we exploit ontology modularisation techniques. Ontology modules with
well-understood semantic properties can be efficiently computed and are typically much
smaller than the input ontology (e.g. [6]).</p>
      <p>Propositional Horn reasoning. The relevant modules in the input ontologies together
with (a subset of) the candidate mappings are encoded in LogMap using a Horn
propositional representation. Furthermore, LogMap implements the classic Dowling-Gallier
algorithm for propositional Horn satisfiability [7]. Such encoding, although incomplete,
allows LogMap to detect unsatisfiable classes soundly and efficiently.</p>
      <p>Axiom tracking and greedy repair. LogMap extends Dowling-Gallier’s algorithm to
track all mappings that may be involved in the unsatisfiability of a class. This
extension is key to implementing a highly scalable repair algorithm.</p>
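      <p>The propositional Horn reasoning step described above can be illustrated with a minimal sketch. It is not LogMap's implementation: it is a counter-based forward-chaining procedure in the spirit of the Dowling-Gallier algorithm, and the class names (o1:A, o2:B, ...), the axioms, the mappings and the function names are hypothetical, chosen only for illustration. A class is reported as unsatisfiable when assuming it allows the atom "false" to be derived.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict, deque

FALSE = "false"  # propositional atom standing for the empty class


def horn_closure(clauses, facts):
    """Counter-based forward chaining over definite Horn clauses.

    clauses: list of (body, head) pairs; body is a tuple of atoms, head an atom.
    facts: atoms assumed to be true.
    Returns the set of all atoms derivable from the facts.
    """
    remaining = [len(set(body)) for body, _ in clauses]  # unmet premises per clause
    watching = defaultdict(list)  # atom -> indices of clauses with that atom in the body
    for idx, (body, _) in enumerate(clauses):
        for atom in set(body):
            watching[atom].append(idx)

    derived = set(facts)
    queue = deque(derived)
    while queue:
        atom = queue.popleft()  # each atom is dequeued at most once
        for idx in watching[atom]:
            remaining[idx] -= 1
            if remaining[idx] == 0:  # all premises hold, so the head follows
                head = clauses[idx][1]
                if head not in derived:
                    derived.add(head)
                    queue.append(head)
    return derived


def is_unsatisfiable(clauses, cls):
    """A class is unsatisfiable if assuming it derives FALSE."""
    return FALSE in horn_closure(clauses, {cls})


# Hypothetical input: "A is subsumed by B" becomes the clause A -> B,
# and "B and C are disjoint" becomes the clause B, C -> false.
ONTOLOGY_AXIOMS = [
    (("o1:A",), "o1:B"),
    (("o2:A",), "o2:C"),
    (("o2:B", "o2:C"), FALSE),
]
# Hypothetical candidate mappings; each equivalence yields two clauses.
MAPPINGS = [
    (("o1:A",), "o2:A"), (("o2:A",), "o1:A"),
    (("o1:B",), "o2:B"), (("o2:B",), "o1:B"),
]

print(is_unsatisfiable(ONTOLOGY_AXIOMS + MAPPINGS, "o1:A"))
print(is_unsatisfiable(ONTOLOGY_AXIOMS + MAPPINGS[:2], "o1:A"))
```
]]></preformat>
      <p>In this toy input, o1:A becomes unsatisfiable only when both equivalence mappings are present, so discarding either mapping would restore satisfiability. The sketch covers detection only; it does not track which mappings are involved in a derivation, which is the extension the repair step relies on.</p>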
      <p>
        Semantic indexation. The Horn propositional representation of the ontology modules
and the mappings are efficiently indexed using an interval labelling schema [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] — an
optimised data structure for storing directed acyclic graphs (DAGs) that significantly
reduces the cost of answering taxonomic queries [5, 19]. In particular, this semantic
index allows us to answer many entailment queries over the input ontologies and the
mappings computed thus far as an index lookup operation, and hence without the need
for reasoning. The semantic index complements the use of the propositional encoding
to detect and repair unsatisfiable classes.
      </p>
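      <p>The idea behind interval labelling can be sketched as follows. This is a simplified illustration restricted to tree-shaped taxonomies (one parent per class); the schema in [1] also handles general DAGs, which this sketch does not attempt. The class names and the function names label_intervals and is_subclass_of are hypothetical. Each class receives an interval covering the preorder numbers of its descendants, so a subsumption query reduces to an interval containment test.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def label_intervals(children, root):
    """Label every node with (start, end) using a depth-first traversal.

    start is the preorder number of the node; end is the largest preorder
    number found in its subtree. Assumes a tree (one parent per node).
    """
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in children.get(node, ()):
            visit(child)
        intervals[node] = (start, counter)

    visit(root)
    return intervals


def is_subclass_of(intervals, sub, sup):
    """True if sub equals sup or is a descendant of sup: a pure index lookup."""
    sub_start, _ = intervals[sub]
    sup_start, sup_end = intervals[sup]
    return sup_start <= sub_start <= sup_end


# Hypothetical toy taxonomy.
TAXONOMY = {
    "Thing": ["Animal", "Vehicle"],
    "Animal": ["Dog", "Cat"],
    "Vehicle": ["Car"],
}
INDEX = label_intervals(TAXONOMY, "Thing")

print(is_subclass_of(INDEX, "Dog", "Animal"))
print(is_subclass_of(INDEX, "Dog", "Vehicle"))
```
]]></preformat>
      <p>Once the index is built, each query costs two dictionary lookups and two comparisons, which is the kind of saving over reasoning that the paragraph above refers to.</p>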
    </sec>
    <sec id="sec-2">
      <title>Adaptations made for the 2014 evaluation</title>
      <p>In the OAEI 2014 campaign we participated with three additional variants:</p>
      <p>LogMapLt is a “lightweight” variant of LogMap, which essentially only applies
(efficient) string matching techniques.</p>
      <p>LogMap-C is a variant of LogMap which, in addition to the consistency and locality
principles, also implements the conservativity principle (see details in [21, 20]).
The repair algorithm is more aggressive than in LogMap, thus we expect highly
precise mappings but with a significant decrease in recall.</p>
      <p>LogMap-Bio includes an extension to use BioPortal [10, 11] as a (dynamic) provider
of mediating ontologies instead of relying on a few preselected ontologies [4]. In
the OAEI 2014, LogMap-Bio uses the top-5 mediating ontologies given by the
algorithm presented in [4]. Note that LogMap-Bio only participates in the biomedical
tracks. In the other tracks the results are expected to be the same as those of LogMap.</p>
      <p>LogMap’s algorithm described in [12, 14] has also been adapted with the following
new functionalities:</p>
      <p>(i) Multilingual support. We have implemented a multilingual module based on Google
Translate (footnote 4) to participate in the Multifarm track. Additionally, in order to split
Chinese words, we rely on the ICTCLAS library (footnote 5) developed by the Institute of
Computing Technology of the Chinese Academy of Sciences.</p>
      <p>(ii) Extended repair algorithm. We have extended the Horn propositional projection
of the input ontologies to involve data and object properties in the repair
process [24]. LogMap’s repair module is now more complete and it is also able to
repair (object and data) property mappings (footnote 6).</p>
      <p>(iii) Extended interactive support. The interactive algorithm described in [14] has been
slightly extended to include object and data properties in the process. Note that this
extension was already included in the OAEI 2013 campaign.</p>
      <p>Footnote 4: Currently we use the (unofficial) API available at
https://code.google.com/p/google-api-translate-java/.</p>
    </sec>
    <sec id="sec-3">
      <title>Link to the system and parameters file</title>
      <p>LogMap is open-source and released under the GNU Lesser General Public License 3.0 (footnote 7).
The latest components and source code are available from LogMap’s Google Code page:
http://code.google.com/p/logmap-matcher/.</p>
      <p>LogMap distributions can be easily customized through a configuration file
containing the matching parameters.</p>
      <p>LogMap, including support for interactive ontology matching, can also be used
directly through an AJAX-based Web interface: http://csu6325.cs.ox.ac.uk/.
This interface has been very well received by the community, with more than 1,500
requests processed so far coming from a broad range of users.</p>
    </sec>
    <sec id="sec-4">
      <title>Modular support for mapping repair</title>
      <p>Only very few systems participating in the OAEI competition implement repair
techniques. As a result, existing matching systems (even those that typically achieve very
high precision scores) compute mappings that lead in many cases to a large number of
unsatisfiable classes.</p>
      <p>We believe that these systems could significantly improve their output if they were
to implement repair techniques similar to those available in LogMap. Therefore, with
the goal of providing a useful service to the community, we have made LogMap’s
ontology repair module (LogMap-Repair) available as a self-contained software component
that can be seamlessly integrated in most existing ontology matching systems [15, 9].</p>
      <sec id="sec-4-1">
        <title>Results</title>
        <p>In this section, we present a summary of the results obtained by the LogMap family in
the OAEI 2014 campaign. Please refer to
http://oaei.ontologymatching.org/2014/results/index.html for complete results.</p>
        <p>Footnote 5: https://code.google.com/p/ictclas4j/</p>
        <p>Footnote 6: The OAEI 2014 coherence results do not exhibit these improvements since only the
conference track ontologies involve mappings among properties and LogMap 2013 was already
coherent. The extension does have, however, an impact when repairing other mapping sets, as shown in [24].</p>
        <p>Footnote 7: http://www.gnu.org/licenses/</p>
        <sec id="sec-4-1-1">
          <title>Benchmark track</title>
          <p>Ontologies in this track have been synthetically generated. The goal of this track is to
evaluate the matching systems in scenarios where the input ontologies lack important
information (e.g., classes contain no meaningful URIs or labels) [8].</p>
          <p>Table 1 summarises the average results obtained by LogMap and its variants. Note
that the computation of candidate mappings in LogMap (and its variants) heavily relies
on the similarities between the vocabularies of the input ontologies; hence, there is a
direct negative impact in the cases where the labels are replaced by random strings.
Surprisingly, LogMapLt obtained the best results in the dog test case.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Anatomy track</title>
      <p>This track involves the matching of the Adult Mouse Anatomy ontology (2,744 classes)
and a fragment of the NCI ontology describing human anatomy (3,304 classes). The
reference alignment has been manually curated [25], and it contains a significant number
of non-trivial mappings.</p>
      <p>Table 2 summarises the results obtained by the LogMap family. LogMap-Bio ranked
2nd in the track. The use of BioPortal as mediating ontology provider led to a significant
improvement in recall. LogMap-Bio’s runtime is close to 10 minutes since the discovery of
the mediating ontologies is performed on-the-fly [4]. Regarding mapping coherence,
only two tools (apart from LogMap, LogMap-C and LogMap-Bio) generated coherent
alignments. The evaluation was run on a server with 3.46 GHz (6 cores) and 8GB RAM.</p>
    </sec>
    <sec id="sec-6">
      <title>Conference track</title>
      <p>The Conference track uses a collection of 16 ontologies from the domain of academic
conferences [23]. These ontologies have been created manually by different people and
are of very small size (between 14 and 140 entities). The track uses two reference
alignments RA1 and RA2. RA1 contains manually curated mappings between 21 ontology
pairs, while RA2 also contains composed mappings based on the alignments in RA1.</p>
      <p>Table 3 summarises the average results obtained by the LogMap family. The last
column represents the total runtime for generating all 21 alignments. Tests were run on
a laptop with Intel Core i5 2.67GHz and 8GB RAM. LogMap ranked 2nd and
LogMap-C ranked 3rd. They both produced coherent alignments.</p>
    </sec>
    <sec id="sec-7">
      <title>Multifarm track</title>
      <p>This track is based on the translation of the OntoFarm collection of ontologies into 9
different languages [18].</p>
      <p>In the OAEI 2014, only LogMap, AML and XMap implemented specific
multilingual techniques. Table 4 summarises the results. LogMap achieved very competitive
results in terms of precision. Regarding recall, however, there is still room for
improvement. In the near future we plan to extend the multilingual module with more
sophisticated translation techniques.</p>
    </sec>
    <sec id="sec-8">
      <title>Library track</title>
      <p>The library track involves the matching of the STW thesaurus (6,575 classes) and the
TheSoz thesaurus (8,376 classes). Both of these thesauri provide vocabulary for
economic and social sciences. Table 5 summarises the results obtained by the LogMap
family. The track was run on a computer with a 2.4GHz processor (2 cores) and 7GB RAM.
LogMap ranked 2nd in this track. The results for LogMap* are obtained with a version
of the input OWL ontologies using SKOS labels (i.e. skos:altLabel and skos:prefLabel).</p>
    </sec>
    <sec id="sec-9">
      <title>Interactive matching track</title>
      <p>The interactive track is based on the conference track and it uses the RA1 reference
alignment as Oracle. Table 6 summarises the results obtained by LogMap with the
interactive mode activated. LogMap with interactivity improved both the average
Precision and Recall with respect to LogMap with the interactive mode deactivated (see Section 2.3).
LogMap performed, on average, 3.91 calls to the Oracle across the 21 matching tasks.
LogMap ranked 2nd in the interactive matching track, but it was the system performing
the fewest calls to the Oracle.</p>
    </sec>
    <sec id="sec-10">
      <title>Large BioMed track</title>
      <p>This track consists of finding alignments between the Foundational Model of Anatomy
(FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). These
ontologies are semantically rich and contain tens of thousands of classes. UMLS
Metathesaurus [3] has been selected as the basis for the track reference alignments.</p>
      <p>Table 7 summarises the results obtained by the LogMap family. The table shows
the total time in seconds to complete all tasks in the track and averages for Precision,
Recall, F-measure and Incoherence degree. The track was run on an Ubuntu laptop with
an Intel Core i7-4600U CPU @ 2.10GHz x 4, allocating 15GB of RAM.</p>
      <p>Only AML and LogMap variants (excluding LogMapLt) generated almost
coherent alignments. LogMap ranked 2nd in the track, while LogMap-C and LogMap-Bio
obtained the best average Precision and the second best average Recall, respectively.
LogMapLt was the fastest to complete all tasks.</p>
      <p>The Ontology Alignment for Query Answering (OA4QA) track [22] does not follow the
classical ontology alignment evaluation with respect to a set of reference alignments.
Precision and recall are calculated with respect to the ability of the generated alignments
to answer a set of queries in an ontology-based data access scenario where several
ontologies exist. Given a query and an ontology pair, a model (or reference) answer set is
computed using the corresponding reference alignment for the ontology pair. Precision
and recall are calculated with respect to these model answer sets.</p>
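      <p>The answer-set based evaluation described above can be sketched as follows. This is only an illustration of the standard set-based definitions of precision, recall and F-measure; the function name, the toy answer sets and the convention of returning 0.0 for empty sets are assumptions and are not taken from the OA4QA track specification.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f(answers, reference):
    """Compare the answers obtained with a computed alignment against the
    model answer set obtained with the reference alignment."""
    answers, reference = set(answers), set(reference)
    true_positives = len(answers & reference)
    # Assumed convention: an empty set yields 0.0 instead of a division error.
    precision = true_positives / len(answers) if answers else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical answer sets for a single query over one ontology pair.
model_answers = {"paper1", "paper2", "paper3", "paper4"}
system_answers = {"paper2", "paper3", "paper9"}

print(precision_recall_f(system_answers, model_answers))
```
]]></preformat>
      <p>Under the assumed convention, a query that cannot be answered (an empty answer set) contributes zero precision and recall, which lowers the averages of a system whose alignment is incoherent.</p>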
      <p>In the OAEI 2014 the ontologies and reference alignment (RA1) are based on the
conference track. RAR1 is a repaired version of RA1, different from RA2 in the
conference track. Table 8 summarises the (average) results for the LogMap family. LogMap
and LogMap-C ranked 1st and 2nd in the track, although the number of queries is still
not large enough to provide representative values for Precision and Recall. However,
the most interesting result is the number of queries a system is able to answer when
the computed alignment is incoherent. For example, LogMapLt, since it does not
implement mapping repair techniques, is only able to answer 11 of the queries, which
damages the obtained precision and recall.</p>
    </sec>
    <sec id="sec-11">
      <title>Instance matching track</title>
      <p>The results of LogMap (and LogMap-C) were not as good as in previous years. Note that
LogMap does not implement specialised instance matching techniques. Nevertheless,
LogMap outperformed two of the participating tools specialised in instance matching.
Table 9 summarises the results obtained by LogMap and LogMap-C.</p>
      <sec id="sec-11-1">
        <title>General comments and conclusions</title>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>Comments on the results</title>
      <p>LogMap, apart from the Benchmark and Instance Matching tracks, for which it does not
implement specific techniques, has been one of the top systems in the OAEI 2014.
Furthermore, it has also been one of the few systems implementing repair techniques and
providing (almost) coherent mappings in all tracks.</p>
      <p>LogMap’s main weakness lies in the fact that the computation of candidate
mappings is based on the similarities between the vocabularies of the input ontologies;
hence, there is a direct negative impact in the cases where the ontologies are lexically
disparate or do not provide enough lexical information (e.g. Benchmark and Instance
Matching).</p>
    </sec>
    <sec id="sec-13">
      <title>Discussions on the way to improve the proposed system</title>
      <p>LogMap is now a stable and mature system that has been made available to the
community. There are, however, many exciting possibilities for future work. For example, we
aim at improving the multilingual features and the current use of external resources like
BioPortal. Furthermore, we are applying LogMap in practice in the domain of the oil and
gas industry within the FP7 Optique project (footnote 8) [16], which presents a very challenging scenario.</p>
    </sec>
    <sec id="sec-14">
      <title>Comments on the OAEI test cases</title>
      <p>The number and quality of the OAEI tracks is growing year by year. However, there is
always room for improvement:</p>
      <p>Comments on the OA4QA track. The new OA4QA track has successfully shown the
negative impact of an incoherent alignment in query answering tasks. However, the number
of queries is still too small to provide representative values for the F-measure. More queries
and more challenging ontologies will make the track more attractive.</p>
      <p>Comments on the OAEI interactive matching track. The interactive track has been a
very important step forward in the OAEI; however, larger and more challenging tasks
should be included, for example matching tasks (e.g. anatomy and largebio) where
the number of questions to the expert user or Oracle may be critical. Furthermore, it is
quite unlikely that the expert user will be perfect; thus, the interactive matching track
should also consider the evaluation of several Oracles with different error rates, such as
the evaluation performed in [14].</p>
      <p>Comments on the OAEI largebio track. One of the objectives of the largebio track is the
creation of a “silver standard” reference alignment by harmonising the output of the
different participating systems. In the next OAEI campaign it would be very interesting to
actively use this “silver standard” in the construction of the track’s reference alignment.
This will help to improve the completeness of the reference alignment.</p>
      <p>Footnote 8: http://www.optique-project.eu/</p>
      <sec id="sec-14-1">
        <title>References</title>
        <p>2. Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval. ACM Press / Addison-Wesley (1999)</p>
        <p>3. Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Research 32, 267–270 (2004)</p>
        <p>4. Chen, X., Xia, W., Jiménez-Ruiz, E., Cross, V.: Extending an ontology alignment system with bioportal: a preliminary analysis. In: Poster at Int’l Sem. Web Conf. (ISWC) (2014)</p>
        <p>5. Christophides, V., Plexousakis, D., Scholl, M., Tourtounis, S.: On labeling schemes for the Semantic Web. In: Int’l World Wide Web (WWW) Conf. pp. 544–555 (2003)</p>
        <p>6. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory and practice. J. Artif. Intell. Res. 31, 273–318 (2008)</p>
        <p>7. Dowling, W.F., Gallier, J.H.: Linear-time algorithms for testing the satisfiability of propositional Horn formulae. J. Log. Prog. 1(3), 267–284 (1984)</p>
        <p>8. Euzenat, J., Rosoiu, M.E., dos Santos, C.T.: Ontology matching benchmarks: Generation, stability, and discriminability. J. Web Sem. 21, 30–48 (2013)</p>
        <p>9. Faria, D., Jiménez-Ruiz, E., Pesquita, C., Santos, E., Couto, F.M.: Towards annotating potential incoherences in bioportal mappings. In: 13th Int’l Sem. Web Conf. (ISWC) (2014)</p>
        <p>10. Fridman Noy, N., Shah, N.H., Whetzel, P.L., Dai, B., et al.: BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research 37, 170–173 (2009)</p>
        <p>11. Ghazvinian, A., Noy, N.F., Jonquet, C., Shah, N.H., Musen, M.A.: What four million mappings can tell you about two hundred ontologies. In: Int’l Sem. Web Conf. (ISWC) (2009)</p>
        <p>12. Jiménez-Ruiz, E., Cuenca Grau, B.: LogMap: Logic-based and Scalable Ontology Matching. In: Int’l Sem. Web Conf. (ISWC). pp. 273–288 (2011)</p>
        <p>13. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I., Berlanga, R.: Logic-based assessment of the compatibility of UMLS ontology sources. J. Biomed. Sem. 2 (2011)</p>
        <p>14. Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y., Horrocks, I.: Large-scale interactive ontology matching: Algorithms and implementation. In: Europ. Conf. on Artif. Intell. (ECAI) (2012)</p>
        <p>15. Jiménez-Ruiz, E., Meilicke, C., Cuenca Grau, B., Horrocks, I.: Evaluating mapping repair systems with large biomedical ontologies. In: 26th Description Logics Workshop (2013)</p>
        <p>16. Kharlamov, E., Jiménez-Ruiz, E., Zheleznyakov, D., et al.: Optique: Towards OBDA Systems for Industry. In: Eur. Sem. Web Conf. (ESWC) Satellite Events. pp. 125–140 (2013)</p>
        <p>17. Meilicke, C.: Alignment Incoherence in Ontology Matching. Ph.D. thesis, University of Mannheim (2011)</p>
        <p>18. Meilicke, C., Castro, R.G., Freitas, F., van Hage, W.R., Montiel-Ponsoda, E., de Azevedo, R.R., Stuckenschmidt, H., Šváb-Zamazal, O., Svátek, V., Tamilin, A., Trojahn, C., Wang, S.: MultiFarm: a benchmark for multilingual ontology matching. J. Web Sem. (2012)</p>
        <p>19. Nebot, V., Berlanga, R.: Efficient retrieval of ontology fragments using an interval labeling scheme. Inf. Sci. 179(24), 4151–4173 (2009)</p>
        <p>20. Solimando, A., Jiménez-Ruiz, E., Guerrini, G.: Detecting and correcting conservativity principle violations in ontology-to-ontology mappings. In: Int’l Sem. Web Conf. (ISWC) (2014)</p>
        <p>21. Solimando, A., Jiménez-Ruiz, E., Guerrini, G.: A multi-strategy approach for detecting and correcting conservativity principle violations in ontology alignments. In: Proc. of the 11th International Workshop on OWL: Experiences and Directions (OWLED). pp. 13–24 (2014)</p>
        <p>22. Solimando, A., Jiménez-Ruiz, E., Pinkel, C.: Evaluating Ontology Alignment Systems in Query Answering Tasks. In: Poster at Int’l Sem. Web Conf. (ISWC) (2014)</p>
        <p>23. Šváb, O., Svátek, V., Berka, P., Rak, D., Tomášek, P.: OntoFarm: towards an experimental collection of parallel ontologies. In: Int’l Sem. Web Conf. (ISWC). Poster Session (2005)</p>
        <p>24. Zhang, S., Jiménez-Ruiz, E., Cuenca Grau, B.: Inconsistency Repair in Ontology Matching. MSc thesis, University of Oxford (2014), http://www.cs.ox.ac.uk/isg/projects/LogMap/papers/Master_thesis_Shuo_Zhang.pdf</p>
        <p>25. Zhang, S., Mork, P., Bodenreider, O.: Lessons learned from aligning two representations of anatomy. In: Conf. on Principles of Knowledge Representation and Reasoning (KR) (2004)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgida</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jagadish</surname>
            ,
            <given-names>H.V.</given-names>
          </string-name>
          :
          <article-title>Efficient management of transitive relationships in large data and knowledge bases</article-title>
          .
          <source>In: ACM SIGMOD Conf. on Management of Data</source>
          . pp.
          <fpage>253</fpage>
          -
          <lpage>262</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>