<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LogMap and LogMapLt Results for OAEI 2012</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ernesto Jiménez-Ruiz</string-name>
          <email>ernesto@cs.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Horrocks</string-name>
          <email>ian.horrocks@cs.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Oxford</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the results obtained by our ontology matching system LogMap and its “lightweight” variant, LogMapLt, in the OAEI 2012 campaign. The LogMap project started in January 2011 with the objective of developing a scalable and logic-based ontology matching system. This is our third participation in the OAEI, and the experience has so far been very positive.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Presentation of the system</title>
      <p>LogMap [10, 14] is a highly scalable ontology matching system with built-in reasoning and inconsistency repair capabilities. LogMap also supports (real-time) user interaction during the matching process, which is essential for use cases requiring very accurate mappings. To the best of our knowledge, LogMap is the only matching system that (1) can efficiently match semantically rich ontologies containing tens (and even hundreds) of thousands of classes, (2) incorporates sophisticated reasoning and repair techniques to minimise the number of logical inconsistencies, and (3) provides support for user intervention during the matching process.</p>
      <p>LogMap is also available as a “lightweight” variant called LogMapLt, which essentially skips all reasoning, repair and semantic indexation steps. Due to its simplicity, its scalability and the reasonable quality of its output, LogMapLt has been adopted as a baseline in some OAEI tracks [19].</p>
    </sec>
    <sec id="sec-2">
      <title>Technical challenges</title>
      <p>Building a scalable, logic-based and interactive ontology matching system presents important
technical challenges. Moreover, these requirements are in some respects conflicting,
and design choices require compromises between them. We next provide an overview
of the technical challenges we have faced in the design of LogMap.</p>
      <p>I. Computing candidate mappings. Computing mappings requires the pairwise
comparison of the entities in the vocabularies of the relevant ontologies (e.g., using a string
matcher). This leads to a search space that is quadratic in the size of the ontologies (e.g.,
there are over 4 billion candidate mappings between FMA and NCI). For large
ontologies, performing such a huge number of pairwise comparisons is infeasible in practice,
even if we rely on the fastest available string matchers. Hence, reducing the search space
of candidate mappings is a key challenge for a scalable ontology matching system.</p>
      <p>II. Detection of unsatisfiable classes. The ontology O<sub>1</sub> ∪ O<sub>2</sub> ∪ M resulting from the
integration of O<sub>1</sub> and O<sub>2</sub> via mappings M may entail axioms that do not follow from
O<sub>1</sub>, O<sub>2</sub>, or M alone. Many such entailments correspond to unsatisfiable classes, which
are due either to erroneous mappings or to inherent disagreements between O<sub>1</sub> and O<sub>2</sub>.
For example, the union of FMA, SNOMED and the UMLS [3] mappings between them
(which are the result of careful manual curation) has over 6,000 unsatisfiable classes
[13], and the number of unsatisfiable classes may be even higher when mappings are
not subject to manual curation. Although state-of-the-art OWL 2 reasoners can
efficiently classify existing large-scale biomedical ontologies individually (e.g., ELK [16]
can classify SNOMED in a few seconds and HermiT [21] can classify FMA in less than
a minute), the integration of these ontologies via mappings leads to challenging
classification problems [9] (e.g., no reasoner known to us can classify the integration of
SNOMED and NCI via mappings).</p>
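      <p>A minimal constructed example of challenge II (the classes A, B, C, D and the two mappings are hypothetical and not taken from FMA, SNOMED or NCI): class A is satisfiable in O<sub>1</sub>, and none of O<sub>1</sub>, O<sub>2</sub> or M alone entails A ⊑ ⊥, yet their union does.</p>
      <preformat><![CDATA[
```latex
\begin{align*}
O_1 &= \{A \sqsubseteq B\}, \qquad
O_2 = \{C \sqcap D \sqsubseteq \bot\}, \qquad
M = \{A \equiv C,\ B \equiv D\} \\
O_1 \cup O_2 \cup M &\models A \sqsubseteq C
  && \text{by } A \equiv C \\
O_1 \cup O_2 \cup M &\models A \sqsubseteq D
  && \text{by } A \sqsubseteq B \text{ and } B \equiv D \\
O_1 \cup O_2 \cup M &\models A \sqsubseteq \bot
  && \text{by } A \sqsubseteq C \sqcap D \text{ and } C \sqcap D \sqsubseteq \bot
\end{align*}
```
]]></preformat>
      <p>Removing either of the two mappings restores the satisfiability of A. Deciding which one to remove is the repair problem of challenge III.</p>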
      <p>III. Repair of unsatisfiable classes. Standard justification-based repair techniques (e.g.,
[15, 23, 8]) can be used to repair the identified unsatisfiable classes in O<sub>1</sub> ∪ O<sub>2</sub> ∪ M.
These techniques have been implemented in mapping repair systems such as
ContentMap [12] and Alcomo<sup>1</sup> [18]. The scalability problem, however, is exacerbated by
the number of unsatisfiable classes to be repaired. For example, computing all
justifications for just one of the 6,000 unsatisfiable classes in the integration of
FMA and SNOMED via UMLS mappings requires, on average, over 9 minutes using HermiT,
even with the optimisation proposed in [24]; doing this for all unsatisfiable classes
would require more than 6 weeks.</p>
      <p>IV. Expert feedback during the matching process is important for use cases requiring
very accurate mappings; however, smooth interaction with domain experts imposes very
strict scalability requirements. Furthermore, the expert should not be overwhelmed with
feedback requests, which should be issued only when strictly needed. Hence, it is crucial
to reduce both the number of feedback requests and the delay between
successive requests.</p>
    </sec>
    <sec id="sec-3">
      <title>Technical approach</title>
      <p>In order to meet these challenges, we have relied on the following key elements in the
design of LogMap (see [10, 14] for details).</p>
      <p>Lexical indexation. An inverted index is used to store the lexical information contained
in the input ontologies. This index is the key to addressing challenge I since it allows
for the efficient computation of an initial set of mappings of manageable size. Similar
indexes have been successfully used in information retrieval and search engine
technologies [2].</p>
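      <p>A minimal sketch of lexical indexation in Python, assuming a deliberately simplified setting: each class has a list of labels, labels are split into lowercase words, and two classes become a candidate pair when their labels share at least one word. The function names (tokenize, build_inverted_index, candidate_mappings), the tokenisation and the toy class identifiers are hypothetical; the index actually used by LogMap is described in [10, 14].</p>
      <preformat><![CDATA[
```python
# Hypothetical sketch of a lexical inverted index (not LogMap's actual code).
from collections import defaultdict


def tokenize(label: str) -> set[str]:
    """Split a label into lowercase words (a deliberately naive normaliser)."""
    return set(label.lower().replace("_", " ").split())


def build_inverted_index(classes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every word to the set of class ids whose labels contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for class_id, labels in classes.items():
        for label in labels:
            for word in tokenize(label):
                index[word].add(class_id)
    return index


def candidate_mappings(onto1: dict[str, list[str]],
                       onto2: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Pairs (c1, c2) whose labels share at least one word. Only words present
    in both indexes are visited, so classes with no lexical overlap are never
    compared, unlike the len(onto1) * len(onto2) pairwise approach."""
    idx1 = build_inverted_index(onto1)
    idx2 = build_inverted_index(onto2)
    pairs: set[tuple[str, str]] = set()
    for word in idx1.keys() & idx2.keys():
        for c1 in idx1[word]:
            for c2 in idx2[word]:
                pairs.add((c1, c2))
    return pairs


# Toy input: class id -> labels (hypothetical ids and labels).
mouse = {"MA_1": ["heart"], "MA_2": ["cardiac muscle"], "MA_3": ["tail"]}
nci = {"NCI_1": ["Heart"], "NCI_2": ["Cardiac Muscle Tissue"], "NCI_3": ["Lung"]}
print(sorted(candidate_mappings(mouse, nci)))
# [('MA_1', 'NCI_1'), ('MA_2', 'NCI_2')]
```
]]></preformat>
      <p>Only 2 of the 9 possible pairs are generated in this toy run. A practical index would also need label normalisation (e.g., stemming and stop-word removal), since frequent words would otherwise produce many spurious candidates.</p>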
      <p>Logic-based module extraction. The practical feasibility of unsatisfiability detection
and repair critically depends on the size of the input ontologies. To reduce the size of
the problem, we exploit ontology modularisation techniques. Ontology modules with
well-understood semantic properties can be efficiently computed and are typically much
smaller than the input ontology [5, 17].</p>
      <p><sup>1</sup> Note that Alcomo also implements incomplete reasoning and repair techniques.</p>
      <p>Propositional Horn reasoning. The relevant modules in the input ontologies, together
with (a subset of) the candidate mappings, are encoded in LogMap using a Horn
propositional representation. LogMap implements the classic Dowling-Gallier algorithm for
propositional Horn satisfiability [6, 7], which can be exploited to detect unsatisfiable
classes in linear time. This encoding, although incomplete, allows LogMap to address
challenge II soundly and efficiently.</p>
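      <p>The Python sketch below illustrates the general propagation idea behind the Dowling-Gallier algorithm [6] on a toy encoding of our own choosing, not LogMap's actual encoding: classes are propositional atoms, A ⊑ B becomes the clause A → B, a disjointness axiom A ⊓ B ⊑ ⊥ becomes A ∧ B → false, and an equivalence mapping becomes two implications. A class is reported as unsatisfiable when asserting it makes false derivable. The names is_unsatisfiable and sub and the example clauses are hypothetical.</p>
      <preformat><![CDATA[
```python
# Hypothetical sketch: Dowling-Gallier style propagation over a toy Horn encoding.
from collections import defaultdict, deque
from typing import Optional

# A clause "b1 and ... and bn -> head"; head None stands for "false".
Clause = tuple[frozenset[str], Optional[str]]


def is_unsatisfiable(clauses: list[Clause], cls: str) -> bool:
    """Assert `cls` and propagate; return True if "false" is derived.
    Each atom is queued at most once and each clause keeps a counter of its
    unsatisfied body atoms, so the run is linear in the size of the clauses."""
    remaining = [len(body) for body, _ in clauses]
    watchers: dict[str, list[int]] = defaultdict(list)  # atom -> clauses using it
    for i, (body, _) in enumerate(clauses):
        for atom in body:
            watchers[atom].append(i)

    true_atoms: set[str] = set()
    queue: deque[str] = deque()

    def make_true(atom: Optional[str]) -> bool:
        """Record a derived head; return True if it is "false"."""
        if atom is None:
            return True
        if atom not in true_atoms:
            true_atoms.add(atom)
            queue.append(atom)
        return False

    make_true(cls)  # the queried class is assumed to have an instance
    for body, head in clauses:
        if not body and make_true(head):  # facts with an empty body
            return True

    while queue:
        atom = queue.popleft()
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0 and make_true(clauses[i][1]):
                return True
    return False


def sub(a: str, b: str) -> Clause:
    """Encode the subsumption a SubClassOf b as the Horn clause a -> b."""
    return (frozenset({a}), b)


# Toy integration (hypothetical names): O1 says A SubClassOf B, O2 says C and D
# are disjoint, and the mappings state A = C and B = D.
clauses = [
    sub("O1:A", "O1:B"),
    (frozenset({"O2:C", "O2:D"}), None),
    sub("O1:A", "O2:C"), sub("O2:C", "O1:A"),
    sub("O1:B", "O2:D"), sub("O2:D", "O1:B"),
]
print(is_unsatisfiable(clauses, "O1:A"))      # True
print(is_unsatisfiable(clauses[:2], "O1:A"))  # False: no mappings, no conflict
```
]]></preformat>
      <p>Under this encoding, deriving false is sound evidence of unsatisfiability, whereas classes that are unsatisfiable only because of axioms without a Horn translation go undetected, which mirrors the sound but incomplete trade-off described above.</p>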
      <p>Axiom tracking and greedy repair. LogMap extends Dowling-Gallier’s algorithm to
track all mappings that may be involved in the unsatisfiability of a class. This
extension is key to implementing a highly scalable greedy repair algorithm that can meet
challenge III.</p>
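      <p>LogMap's extension tracks all mappings that may be involved in an unsatisfiability [10, 14]. The simplified Python sketch below is not that algorithm: it records, for each derived atom, only the mappings used in the first derivation it finds, and then greedily removes the least confident mapping of each detected conflict until the class is no longer detected as unsatisfiable. The functions conflict_mappings and greedy_repair, the mapping identifiers and the confidence values are hypothetical.</p>
      <preformat><![CDATA[
```python
# Hypothetical sketch of mapping tracking plus greedy repair (not LogMap's code).
from collections import deque
from typing import Optional

# (body atoms, head or None for "false", id of the mapping it encodes or None)
Clause = tuple[frozenset[str], Optional[str], Optional[str]]


def conflict_mappings(clauses: list[Clause], cls: str) -> Optional[set[str]]:
    """Propagate from `cls`, recording for each derived atom the mappings used
    in the first derivation found. Returns the mappings behind a derivation of
    "false", or None if "false" is not derivable from `cls`."""
    support: dict[str, frozenset[str]] = {cls: frozenset()}
    fired = [False] * len(clauses)
    queue = deque([cls])
    while queue:
        queue.popleft()  # a new atom became true: rescan the clauses (simple, not linear)
        for i, (body, head, mapping) in enumerate(clauses):
            if fired[i] or not body.issubset(support.keys()):
                continue
            fired[i] = True
            used = frozenset().union(*(support[a] for a in body))
            if mapping is not None:
                used = used | {mapping}
            if head is None:
                return set(used)
            if head not in support:
                support[head] = used
                queue.append(head)
    return None


def greedy_repair(clauses: list[Clause], confidence: dict[str, float],
                  classes: list[str]) -> set[str]:
    """For each class, repeatedly drop the least confident mapping involved in a
    detected conflict until the class is no longer detected as unsatisfiable."""
    removed: set[str] = set()
    for cls in classes:
        while True:
            active = [c for c in clauses if c[2] not in removed]
            conflict = conflict_mappings(active, cls)
            if not conflict:  # None: no conflict; empty set: not caused by mappings
                break
            removed.add(min(conflict, key=lambda m: confidence[m]))
    return removed


def sub(a: str, b: str, mapping: Optional[str] = None) -> Clause:
    """Horn clause a -> b, optionally tagged with the mapping it comes from."""
    return (frozenset({a}), b, mapping)


# Toy conflict with hypothetical mapping ids m1 (A = C) and m2 (B = D).
clauses = [
    sub("O1:A", "O1:B"),
    (frozenset({"O2:C", "O2:D"}), None, None),
    sub("O1:A", "O2:C", "m1"), sub("O2:C", "O1:A", "m1"),
    sub("O1:B", "O2:D", "m2"), sub("O2:D", "O1:B", "m2"),
]
confidence = {"m1": 0.9, "m2": 0.6}
print(sorted(conflict_mappings(clauses, "O1:A")))  # ['m1', 'm2']
print(greedy_repair(clauses, confidence, ["O1:A", "O1:B", "O2:C", "O2:D"]))  # {'m2'}
```
]]></preformat>
      <p>Because only one derivation per atom is recorded and each choice is greedy, the removed set is not guaranteed to be minimal; the rescanning loop also gives up the linear-time bound for brevity.</p>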
      <p>Semantic indexation. The Horn propositional representation of the ontology modules
and the mappings is efficiently indexed using an interval labelling scheme [1], an
optimised data structure for storing directed acyclic graphs (DAGs) that significantly
reduces the cost of answering taxonomic queries [4, 22]. In particular, this semantic
index allows us to answer many entailment queries over the input ontologies and the
mappings computed thus far as an index lookup operation, and hence without the need
for reasoning. The semantic index complements the use of a propositional encoding to
address challenges II and III, and it is the key to meeting challenge IV.</p>
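      <p>The labelling of [1] is only cited here; the Python sketch below shows one simple interval-based scheme in the same spirit, under our own assumptions rather than LogMap's implementation. Classes are numbered in postorder over a depth-first spanning tree of the taxonomy (edges point from a class to its direct subclasses); each class stores the interval of its spanning-tree descendants plus the intervals inherited from its direct subclasses, so descendants reached through non-tree edges are also covered. A subsumption query then becomes a lookup of one number in a short list of intervals. The class IntervalIndex and the toy taxonomy are hypothetical.</p>
      <preformat><![CDATA[
```python
# Hypothetical sketch of an interval-based reachability index for a taxonomy DAG.
def _merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort integer intervals and merge those that overlap or touch."""
    merged: list[tuple[int, int]] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


class IntervalIndex:
    """Taxonomy given as class -> direct subclasses; must be acyclic."""

    def __init__(self, children: dict[str, list[str]]) -> None:
        nodes = set(children).union(*children.values())
        has_parent = set().union(*children.values())
        self.post: dict[str, int] = {}  # postorder number in a spanning tree
        self.low: dict[str, int] = {}   # smallest postorder number in its subtree
        self.labels: dict[str, list[tuple[int, int]]] = {}
        visited: set[str] = set()
        counter = 0

        def dfs(u: str) -> None:
            nonlocal counter
            visited.add(u)
            first = counter
            for v in children.get(u, []):
                if v not in visited:  # tree edge; the others are non-tree edges
                    dfs(v)
            self.low[u], self.post[u] = first, counter
            counter += 1

        def label(u: str) -> list[tuple[int, int]]:
            # Own spanning-tree interval plus the labels of all direct subclasses,
            # which also covers descendants reached through non-tree edges.
            if u not in self.labels:
                spans = [(self.low[u], self.post[u])]
                for v in children.get(u, []):
                    spans.extend(label(v))
                self.labels[u] = _merge(spans)
            return self.labels[u]

        for root in sorted(nodes - has_parent):
            dfs(root)
        for u in nodes:
            label(u)

    def is_subclass(self, sub: str, sup: str) -> bool:
        """Reflexive subsumption test answered by an interval lookup."""
        p = self.post[sub]
        return any(lo <= p <= hi for lo, hi in self.labels[sup])


# Toy DAG (hypothetical): D has two parents, B and E.
index = IntervalIndex({"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["D"]})
print(index.is_subclass("D", "C"), index.is_subclass("B", "C"))  # True False
```
]]></preformat>
      <p>Building the labels is a one-off cost, after which each query needs no reasoning. The recursive traversal is kept for readability; very deep taxonomies would need an iterative version to stay within Python's recursion limit.</p>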
    </sec>
    <sec id="sec-4">
      <title>Adaptations made for the evaluation</title>
      <p>LogMap’s algorithm described in [10, 14] has been extended with basic functionality
to support the matching of instance data.</p>
      <p>LogMap’s instance matching module is based on the same lexical indexation
techniques used in LogMap to match classes. In order to discover additional instance
mappings, LogMap also exploits the property assertions of the input ontologies to analyse
the structure of their ABoxes.</p>
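      <p>The text does not detail how property assertions are exploited. Purely as a hypothetical illustration, one simple structural heuristic is to start from already matched pairs of individuals and propose, as new candidates, the objects they are linked to through the same property when those objects also have lexically similar labels. The function names, the word-based Jaccard similarity, the 0.5 threshold and the assumption that both ABoxes share property names are ours, not LogMap's.</p>
      <preformat><![CDATA[
```python
# Hypothetical structural heuristic for instance matching (not LogMap's method).
def word_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two labels."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa.intersection(wb)) / len(wa.union(wb))


def propose_from_assertions(seeds, abox1, abox2, labels1, labels2, threshold=0.5):
    """seeds: matched (individual1, individual2) pairs; aboxN: (subject, property,
    object) assertions, assuming both ABoxes use the same property names;
    labelsN: individual -> label. Returns new candidate pairs among the objects
    of matched subjects, linked by the same property and with similar labels."""
    new_pairs = set()
    for s1, s2 in seeds:
        objs1 = [(p, o) for s, p, o in abox1 if s == s1]
        objs2 = [(p, o) for s, p, o in abox2 if s == s2]
        for p1, o1 in objs1:
            for p2, o2 in objs2:
                if p1 != p2 or (o1, o2) in seeds:
                    continue
                sim = word_jaccard(labels1.get(o1, ""), labels2.get(o2, ""))
                if sim >= threshold:
                    new_pairs.add((o1, o2))
    return new_pairs


# Toy ABoxes with hypothetical individuals, properties and labels.
abox1 = [("a1", "bornIn", "c1"), ("a1", "worksAt", "u1")]
abox2 = [("b1", "bornIn", "d1"), ("b1", "worksAt", "v1")]
labels1 = {"c1": "Maida Vale", "u1": "University of Manchester"}
labels2 = {"d1": "Maida Vale London", "v1": "Bletchley Park"}
print(propose_from_assertions({("a1", "b1")}, abox1, abox2, labels1, labels2))
# {('c1', 'd1')}
```
]]></preformat>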
      <p>In order to minimise the number of logical errors caused by the instance mappings,
LogMap’s repair module is also used to detect and repair conflicts.</p>
    </sec>
    <sec id="sec-5">
      <title>Link to the system and parameters file</title>
      <p>LogMap<sup>2</sup> is open-source and released under the GNU Lesser General Public License 3.0.<sup>3</sup>
The latest components and source code are available from LogMap’s Google Code page:
http://code.google.com/p/logmap-matcher/.</p>
      <p>LogMap distributions can be easily customized through a configuration file
containing the matching parameters.</p>
      <p>LogMap can also be used directly through an AJAX-based Web interface where
matching tasks can be easily requested: http://csu6325.cs.ox.ac.uk/</p>
      <p><sup>2</sup> http://www.cs.ox.ac.uk/isg/projects/LogMap/</p>
      <p><sup>3</sup> http://www.gnu.org/licenses/</p>
      <p>We now present the results obtained by LogMap and LogMapLt in the OAEI
2012 campaign.</p>
      <p>Benchmark track. Ontologies in this track have been synthetically generated. The goal of this track is to
evaluate matching systems in scenarios where the input ontologies lack important
information (e.g., classes contain no meaningful URIs or labels).</p>
      <p>Table 1 summarises the average results obtained by LogMap and LogMapLt. Note
that the computation of candidate mappings in LogMap and LogMapLt heavily relies
on the similarities between the vocabularies of the input ontologies; hence, there is a
direct negative impact in the cases where the labels are replaced by random strings.</p>
      <p>Anatomy track. This track involves the matching of the Adult Mouse Anatomy ontology (2,744 classes)
and a fragment of the NCI ontology describing human anatomy (3,304 classes). The
reference alignment has been manually curated, and it contains a significant number of
non-trivial mappings.</p>
      <p>Table 2 summarises the results obtained by LogMap and LogMapLt. The evaluation
was run on a machine with 4GB RAM and 2 cores.</p>
      <p>Conference track. This track uses a collection of 16 ontologies from the domain of academic
conferences [25]. These ontologies have been created manually by different people and
are of very small size (between 14 and 140 entities). The track uses two reference
alignments, RA1 and RA2. RA1 contains manually curated mappings for a subset of the
120 ontology pairs evaluated in the track. RA2 contains composed mappings, based on
the alignments in RA1, between all the ontology pairs.</p>
      <p>Table 3 summarises the average results obtained by LogMap and LogMapLt. The
last column represents the total runtime for generating all 120 alignments. Tests were
run on a laptop with an Intel Core i5 2.67GHz and 4GB RAM.</p>
      <p>MultiFarm track. This track is based on the translation of the OntoFarm collection of ontologies into
9 different languages [20]. As expected, both LogMap and LogMapLt obtained poor
results, since they do not implement specific multilingual techniques.</p>
    </sec>
    <sec id="sec-6">
      <title>Library track</title>
      <p>The library track involves the matching of the STW thesaurus (6,575 classes) and the
TheSoz thesaurus (8,376 classes). Both of these thesauri provide vocabulary for
economic and social sciences. Table 4 summarises the results obtained by LogMap and
LogMapLt. The track was run on a machine with 7GB RAM and 2 cores.</p>
    </sec>
    <sec id="sec-7">
      <title>Large BioMed track</title>
      <p>This track aims at finding alignments between large and semantically rich biomedical
ontologies such as FMA, SNOMED, and NCI [11]. The UMLS Metathesaurus has been
selected as the basis for the track reference alignments [3]. Since the UMLS mappings
together with the input ontologies lead to numerous unsatisfiable classes, two
refinements of the UMLS mappings have also been considered as reference alignments. These
refinements have been generated using LogMap’s repair facility [10] and the Alcomo
debugging system [18]. The track has been split into nine tasks involving different
fragments of FMA, SNOMED, and NCI.</p>
      <p>LogMap has been evaluated with two configurations in this track. LogMap’s
default algorithm computes an estimate of the overlap between the input ontologies
before the matching process, while the variant LogMapnoe has this feature deactivated.</p>
      <p>Tables 5-7 summarise the results obtained by LogMap, LogMapnoe and LogMapLt.
Precision and recall represent average values over the three reference alignments. The
number of unsatisfiable classes resulting from reasoning (using HermiT [21]) with
the input ontologies and the output mappings is also given.<sup>4</sup> Note that LogMap, unlike
LogMapnoe, failed to detect and repair a few unsatisfiable classes in the SNOMED-NCI
matching problem, since they were outside the computed ontology fragments. The track
was run on a server with 16 CPUs, with 15GB of RAM allocated.</p>
      <p><sup>4</sup> Since no OWL 2 reasoner can classify the integration of SNOMED and NCI via mappings [9],
the Dowling-Gallier algorithm [6] for propositional Horn satisfiability was used instead.</p>
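      <p>A minimal Python sketch of the averaging described above, assuming mappings are compared as plain (source, target) pairs and ignoring relation types and confidence values. The toy alignments are hypothetical, and averaging the F-measure per reference alignment is our assumption; the OAEI evaluation tools may differ in such details.</p>
      <preformat><![CDATA[
```python
# Sketch: precision, recall and F-measure averaged over several reference alignments.
Mapping = tuple[str, str]


def precision_recall_f(system: set[Mapping],
                       reference: set[Mapping]) -> tuple[float, float, float]:
    """Standard set-based precision, recall and (balanced) F-measure."""
    correct = len(system.intersection(reference))
    p = correct / len(system) if system else 0.0
    r = correct / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def averaged(system: set[Mapping],
             references: list[set[Mapping]]) -> tuple[float, float, float]:
    """Arithmetic mean of each measure over a non-empty list of references."""
    scores = [precision_recall_f(system, ref) for ref in references]
    n = len(scores)
    p, r, f = (sum(s[k] for s in scores) / n for k in range(3))
    return p, r, f


# Hypothetical toy alignments: a full reference and a smaller "repaired" one.
system = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
ref_full = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A5", "B5")}
ref_repaired = {("A1", "B1"), ("A2", "B2")}
print([round(x, 3) for x in averaged(system, [ref_full, ref_repaired])])
# [0.625, 0.875, 0.708]
```
]]></preformat>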
    </sec>
    <sec id="sec-8">
      <title>Instance matching</title>
      <p>LogMap and LogMapLt have participated in the Sandbox and IIMB matching tasks. The
Sandbox and IIMB datasets have been automatically generated by introducing a set of
controlled transformations in an initial ABox; as a result, Sandbox and IIMB contain
11 and 80 synthetic ABoxes, respectively.</p>
      <p>Table 8 summarises the average results obtained by LogMap and LogMapLt. The
results are quite promising considering that this is the first participation of LogMap in
this track. Nevertheless, there is still room for improvement in order to deal with more
challenging tasks.</p>
    </sec>
    <sec id="sec-9">
      <title>General comments and conclusions</title>
      <p>Comments on the results. LogMap’s main weakness is that the computation of candidate
mappings relies on the similarities between the vocabularies of the input ontologies;
hence, there is a direct negative impact in the cases where the ontologies are lexically
disparate or do not provide enough lexical information.</p>
      <p>Discussions on the way to improve the proposed system. LogMap is now a stable and
mature system that has been made available to the community. There are, however,
many exciting possibilities for future work. For example, we aim to implement
multilingual features in order to be competitive in the MultiFarm track. We also intend to
extend LogMap’s instance matching module with more sophisticated techniques.</p>
      <p>Comments on the OAEI 2012 measures. Although mapping coherence is a measure
already used in the OAEI, we consider that it is not given the weight it deserves in the
evaluation. Thus, developers focus on creating matching systems that maximise the F-measure
but disregard the impact of the generated output in terms of logical errors.
      </p>
      <p>Acknowledgements. This work was supported by the Royal Society, the EPSRC project
LogMap and the EU FP7 projects SEALS and Optique. We also thank the organisers
of the OAEI evaluation campaigns for providing test data and infrastructure, and Anton
Morant and Yujiao Zhou, who have also contributed to the LogMap project in the past.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: SIGMOD Rec. 18, pp. 253–262 (1989)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval. ACM Press / Addison-Wesley (1999)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Research 32, 267–270 (2004)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Christophides, V., Plexousakis, D., Scholl, M., Tourtounis, S.: On labeling schemes for the Semantic Web. In: Int’l World Wide Web (WWW) Conf. pp. 544–555 (2003)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory and practice. J. Artif. Intell. Res. 31, 273–318 (2008)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Dowling, W.F., Gallier, J.H.: Linear-time algorithms for testing the satisfiability of propositional Horn formulae. J. Log. Prog. 1(3), 267–284 (1984)</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Gallo, G., Urbani, G.: Algorithms for testing the satisfiability of propositional formulae. J. Log. Prog. 7(1), 45–61 (1989)</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Horridge, M., Parsia, B., Sattler, U.: Laconic and precise justifications in OWL. In: Int’l Sem. Web Conf. (ISWC). pp. 323–338 (2008)</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I.: On the feasibility of using OWL 2 DL reasoners for ontology matching problems. In: OWL Reasoner Evaluation Workshop (2012)</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Jiménez-Ruiz, E., Cuenca Grau, B.: LogMap: Logic-based and Scalable Ontology Matching. In: Int’l Sem. Web Conf. (ISWC). pp. 273–288 (2011)</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I.: Exploiting the UMLS Metathesaurus in the Ontology Alignment Evaluation Initiative. In: E-LKR Workshop (2012)</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I., Berlanga, R.: Ontology integration using mappings: Towards getting the right logical consequences. In: Eur. Sem. Web Conf. (ESWC). pp. 173–187 (2009)</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I., Berlanga, R.: Logic-based assessment of the compatibility of UMLS ontology sources. J. Biomed. Sem. 2 (2011)</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y., Horrocks, I.: Large-scale interactive ontology matching: Algorithms and implementation. In: Eur. Conf. on Artif. Intell. (ECAI) (2012)</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all justifications of OWL DL entailments. In: Int’l Sem. Web Conf. (ISWC). pp. 267–280 (2007)</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. Kazakov, Y., Krötzsch, M., Simancik, F.: Concurrent classification of EL ontologies. In: Int’l Sem. Web Conf. (ISWC). pp. 305–320 (2011)</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and module extraction in description logics. In: European Conf. on Artif. Intell. (ECAI). pp. 55–59 (2008)</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. Meilicke, C.: Alignment Incoherence in Ontology Matching. Ph.D. thesis, University of Mannheim (2011)</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. Meilicke, C., Šváb-Zamazal, O., Trojahn, C., Jiménez-Ruiz, E., Aguirre, J., Stuckenschmidt, H., Cuenca Grau, B.: Evaluating ontology matching systems on large, multilingual and real-world test cases. In: ArXiv e-prints (2012), http://arxiv.org/abs/1208.3148v1</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. Meilicke, C., Castro, R.G., Freitas, F., van Hage, W.R., Montiel-Ponsoda, E., de Azevedo, R.R., Stuckenschmidt, H., Šváb-Zamazal, O., Svátek, V., Tamilin, A., Trojahn, C., Wang, S.: MultiFarm: a benchmark for multilingual ontology matching. J. Web Sem. (2012)</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. Motik, B., Shearer, R., Horrocks, I.: Hypertableau reasoning for description logics. J. Artif. Intell. Res. 36, 165–228 (2009)</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. Nebot, V., Berlanga, R.: Efficient retrieval of ontology fragments using an interval labeling scheme. Inf. Sci. 179(24), 4151–4173 (2009)</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>23. Schlobach, S., Huang, Z., Cornet, R., van Harmelen, F.: Debugging incoherent terminologies. J. Autom. Reasoning 39(3) (2007)</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>24. Suntisrivaraporn, B., Qi, G., Ji, Q., Haase, P.: A modularization-based approach to finding all justifications for OWL DL entailments. In: Asian Sem. Web Conf. (ASWC) (2008)</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>25. Šváb, O., Svátek, V., Berka, P., Rak, D., Tomášek, P.: OntoFarm: towards an experimental collection of parallel ontologies. In: Int’l Sem. Web Conf. (ISWC). Poster Session (2005)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>