Evaluating Mapping Repair Systems with
                  Large Biomedical Ontologies

Ernesto Jiménez-Ruiz1 , Christian Meilicke2 , Bernardo Cuenca Grau1 , Ian Horrocks1
             1
                Department of Computer Science, University of Oxford, Oxford UK,
         2
             Research Group Data and Web Science, University of Mannheim, Germany


        Abstract. In this paper we provide empirical evidence of the necessity of in-
        tegrating mapping repair techniques within the ontology matching process, an
        aspect that is neglected in many ontology matching systems. We also evaluate the
        feasibility of using state-of-the-art mapping repair techniques in practice, such as
        those implemented in Alcomo and LogMap. A preliminary evaluation was con-
        ducted in the context of the Ontology Alignment Evaluation Initiative (OAEI)
        2012. We extend this evaluation and report about the results in detail.


1 Introduction

OWL ontologies are extensively used in biomedical information systems. Prominent
examples of biomedical ontologies are the National Cancer Institute Thesaurus (N CI)
[14], the Foundational Model of Anatomy (F MA) [36] and the Systematized Nomen-
clature of Medicine Clinical Terms (S NOMED CT) [39]. These reference bio-medical
ontologies, however, are being developed independently by different groups of experts
and, as a result, they use different entity naming schemes in their vocabularies. For
example, N CI defines the entity “Myocardium”, whereas F MA uses the entity “Cardiac
Muscle Tissue” to describe the muscles that surround and power the human heart. Thus,
to integrate data among applications, it is crucial to establish correspondences (called
mappings or alignments) between the entities of their respective ontologies.
    In the last ten years, the Semantic Web and bio-informatics research communities
have extensively investigated the problem of automatically computing correspondences
between independently developed ontologies, usually referred to as the ontology match-
ing problem. For example, the Ontology Alignment Evaluation Initiative3 (OAEI) is an
annual international campaign for the systematic evaluation of ontology matching sys-
tems [10, 9, 40, 28, 1]. The matching problems in the OAEI are organized in several
tracks, with each track involving different kinds of test ontologies. For example, the
Large Biomed track involves the matching of F MA, N CI and S NOMED CT.
    OWL ontologies have well-defined semantics [5], however, many systems partici-
pating in the OAEI campaigns disregard the semantics of the input ontologies, and are
thus unable to detect and repair logical errors (e.g. unsatisfiabilities) that follow from
the union of the input ontologies and the mappings. Only the ontology matching sys-
tems S-Match [13], ASMOV [18], CODI [33, 17], KOSIMap [35], YAM++ [15] and
 3
     http://oaei.ontologymatching.org/
LogMap [20, 24] have implemented reasoning and repair techniques in the context of
the OAEI. Furthermore, LogMap was the only system successfully applying such tech-
niques in all tracks of the OAEI 2012 campaign [1].
    In this paper, we focus on the evaluation conducted in the OAEI 2012 Large Biomed
track and we provide an extension of the results presented in [1]. Concretely, we evalu-
ate the feasibility and impact of integrating state-of-the-art mapping repair techniques,
such as those implemented in Alcomo [29] or in LogMap, within the matching process.


2 Preliminaries
In this section, we first introduce the formal representation of ontology mappings. Next,
we present the notions of mapping coherence and (approximate) mapping repair. Fi-
nally, we discuss how ontology matching systems are evaluated within the OAEI.

2.1 Representation of ontology mappings
Mappings are conceptualised as tuples of the form hid, e1 , e2 , n, ρi, with id a unique
identifier for the mapping, e1 , e2 entities in the vocabulary of the relevant ontologies,
n a numeric confidence measure between 0 and 1, and ρ a relation between e1 and e2 ,
typically subsumption (i.e., e1 is more specific than e2 ), equivalence (i.e., e1 and e2 are
synonyms) or disjointness (i.e., no individual can be an instance of both e1 and e2 ) [8].
     RDF Alignment [6] is the main format used in the OAEI campaign to represent
mappings containing the aforementioned elements. Additionally, mappings are also rep-
resented as OWL 2 subclass, equivalence, and disjointness axioms [5]; mapping iden-
tifiers (id) and confidence values (n) are then represented as axiom annotations. Such
a representation enables the reuse of the extensive range of OWL 2 reasoning infras-
tructure that is currently available. Note that alternative formal semantics for ontology
mappings have been proposed in the literature (e.g., [4, 8, 31]).

2.2 Incoherent mappings and (approximate) mapping repair
The ontology O1 ∪ O2 ∪ M resulting from the integration of O1 and O2 via a set
of mappings M, may entail axioms that do not follow from O1 , O2 , or M alone. In
particular, classes that were satisfiable in O1 or O2 may become unsatisfiable w.r.t.
O1 ∪ O2 ∪ M. A set of mappings that leads to such logical errors is referred to as
incoherent [30].
Definition 1 (Mapping Incoherence). A set of mappings M is incoherent with respect
to O1 and O2 , if there exists a class A in the signature of O1 ∪ O2 such that O1 ∪ O2 6|=
A ⊑ ⊥ and O1 ∪ O2 ∪ M |= A ⊑ ⊥.
   An incoherent set of mappings M can be fixed by removing mappings from M.
This process is referred to as mapping repair (or repair for short).
Definition 2 (Mapping Repair). Let M be an incoherent set of mappings M w.r.t. O1
and O2 . A set of mappings R ⊆ M is a mapping repair for M w.r.t. O1 and O2 if
M \ R is coherent w.r.t. O1 and O2 .
    An incoherent set of mappings can be repaired in many different ways. A trivial
repair is R = M, since an empty set of mappings is obviously coherent (according
to Definition 1). Nevertheless, the objective is to remove as few mappings as possible,
which is consistent with the principle of minimal change in belief revision [11]. Such
minimal repairs are typically referred to in the literature as diagnosis — a term coined
by Reiter [34] and introduced to the field of ontology debugging in [37].

Definition 3 (Diagnosis). Let R be a repair for M with respect to O1 and O2 . R is a
diagnosis if each R′ ⊂ R is not a repair for M with respect to O1 and O2 .

    Standard justification-based ontology debugging techniques (e.g. [37, 38, 25, 41, 16,
22]) can be exploited to compute a repair (or a diagnosis) for an incoherent set of map-
pings. However, justification-based technologies do not scale when the number of un-
satisfiabilities is large (a typical scenario in mapping repair problems [19]). To address
this scalability issue, mapping repair systems usually compute an approximate repair
using incomplete reasoning techniques. An approximate repair R≈ does not guarantee
that M \ R≈ is coherent, but it will (in general) reduce significantly the number of
unsatisfiabilities caused by the original set of mappings M.


2.3 Evaluation of ontology matching systems in the OAEI

The evaluation in the OAEI campaign is carried out automatically using the infrastruc-
ture developed within the EU project SEALS [42].4 SEALS provides a repository to
store test data (e.g. OAEI datasets) and an interface to consume this data and generate
an output (e.g. set of mappings) following the accepted formats. OAEI participants have
wrapped their systems according to the SEALS interface. Hence, OAEI systems are ex-
ecuted using the same workflow, which facilitates reproducibility of the experiments.
    The quality of the mappings M computed by a matching system is often measured
in terms of precision and recall with respect to a reference set of mappings (also called
gold standard) MGS . Precision (Pre) is defined as |M ∩ MGS |/|M|, while recall (Rec)
is defined as |M ∩ MGS |/|MGS |. The F-score (F) combines precision and recall and
is usually defined as their harmonic mean (2 × Pre × Rec)/(Pre + Rec). The OAEI
also evaluates the coherence of the computed mappings M with respect to the number
of unsatisfiable classes obtained when reasoning with the input ontologies O1 and O2
together with M. Additionally, computation times are also recorded.


3 The OAEI Large BioMed track

In this section we give an overview of the test data, participating systems and coherence
results of the OAEI 2012 Large Biomed track.5 The track involves the matching of F MA
version 2.0 (78, 989 classes), N CI version 08.05d (66, 724 classes) and S NOMED CT
Jan. 2009 version (306, 591 classes) and exploits the UMLS Metathesaurus [3] as the
basis for the track’s reference mappings (see [23, 21] for details). UMLS is the most
 4
     Semantic Evaluation At Large Scale: http://www.seals-project.eu
 5
     Large BioMed track: http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/
Table 1: Mapping coherence of the top 7 systems in the OAEI 2012 Large BioMed track
                                F MA-N CI      F MA-SNOMED CT       SNOMED CT-N CI
                System
                              Unsat.  Ratio    Unsat.   Ratio      Unsat.    Ratio
                LogMapnoe       9      0.01%      10     0.003%     ≥0        ≥0%
                LogMap          9      0.01%      10     0.003%    ≥17       ≥0.005%
                ServOMap      48,743    27%    273,242     71%    ≥313,643    ≥84%
                ServOMapL     50,334    28%     99,726     26%    ≥314,939    ≥84%
                GOMMA         5,574     4%      10,752     3%     ≥266,051    ≥71%
                GOMMAbk       12,939    9%     119,657     31%    ≥313,015    ≥84%
                YAM++         50,550    29%    106,107     28%    ≥269,107    ≥72%


comprehensive effort for integrating independently-developed medical thesauri and on-
tologies, including F MA, N CI and S NOMED CT. Currently, the UMLS-based reference
mappings only include subsumption and equivalence correspondences between classes.
    The track consists of three matching problems: F MA-N CI, F MA-S NOMED CT and
S NOMED CT-N CI; the gold standard is provided by their corresponding UMLS-based
reference mappings; there are three tasks associated to each matching problem, each of
which involves different fragments of the input ontologies. In this paper we focus on
the tasks involving the matching of the whole F MA, N CI and S NOMED CT ontologies.
    We have evaluated the coherence of the mappings obtained by the top systems in
the OAEI 2012 Large BioMed track: LogMap, YAM++ [15], ServOMap [2], GOMMA
[27] and some of their variants (LogMapnoe, GOMMAbk and ServOMapL). LogMap’s
default algorithm uses ontology modules to reduce the search space, while the variant
LogMapnoe does not rely on module extraction. GOMMAbk , unlike GOMMA, exploits
specialised background knowledge. ServOMapL is a light version of ServOMap with
some features deactivated. Table 1 summarizes the obtained incoherence results, which
have been obtained using the OWL 2 reasoner HermiT [32].6 The table reports (i) num-
ber of unsatisfiable classes in O1 ∪ O2 ∪ M, where M represents the mappings com-
puted by a given system, and (ii) the ratio of unsatisfiable classes over the total num-
ber of classes. The best results were obtained by LogMap and its variant LogMapnoe;
LogMapnoe managed to additionally detect some unsatisfiable classes that were missed
by LogMap due to the fact that they fell outside the computed modules. The mappings
computed by the other matching systems led to a huge number of unsatisfiable classes.
For example, 71% of the classes in the integration of F MA and S NOMED CT via Ser-
vOMap mappings are unsatisfiable.


4 Mapping repair using Alcomo and LogMap
Alcomo and LogMap implement different techniques to repair incoherent mappings. In
the evaluation conducted in this paper, both Alcomo and LogMap are configured to use
incomplete reasoning. Thus, given two ontologies O1 and O2 and a set of mappings
 6
     In the case of S NOMED CT -N CI no OWL 2 reasoner could succeed in classifying the integrated
     ontology via mappings [19], so we used the OWL 2 EL reasoner ELK [26] instead to provide
     a lower bound on the number of unsatisfiable classes.
M between them, Alcomo and LogMap compute an approximate repair R≈ such that
M \ R≈ is almost coherent and only leads to a (relatively) small number of unsatisfi-
able classes. Next we present an overview of the Alcomo and LogMap mapping repair
techniques, the interested readers please refer to [29, 20, 24] for a full description of
these systems. Note that LogMap was originally implemented as an ontology match-
ing system, however, it can also operate as a stand-alone mapping repair system. From
now on we will refer to LogMap’s repair module as LogMap-Repair. Alcomo, unlike
LogMap, has specifically been designed to repair incoherent mappings.

4.1 The Alcomo mapping repair system
Alcomo implementes two reasoning components. One component is a pattern-based
reasoning technique that is incomplete for detecting all minimal incoherent mapping
subsets. However, this approach will detect a large amount of conflicting pairs of map-
pings. The basic idea is to first classify both O1 and O2 using an OWL 2 reasoner. Given
two mapping axioms A1 ≡ C2 ∈ M and B1 ≡ D2 ∈ M with A1 and B1 defined in O1
and C2 and D2 defined in O2 , Alcomo checks if O1 |= A1 ⊑ B1 and O2 |= C2 ⊑ ¬D2 .
If this is the case, it can be concluded O1 ∪ O2 ∪ M |= C2 ⊑ ⊥, i.e., C2 is unsatisfiable
in the ontology integrated via M. Thus, the mapping set {A1 ≡ C2 , B1 ≡ D2 } is inco-
herent. This basic idea is extended and four patterns are defined that take subsumption
and equivalence mappings between classes and properties into account.
     Depending on the configuration of Alcomo, these techniques can be accompanied
with complete reasoning techniques that are built on the classical black-box approaches
for repairing ontologies (e.g. [38, 25, 41, 16]). The basic idea of such a combined ap-
proach is to compute a preliminary superset of a solution based on the incomplete rea-
soning techniques. This intermediate result is then checked with complete reasoning
techniques and further reduced if required. If complete reasoning techniques are acti-
vated, it can be guaranteed that Alcomo generates a coherent mapping set by removing
R from M. Moreover, R will be a diagnosis. Without activating complete reasoning
techniques, Alcomo computes an approximate repair R≈ and it cannot guarantee the
coherence of the output mapping set. The approximate repair R≈ , however, will always
be a subset (never a superset) of the diagnosis.
     Mappings are usually annotated with confidences. Thus, the quality of a diagno-
sis can be defined in terms of the aggregated confidence of R. An intuitive idea is to
remove mapping sets with less aggregated confidence. Alcomo aims to solve two in-
terconnected problems at the same time: (i) the reasoning problem of detecting and
repairing incoherent mappings, and (ii) the optimization problem of taking confidences
into account in the appropriate way. With respect to the optimization problem, two dif-
ferent types of diagnosis have been defined. A global optimal diagnosis is introduced
as the diagnosis that removes as less confidence as possible. If all correspondences are
weighted equally with respect to their (positive) confidence values, a global optimal di-
agnosis will be a diagnosis that is a smallest diagnosis in numbers of correspondences.
This type of diagnosis is, however, computed by an exhaustive search algorithm, and
thus it is not feasible to compute the global optimal diagnosis for large repair problems.
     The second type of a diagnosis is called a local optimal diagnosis. Such a diagnosis
can be constructed by a simple greedy approach starting with an empty mapping set
Algorithm 1 Alcomo’s algorithm with local optimal diagnosis & incomplete reasoning
Input: O1 , O2 : input ontologies; M: input mappings
Output: M′ : output mappings; R≈ : approximate mapping repair.
 1: M′ := ∅
 2: R≈ := ∅
 3: classify O1 and O2
 4: C := ConflictPairs(O1 , O2 , M)
 5: for each m ∈ M do           ⊲ iterate over M in descending order with respect to confidences
 6:     coh := true
 7:     for each m′ ∈ M′ do
 8:         if (m′ , m) ∈ C then
 9:             coh := false
10:             break
11:         end if
12:     end for
13:     if coh = true then M′ := M′ ∪ {m}
14:     else R≈ := R≈ ∪ {m}
15: end for
16: return hM′ , R≈ i


M′ that is extended step by step by adding mappings from M. These mappings are
ordered with respect to the confidence values starting with the highest confidence. Each
time a mapping is added to M′ , the coherence is checked via a combination of pattern-
based and complete reasoning techniques. If M′ becomes incoherent, the mapping is
not added to M′ . The resulting diagnosis R = M \ M′ is also a minimal hitting
set [34] over all conflicts, however, in general it is not a smallest confidence weighted
diagnosis. A proof for this claim and a detailed explanation of an improved variant of
this algorithm can be found in Section 6.1 in [29].
    Both algorithms can be executed with complete reasoning activated or deactivated.
In the latter case, only those logical errors that can be detected by the pattern-based
reasoning approach are taken into account. In our experiments we have applied Al-
como in the setting that aims to compute a local optimal diagnosis using incomplete
pattern-based reasoning techniques only. The corresponding pseudocode is shown in
Algorithm 1. In Step 4 the pattern-based reasoning techniques described above are used
to compute a set of conflicting pairs of mappings. Each of these pairs is incoherent with
respect to O1 and O2 . Note that most of the computational effort is dedicated to this
(preprocessing) step. The remaining part of the algorithm requires no further reasoning
and it is only required to check whether a certain combination of two mappings appears
as a previously computed conflict pair.

4.2 The LogMap-Repair system
Algorithm 2 shows the pseudocode of the algorithm implemented by LogMap-Repair.
Steps 1 and 2 initialise the output mappings M′ with the input mappings M and the
repair set R≈ with the empty set. Note that LogMap-Repair splits equivalence map-
pings into two equivalent subsumption mappings. LogMap-Repair encodes the input
Algorithm 2 LogMap-Repair algorithm based on Horn propositional reasoning
Input: O1 , O2 : input ontologies; M: input mappings
Output: M′ : output mappings; R≈ : approximate mapping repair.
 1: M′ := M
 2: R≈ := ∅
 3: hP1 , P2 i := PropEncoding(O1 , O2 )
 4: for each C ∈ OrderedVariables(P1 ∪ P2 ) do
 5:     PC := P1 ∪ P2 ∪ M′ ∪ {true → C}
 6:     hsat, M⊥ i := DowlingGallier(PC )
 7:     if sat = false then
 8:         Rep := ∅
 9:         rep size := 1
10:         repeat
11:             for each subset RC of M⊥ of size rep size do
12:                 sat := DowlingGallier(PC \ RC )
13:                 if sat = true then Rep := Rep ∪ {RC }
14:             end for
15:             rep size := rep size + 1
16:         until Rep 6= ∅
17:         RC := element of Rep with minimum aggregated confidence.
18:         M′ := M′ \ RC
19:         R≈ := R≈ ∪ RC
20:     end if
21: end for
22: return hM′ , R≈ i


ontologies O1 and O2 as Horn propositional theories P1 and P2 (Step 3) and exploits
this encoding to subsequently detect unsatisfiable classes in an efficient and sound way
during the repair process. The theory P1 (resp. P2 ) consists of the following Horn rules:

 – A rule A → B for all distinct classes A, B such that A is subsumed by B in O1
   (resp. in O2 ); subsumption relations can be determined using either an OWL 2
   reasoner, or syntactically (in an incomplete way).
 – Rules Ai ∧ Aj → false (1 ≤ i < j ≤ n) for each disjointness axiom of the form
   DisjointClasses (A1 , . . . , An ).
 – A rule A1 ∧ . . . ∧ An → B for each subclass or equivalence axiom having the
   intersection of A1 , . . . An as subclass expression and B as superclass.

     In Step 4, propositional variables in P1 (resp. in P2 ) are ordered such that a variable
C in P1 (resp. in P2 ) comes before D whenever D is subsumed by C in O1 (resp. in
O2 ). This is a well-known repair strategy: subclasses of an unsatisfiable class are also
unsatisfiable and hence before repairing an unsatisfiable class one first needs to repair its
superclasses. Satisfiability of a propositional variable C is determined by checking sat-
isfiability of the propositional theory PC consisting of (i) the rule (true → C); (ii) the
propositional representations P1 and P2 ; and (iii) the current set of output mappings
M′ (seen as propositional implications).
Algorithm 3 Evaluation of Alcomo and LogMap-Repair
Input: O1 , O2 : input ontologies; MGS : reference mappings; MS: an ontology matching system
 1: Compute mappings M (I) between O1 and O2 using system MS
 2: Store matching time (II)
 3: Compute F-score (III) of M with respect to MGS
 4: Get unsatisfiable classes of O1 ∪ O2 ∪ M (IV) using a reasoner
 5: Compute approximate repair R≈ (V) using Alcomo system                  ⊲ See Algorithm 1
 6: Store repair time (VI)
 7: Compute F-score (VII) of M \ R≈ with respect to MGS
 8: Get unsatisfiable classes of O1 ∪ O2 ∪ M \ R≈ (VIII) using a reasoner
 9: Compute approximate repair R≈ (IX) using LogMap-Repair system          ⊲ See Algorithm 2
10: Store repair time (X)
11: Compute F-score (XI) of M \ R≈ with respect to MGS
12: Get unsatisfiable classes of O1 ∪ O2 ∪ M \ R≈ (XII) using a reasoner


    LogMap-Repair implements the classical Dowling-Gallier algorithm for proposi-
tional Horn satisfiability [7, 12]. LogMap-Repair’s implementation of Dowling-Gallier’s
algorithm also records all mappings potentially involved in an unsatisfiability. Thus, a
call to Dowling-Gallier returns a satisfiability value sat and, optionally, the (overesti-
mated) set of conflicting mappings M⊥ (see Steps 6 and 12). An unsatisfiable class
C is repaired by discarding conflicting mappings for C (Lines 8 to 19). Thus, subsets
RC of M⊥ of increasing size are then identified until a repair is found (Steps 10-16).7
Thus, LogMap-Repair does not compute a diagnosis for the unsatisfiable class C but
rather the repairs of smallest size. If several repairs of a given size exist, the one with the
lowest aggregated confidence is selected according to the confidence values assigned to
mappings (Step 17). Finally, Steps 18 and 19 update the output mappings M′ and the
approximate mapping repair R≈ by extracting and adding RC , respectively.
    Algorithm 2 ensures that P1 ∪ P2 ∪ M′ ∪ {true → C} is satisfiable for each C
occurring in P1 ∪P2 . The propositional encoding of O1 and O2 is, however, incomplete
and hence the algorithm does not ensure satisfiability of each class in O1 ∪ O2 ∪ M′ .
Nevertheless, as shown in Section 5, the number of unsatisfiable classes remaining after
computing an approximate repair R≈ is typically small.


5 Evaluation

In this section we evaluate the feasibility of integrating the Alcomo and LogMap-Repair
systems within the ontology matching process. For each of the matching problems of
the OAEI 2012 Large BioMed track and for each of the top matching systems in this
track (see Section 3) we have conducted the evaluation in Algorithm 3. The Roman
numbers refer to measurements that are stored during the evaluation. We have run the
evaluation using the SEALS interface in a high performance server with 16 CPUs and
allocating 15 Gb RAM.
 7
     The size of M⊥ and RC are in practice manageable, and thus the complexity of performing
     Step 11 in Algorithm 2 is not critical.
                         Table 2: Mapping repair in the F MA-N CI problem.
                 Matching Results OAEI 2012            Repair with Alcomo               Repair with LogMap
 System           I      II     III     IV           V     VI     VII    VIII         IX     X      XI    XII
                                                      ≈                                ≈
                |M| t (s)        F   Unsat.        |R | t (s)      F    Unsat.       |R | t (s)     F   Unsat.
 ServOMap       4,932      204    0.819   48,743     97     321     0.820    9        115     21    0.819   9
 ServOMapL      5,400      251    0.841   50,334    126     342     0.841    9        166     24    0.839   9
 GOMMA          5,686      217    0.839   5,574     123     321     0.839    15       142     20    0.838   9
 GOMMAbk        6,330      231    0.837   12,939    184     341     0.836    29       259     33    0.833   9
 YAM++          5,476     1,304   0.862   50,550    116     324     0.862    10       141     21    0.861   9
 Average        5,565     441     0.840   33,628    129     330     0.840    14       165     24    0.838   9


                 Table 3: Mapping repair in the F MA-S NOMED CT problem.
                 Matching Results OAEI 2012             Repair with Alcomo               Repair with LogMap
 System           I      II     III     IV            V     VI     VII    VIII         IX     X      XI   XII
                                                       ≈                                ≈
                |M|     t (s)    F    Unsat.        |R | t (s)      F    Unsat.       |R | t (s)     F   Unsat.
 ServOMap       12,642  532       0.770 273,242      829    2,672    0.749     0      2,640   284   0.721   0
 ServOMapL      13,210  517       0.794 99,729       975    2,779    0.767     0      2,953   274   0.752   0
 GOMMA          11,648 1,994      0.291 10,752       463    2,840    0.293     0       618    245   0.291   0
 GOMMAbk        25,660 1,893      0.708 119,657     1,726   2,809    0.698   1,363    5,295   314   0.678   0
 YAM++          14,088 23,900     0.765 106,107      783    2,800    0.760     0      3,461   262   0.720   0
 Average        15,450    5,767   0.666 121,897      955    2,780 0.653      273      2,993   276   0.632   0


    Elements in M and R≈ (I, V and IX) represent subsumption mappings. As in Table
1, the unsatisfiable classes (IV, VIII and XII) in the F MA-N CI and F MA-S NOMED
CT matching problems have been computed using the HermiT reasoner, while in the
S NOMED CT-N CI problem we have provided a lower bound using the ELK reasoner.
    Tables 2-4 shows the result of the conducted evaluation using Algorithm 3. The
results, which suggest that Alcomo and LogMap-Repair scale and produce very good
results in practice, can be summarized as follows:
  i the computed (approximate) repairs are not aggressive and the average size of the
    repairs ranges from 5% (Alcomo) to 11% (LogMap-Repair) of the input mappings,
 ii the repair process, although it requires an (appreciable) additional computation
    time, does not represent a bottleneck in the matching process,
iii the impact on the F-score is (on average) negligible,8 and
iv the incoherence of the repaired mapping sets has been significantly reduced in all
    test cases and completely removed in some of them.
    Regarding the comparison between Alcomo and LogMap-Repair, Tables 2-4 also
show that LogMap-Repair is 10 to 15 times faster compared to Alcomo, although Al-
como runtimes are slightly less affected by different mapping inputs; Alcomo is less
aggressive and its repairs involve (in general) a smaller number of mappings, neverthe-
less LogMap-Repair results are better in terms of mapping coherence; finally the impact
on the F-score is (on average) better in Alcomo.
 8
     The computed (approximate) repairs have, in general, a negative impact on the recall which is
     compensated with an increase of the precision.
               Table 4: Mapping repair in the S NOMED CT-N CI problem.
              Matching Results OAEI 2012             Repair with Alcomo            Repair with LogMap
 System       I      II     III     IV             V    VI     VII     VIII       IX    X     XI    XII
                                                    ≈                              ≈
             |M|    t (s)    F    Unsat.         |R | t (s)     F     Unsat.     |R | t (s)    F   Unsat.
 ServOMap    24,924 654       0.664   ≥313,643    937    3,223   0.663  ≥35      1,671   276   0.666 ≥296
 ServOMapL   27,928 738       0.678   ≥314,939   1,076   2,917   0.677 ≥2,055    1,656   314   0.679 ≥1,241
 GOMMA       27,386 1,820     0.606   ≥266,051   1,903   2,947   0.603 ≥2,085    2,949   303   0.607 ≥37
 GOMMAbk     34,090 1,940     0.635   ≥313,015   2,720   3,098   0.638 ≥30,583   5,003   435   0.641  ≥1
 YAM++       28,206 30,155    0.680   ≥269,107    757    2,964   0.679  ≥0       1,049   305   0.680  ≥0
 Average     28,507   7,061   0.653 ≥295,351     1,479   3,030 0.652   ≥6,952    2,465   326 0.655   ≥315


6 Conclusions

In the paper we have pointed out that many ontology matching systems participating
in the OAEI campaign do not implement or reuse methods to assess the coherence of
the generated mappings. As a consequence, a large number of classes become unsat-
isfiable when reasoning with the matched ontologies together with the mappings. We
have applied Alcomo and LogMap-Repair systems on the data sets and mapping results
of the OAEI 2012 Large Biomed track to support two claims regarding the application
of (approximate) mapping repair techniques: (i) it is feasible with respect to robustness
and runtimes, and (ii) it has a significant impact on the quality of the mappings with
respect to their logical coherence.
     Our results clearly support both claims and should encourage ontology matching
system developers to use Alcomo and LogMap-Repair, or to develop their own repair
techniques. On the one hand, Alcomo and LogMap-Repair have been successfully ap-
plied to all data sets and matching systems. LogMap-Repair requires in all cases less
time to compute a repair than the necessary time to compute the mappings; while Al-
como’s times, although slower than LogMap-Repair’s, are in many cases in line with
the required matching time. On the other hand, Alcomo and LogMap-Repair reduced
significantly the incoherence of the input mappings, and hence increasing their quality.
Furthermore, the F-score stays relatively stable when applying the repairs.
     The results also suggest that Alcomo and LogMap-Repair could complement each
other. LogMap-Repair is more efficient in terms of runtimes and mapping coherence
while Alcomo is less aggressive (i.e. removes less mappings even in those cases where
the same mapping coherence results are achieved) and its impact in the F-score is
smaller. Future work will involve the design and development of a repair algorithm
combining the techniques implemented in Alcomo and LogMap-Repair.
     Finally, we believe that our evaluation is not only beneficial for ontology matching
system developers. It also serves as a good basis to compare ontology and mapping
repair systems, in terms of efficiency and completeness, in a challenging scenario as the
one exposed in the OAEI Large Biomed track. We highly encourage developers of such
systems to compare their results against the results presented in this paper. The relevant
data sets and mappings are available online at http://www.cs.ox.ac.uk/isg/
projects/SEALS/oaei/.
Acknowledgements

The research presented in this paper was financed by the Seventh Framework Pro-
gram (FP7) of the European Commission under Grant Agreement 318338, the Optique
project. Horrocks, Jiménez-Ruiz, and Cuenca Grau were also partially supported by the
EPSRC projects ExODA, LogMap and Score!


References
 1. Aguirre, J., Eckert, K., Euzenat, J., Ferrara, A., van Hage, W.R., Hollink, L., Meilicke, C.,
    Nikolov, A., Ritze, D., Scharffe, F., Shvaiko, P., Sváb-Zamazal, O., Trojahn, C., Jiménez-
    Ruiz, E., Cuenca Grau, B., Zapilko, B.: Results of the Ontology Alignment Evaluation Ini-
    tiative 2012. In: Ontology Matching Workshop (2012)
 2. Ba, M., Diallo, G.: Large-scale biomedical ontology matching with ServOMap. IRBM 34(1),
    56 – 59 (2013), Special issue on digital technologies for healthcare
 3. Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical ter-
    minology. Nucleic Acids Research 32, 267–270 (2004)
 4. Borgida, A., Serafini, L.: Distributed Description Logics: Assimilating information from peer
    sources. J. Data Sem. 1, 153–184 (2003)
 5. Cuenca Grau, B., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P.F., Sattler, U.: OWL
    2: The next step for OWL. J. Web Sem. 6(4), 309–322 (2008)
 6. David, J., Euzenat, J., Scharffe, F., Trojahn, C.: The Alignment API 4.0. Semantic Web 2(1),
    3–10 (2011), http://alignapi.gforge.inria.fr/format.html
 7. Dowling, W.F., Gallier, J.H.: Linear-time algorithms for testing the satisfiability of proposi-
    tional Horn formulae. J. Log. Prog. 1(3), 267–284 (1984)
 8. Euzenat, J.: Semantic precision and recall for ontology alignment evaluation. In: Int’l Joint
    Conf. on Artif. Intell. (IJCAI). pp. 348–353 (2007)
 9. Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., Trojahn, C.: Ontology alignment
    evaluation initiative: Six years of experience. J. Data Sem. 15, 158–192 (2011)
10. Euzenat, J., Shvaiko, P.: Ontology matching. Springer (2007)
11. Gaerdenfors, P.: Belief revision: An introduction. In: Belief Revision. Cambridge University
    Press (1992)
12. Gallo, G., Urbani, G.: Algorithms for testing the satisfiability of propositional formulae. J.
    Log. Prog. 7(1), 45–61 (1989)
13. Giunchiglia, F., Yatskevich, M., Shvaiko, P.: Semantic Matching: Algorithms and implemen-
    tation. J. Data Sem. 9, 1–38 (2007)
14. Golbeck, J., Fragoso, G., Hartel, F.W., Hendler, J.A., Oberthaler, J., Parsia, B.: The National
    Cancer Institute’s Thésaurus and Ontology. J. Web Sem. 1(1), 75–80 (2003)
15. Hoa Ngo, D., Bellahsene, Z.: YAM++ A combination of graph matching and machine learn-
    ing approach to ontology alignment task. J. Web Sem. 16 (2012)
16. Horridge, M., Parsia, B., Sattler, U.: Laconic and precise justifications in OWL. In: Int’l Sem.
    Web Conf. (ISWC). pp. 323–338 (2008)
17. Huber, J., Sztyler, T., Noessner, J., Meilicke, C.: CODI: Combinatorial optimization for data
    integration: results for OAEI 2011. In: Ontology Matching Workshop (2011)
18. Jean-Mary, Y.R., Shironoshita, E.P., Kabuka, M.R.: Ontology matching with semantic veri-
    fication. J. Web Sem. 7(3), 235–251 (2009)
19. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I.: On the feasibility of using OWL 2 DL
    reasoners for ontology matching problems. In: OWL Reasoner Evaluation Workshop (2012)
20. Jiménez-Ruiz, E., Cuenca Grau, B.: LogMap: Logic-based and Scalable Ontology Matching.
    In: Int’l Sem. Web Conf. (ISWC). pp. 273–288 (2011)
21. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I.: Exploiting the UMLS Metathesaurus in
    the Ontology Alignment Evaluation Initiative. In: Exploiting Large Knowledge Repositories
    (E-LKR) Workshop (2012)
22. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I., Berlanga, R.: Ontology integration using
    mappings: Towards getting the right logical consequences. In: Eur. Sem. Web Conf. (ESWC).
    pp. 173–187 (2009)
23. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I., Berlanga, R.: Logic-based assessment of
    the compatibility of UMLS ontology sources. J. Biomed. Sem. 2 (2011)
24. Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y., Horrocks, I.: Large-scale interactive ontology
    matching: Algorithms and implementation. In: European Conf. on Artif. Intell. (ECAI). pp.
    444–449 (2012), http://code.google.com/p/logmap-matcher/
25. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all justifications of OWL DL
    entailments. In: Int’l Sem. Web Conf. (ISWC). pp. 267–280 (2007)
26. Kazakov, Y., Krötzsch, M., Simancik, F.: Concurrent classification of EL ontologies. In: Int’l
    Sem. Web Conf. (ISWC). pp. 305–320 (2011)
27. Kirsten, T., Gross, A., Hartung, M., Rahm, E.: GOMMA: a component-based infrastructure
    for managing and analyzing life science ontologies and their evolution. J. Biomed. Sem. 2
    (2011)
28. Meilicke, C., Svab-Zamazal, O., Trojahn, C., Jiménez-Ruiz, E., Aguirre, J., Stuckenschmidt,
    H., Cuenca Grau, B.: Evaluating ontology matching systems on large, multilingual and
    real-world test cases. In: ArXiv e-prints, 1208.3148 (2012), http://arxiv.org/abs/
    1208.3148v1
29. Meilicke, C.: Alignment Incoherence in Ontology Matching. Ph.D. thesis, University of
    Mannheim (2011)
30. Meilicke, C., Stuckenschmidt, H.: Incoherence as a basis for measuring the quality of ontol-
    ogy mappings. In: Ontology Matching Workshop (2008)
31. Meilicke, C., Stuckenschmidt, H., Tamilin, A.: Reasoning support for mapping revision. J.
    Log. Comput. 19(5) (2009)
32. Motik, B., Shearer, R., Horrocks, I.: Hypertableau reasoning for description logics. J. Artif.
    Intell. Res. (JAIR) 36, 165–228 (2009)
33. Niepert, M., Meilicke, C., Stuckenschmidt, H.: A probabilistic-logical framework for ontol-
    ogy matching. In: AAAI Conf. on Artif. Intell. pp. 1413–1418 (2010)
34. Reiter, R.: A theory of diagnosis from first principles. Artificial Intelligence 32, 57–95 (1987)
35. Reul, Q., Pan, J.Z.: KOSIMap: Use of description logic reasoning to align heterogeneous
    ontologies. In: Description Logics Workshop (2010)
36. Rosse, C., Mejino Jr., J.: A reference ontology for biomedical informatics: the Foundational
    Model of Anatomy. J. Biomed. Informatics 36(6), 478–500 (2003)
37. Schlobach, S., Cornet, R.: Non-standard reasoning services for the debugging of description
    logic terminologies. In: Int’l Joint Conf. on Artif. Intell. (IJCAI). pp. 355–362 (2003)
38. Schlobach, S., Huang, Z., Cornet, R., van Harmelen, F.: Debugging incoherent terminologies.
    J. Autom. Reasoning 39(3) (2007)
39. Schulz, S., Cornet, R., Spackman, K.A.: Consolidating SNOMED CT’s ontological commit-
    ment. Applied Ontology 6(1), 1–11 (2011)
40. Shvaiko, P., Euzenat, J.: Ontology Matching: State of the art and future challenges. IEEE
    Trans. Knowledge and Data Eng. (2012)
41. Suntisrivaraporn, B., Qi, G., Ji, Q., Haase, P.: A modularization-based approach to finding
    all justifications for OWL DL entailments. In: Asian Sem. Web Conf. (ASWC) (2008)
42. Wrigley, S.N., Garcia-Castro, R., Nixon, L.J.B.: Semantic evaluation at large scale (SEALS).
    In: The 21st World Wide Web Conf., WWW (Companion Volume). pp. 299–302 (2012)