Direct computation of diagnoses for ontology alignment?

    Kostyantyn Shchekotykhin, Philipp Fleiss, Patrick Rodler, and Gerhard Friedrich

                       Alpen-Adria Universität, Klagenfurt, 9020 Austria
                             firstname.lastname@aau.at


        Abstract. Modern ontology debugging methods allow efficient identification and
        localization of faulty axioms in an ontology. However, in many use cases such as
        ontology alignment the ontologies might include many conflict sets, i.e. sets of
        axioms preserving the faults, thus making ontology diagnosis infeasible. In this
        paper we present a debugging approach based on a direct computation of diag-
        noses that omits calculation of conflict sets. The evaluation results show that the
        approach is practicable and is able to identify a fault in adequate time.


1     Algorithm details and evaluation
Most of the modern debugging approaches apply the model-based diagnosis [3] and
compute diagnoses using conflict sets CS, i.e. irreducible sets of axioms ax i in an
ontology O that preserve a fault. A user should modify at least all axioms of a diagnosis
in order to be able to formulate the intended (target) ontology Ot . The computation of
the conflict sets can be done within a polynomial number of calls to the reasoner, e.g.
by Q UICK XP LAIN algorithm [2]. To identify a diagnosis of cardinality |D| = m the
hitting set algorithm suggested in [3] requires computation of m conflict sets. In the use
cases when an ontology is generated by an ontology matching system the number of
conflict sets m can be large, thus making the ontology debugging practically infeasible.
     In this paper we present two algorithms I NV-HS-T REE and I NV-Q UICK XP LAIN,
which inverse the standard model-based approach and compute diagnoses directly, rather
than by means of conflict sets. I NV-Q UICK XP LAIN partitions the initial set of axioms
a given faulty ontology into two equal subsets. The algorithm continues to partition the
sets until it identifies that the set D0 such that O \ D0 fulfills all requirements and its
partitions are not. In further iterations the algorithm minimizes the D0 by splitting it
into sub-problems of the form D = D0 ∪ O∆ , where O∆ contains only one axiom. In
the case when D is a diagnosis and D0 is not, the algorithm decides that O∆ is a sub-
set of the sought diagnosis. Just as the original algorithm, I NV-Q UICK XP LAIN always
terminates and returns either a diagnosis D or “no diagnosis”. In order to enumerate
all possible diagnoses we modified the HS-T REE algorithm [3] to accept diagnoses as
node labels instead of conflict sets.
     In the diagnosis discrimination settings [4] the ontology debugger acquires new
knowledge by asking the user whether some axiom should be entailed by the target
ontology Ot or not. Given the answer the algorithm can invalidate some of the diagnoses
that are used as labels of tree nodes. Given such a node, I NV-HS-T REE removes its label
and places it to the list of open nodes. Moreover, the algorithm removes all the nodes of
a subtree originating from this node. After all nodes with invalid labels are cleaned-up,
?
    This research is funded by Austrian Science Fund (Project V-Know, contract 19996).
the algorithm attempts to reconstruct the tree by reusing the remaining valid diagnoses.
In the direct approach limiting the number of diagnoses used to compute a query to
some reasonable number. e.g. n = 10 results in a small size of the search tree, thus,
using less memory in comparison to the standard approach.
     We evaluated the direct ontology debugging technique using aligned ontologies gen-
erated in the framework of OAEI 2011 [1]. These ontologies represent a real-world
scenario in which a user generated ontology alignments by means of some (semi-
)automatic tools. The Conference test suite we included 146 classifiable ontologies and
computed 1, 9 and 30 diagnoses with both HS-T REE and I NV-HS-T REE. For 133 on-
tologies both approaches were able to compute the required amount of diagnoses. In the
experiment where only 1 diagnosis was requested, the direct approach outperforms the
HS-T REE as it was expected. In the next two experiments the time difference between
the approaches decreases. However, the direct approach was able to avoid a rapid in-
crease of computation time for very hard cases. In the 13 cases HS-T REE was unable to
find all requested diagnoses in each experiment. Within 2 hours the algorithm calculated
only 1 diagnosis for csa-conference-ekaw and for ldoa-conference-confof it
was able to find 1 and 9 diagnoses, whereas I NV-HS-T REE required 9 sec. for 1, 40
sec. for 9 and 107 sec. for 30 diagnoses on average.
     Moreover, in the first experiment we evaluated the efficiency of the interactive direct
debugging approach applied to the 13 “hard” ontologies. We selected the target diagno-
sis randomly among all diagnoses that included only invalid alignments suggested by a
system. The latter can be computed using the set of correct alignments provided by the
organizers of OAEI 2011. In the experiment the used the Entropy scoring function [4]
with prior fault probabilities of axioms corresponding to aliments set to 1 − v, where
v is the confidence value of the matcher. All axioms of the aligned ontologies were
assumed to be correct and were assigned small probabilities. The debugging was then
applied to the set of all alignments returned by a matcher. The experiment shows that
the system was able to identify the target diagnosis efficiently requiring less than 4 sec.
in 75% of all cases to compute a query. The system’s performance decreased only in
the cases when a reasoner required much time to verify the consistency of an ontology.
     In the second scenario we applied the direct method to unsatisfiable and classifiable
within 2 hours ontologies, generated for the Anatomy problem. The source ontologies
O1 and O2 include 11545 and 4838 axioms correspondingly, whereas the size of the
alignments varies between 1147 and 1461 axioms. The target diagnosis selection pro-
cess was performed in the same way as in the first experiment. The results of the exper-
iment show that the target diagnosis can be computed within 40 second in an average
case. Moreover, I NV-HS-T REE slightly outperformed HS-T REE.
References
1. Euzenat, J., Ferrara, A., van Hage, W.R., Hollink, L., Meilicke, C., Nikolov, A., Ritze, D.,
   Scharffe, F., Shvaiko, P., Stuckenschmidt, H., Sváb-Zamazal, O., dos Santos, C.T.: Final re-
   sults of the Ontology Alignment Evaluation Initiative 2011. In: Proceedings of the 6th Inter-
   national Workshop on Ontology Matching. pp. 1–29. CEUR-WS.org (2011)
2. Junker, U.: QUICKXPLAIN: Preferred Explanations and Relaxations for Over-Constrained
   Problems. In: Proc. 19th National Conference on Artificial Intelligence. pp. 167–172 (2004)
3. Reiter, R.: A Theory of Diagnosis from First Principles. Artificial Intelligence 32(1), 57–95
   (1987)
4. Shchekotykhin, K., Friedrich, G., Fleiss, P., Rodler, P.: Interactive ontology debugging : two
   query strategies for efficient fault localization. Journal of Web Semantics 12, 88–103 (2012)