OntoMerge: A System for Merging DL-Lite Ontologies Zhe Wang1 , Kewen Wang2 , Yifan Jin2 , and Guilin Qi3,4 1 University of Oxford, United Kingdom 2 Griffith University, Australia 3 Southeast University, China 4 State Key Laboratory for Novel Software Technology Nanjing University, Nanjing, China Abstract. Merging multi-sourced ontologies in a consistent manner is an important and challenging research topic. In this paper, we propose a novel approach for merging DL-LiteN bool ontologies by adapting the classical model-based belief merging approach, where the minimality of changes is realised via a semantic notion, model distance. Instead of using classical DL models, which may be infinite structures in general, we define our merging operator based on a new semantic characterisation for DL-Lite. We show that subclass relation w.r.t. the result of merging can be checked efficiently via a QBF reduction. We present our system OntoMerge, which effectively answers subclass queries on the resulting ontology of merging, without first computing the merging results. Our system can be used for answering subclass queries on multiple ontologies. 1 Introduction Ontologies are widely used for sharing and reasoning over domain knowledge, and their underlying formalisms are often description logics (DLs). To effec- tively answer queries, ontologies from heterogeneous sources and contributed by various authors are often needed. However, ontologies developed by multiple authors under different settings may contain overlapping, conflicting and inco- herent domain knowledge. The ultimate goal of ontology merging is to obtain a single consistent ontology that preserves as much knowledge as possible from two or more heterogeneous ontologies. This is in contrast to ontology matching [5], whose goal is to align entities (with different name) between ontologies, and which is often a pre-stage of ontology merging. Existing merging systems often adopt formula-based approaches to deal with logical inconsistencies [10; 9; 14]. Most of such approaches can be described as follows: the system first combine the ontologies by taking their union; then, if any inconsistency is detected (through a standard reasoning), it pinpoints the axioms which (may) cause inconsistency; and finally, remove certain axioms to retain consistency. However, such an approach is sometimes unsatisfactory because it is not fine-grained either in the way it measures the minimality of changes, and thus it is often unclear how close the result of merging is to the source ontologies semantically; or in the way it resolve inconsistency. In [12], an attempt is made to provide some semantic justification for the minimality of changes, however, the result of merging is still syntax-dependant and is often a set of ontologies. On the other hand, model-based merging operators have been intensively studied in propositional logic, which are syntax-independent and usually satisfy more rationality postulates than formula-based ones. However, a major chal- lenge in adapting model-based merging techniques to DLs is that DL models are generally infinite structures and the number of models of a DL ontology is infinite. Several notions of model distance are defined on classical DL models for ontology revision [13]. Mathematically, it is possible to define a distance o classical DL models. Such a distance is computationally limited as it is unclear how to develop an algorithm for the resulting merging operator. A desirable so- lution is to define ontology merging operators based on a suitable finite semantic characterisation instead of classical DL models. In this paper, we focus on merging ontologies expressed as DL-Lite TBoxes, which can be also accompanied with the ontology-based data access (OBDA) framework for data integration [2]. We propose a novel approach for merging ontologies by adapting a classical model-based belief merging approach, where the minimality of changes is realised via a semantic notion, model distance. Instead of using classical DL models, which may be infinite structures in general, we define our merging operator based on the notion of types. We show that subclass relation w.r.t. the result of merging can be checked efficiently via a QBF reduction, which allows us to make use of the off-the-shelf QBF solvers [8]. We present our system OntoMerge, which effectively answers subclass queries on merging results, without first computing the merging results. Our system can be used to answer subclass queries on multiple ontologies. 2 A New Semantic Characterisation In our approach, it is sufficient to consider a finite yet large enough signature. A signature S is a union of four disjoint finite sets SC , SR , SI and SN , where SC is the set of atomic concepts, SR is the set of atomic roles, SI is the set of individual names and SN is the set of natural numbers in S. We assume 1 is always in SN . Formally, given a signature S, a DL-LiteN bool language has the following syntax [1]: R ← P | P− S ← P | ¬P B←>|A| >nR C ← B | ¬C | C1 u C2 where n ∈ SN , A ∈ SC and P ∈ SR . B is called a basic concept and C is called a general concept. BS denotes the set of basic concepts on S. We write ⊥ for ¬>, ∃R for ≥ 1 R, and C1 t C2 for ¬(¬C1 u ¬C2 ). Let R+ = P , where P ∈ SR , whenever R = P or R = P − . A TBox T is a finite set of concept axioms of the form C1 v C2 , where C1 and C2 are general concepts. An ABox A is a finite set of membership assertions of the form C(a) or S(a, b), where a, b are individual names. In this paper, an ontology is represented as a DL TBox. The classical DL semantics are given by models. A TBox T is consistent with an ABox A if T ∪ A has at least one model. A concept or role is satisfiable in T if it has a non-empty interpretation in some model of T . A TBox T is coherent if all atomic concepts and atomic roles in T are satisfiable. Note that a coherent TBox must be consistent. TBox T entails an axiom C v D, written T |= C v D, if all models of T satisfy C v D. Two TBoxes T1 , T2 are equivalent, written T1 ≡ T2 , if they have the same models. Now, we introduce a semantic characterisation for DL-Lite TBoxes in terms of types. A type τ ⊆ BS is a set of basic concepts over S, such that > ∈ τ , and > n R ∈ τ implies > m R ∈ τ for each pair m, n ∈ SN with m < n and each (inverse) role R ∈ SR ∪ { P − | P ∈ SR }. Type τ satisfies basic concept B if B ∈ τ , ¬C if τ does not satisfy C, and C1 u C2 if τ satisfies both C1 and C2 . Given a TBox T , type τ satisfies T if τ satisfies concept ¬C1 t C2 for each axiom C1 v C2 in T . For a TBox T , define TM(T ) to be the maximal set of types satisfying the following conditions: (1) all the types in TM(T ) satisfy T ; (2) for each type τ ∈ TM(T ) and each ∃R in τ , there exists a type τ 0 ∈ TM(T ) (possibly τ 0 = τ ) containing ∃R− . A type τ is called a type model (T-model) of T if τ ∈ TM(T ). Note that TM(T ) is uniquely defined for each TBox T . Note that for a coherent TBox T , TM(T ) is exactly the set of all types satisfying T . Let TM(Π) = TM(T1 ) × · · · × TM(Tn ) for Π = hT1 , . . . , Tn i. Proposition 1. Given a TBox T , we have the following results: – T is consistent iff TM(T ) 6= ∅. – For a general concept C, C is satisfiable wrt T iff there exists a T-model in TM(T ) satisfying C. – For two general concepts C, D, T |= C v D iff either TM(T ) = ∅ or all T-models in TM(T ) satisfy C v D. – T ≡ T 0 iff TM(T ) = TM(T 0 ), for any TBox T 0 . Given a type τ , an individual a and an ABox A, we say τ is a type of a w.r.t. A if there is a model I of A such that τ = {B | aI ∈ B I , B ∈ BS }. For example, given A = {A(a), ¬B(b), C(c)}, type τ = {A, B} is a type of a, but not a type of either b or c in A. For convenience, we will say a type of a when the ABox A is clear from the context. Let TMa (A) be the set of all the types of a in A if a occurs in A; and otherwise, TMa (A) be the set of all the types. A set M of T-models satisfies an ABox A if there is a type of a in M , i.e., M ∩ TMa (A) 6= ∅, for each individual a in A. Proposition 2. Given a TBox T and an ABox A, T ∪ A is consistent iff TM(T ) ∩ TMa (A) 6= ∅ for each a in A. 3 Merging Operator In this section, we introduce an approach to merging DL-Lite ontologies to obtain a coherent unified ontology. An ontology profile is of the form Π = hT1 , . . . , Tn i, where Ti is the ontology from the source n.o. i (1 ≤ i ≤ n). There are two standard definitions of integrity constraints (ICs) in the classical belief change literature [3], the consistency- and entailment-based definitions. We also allow two types of ICs for merging, namely the consistency constraint (CC), expressed as a set Ac of data, and the entailment constraint (EC), expressed as a TBox Te . We assume the IC is self- consistent, that is, Te ∪ Ac is always consistent. For an ontology profile Π, a CC Ac and a EC Te , an ontology merging operator is a mapping (Π, Te , Ac ) 7→ ∇(Π, Te , Ac ), where ∇(Π, Te , Ac ) is a TBox, s.t. ∇(Π, Te , Ac )∪Ac is consistent, and ∇(Π, Te , Ac ) |= Te . In classical model-based merging approaches, merging operators are often defined by certain notions of model distances [11; 6]. We use S4S 0 to denote the symmetric difference between two sets S and S 0 , i.e., S4S 0 = (S \ S 0 ) ∪ (S 0 \ S). Given a set S and a tuple S = hS1 , . . . , Sn i of sets, the distance between S and S is defined to be a tuple d(S, S) = hS4S1 , . . . , S4Sn i. For two n-element dis- tances d and d0 , d  d0 if di ⊆ d0i for each 1 ≤ i ≤ n, where di is the i-th element in d. Given two sets S and S 0 , define σ(S, S 0 ) = S if S 0 is empty, and otherwise, σ(S, S 0 ) = { e0 ∈ S | ∃e00 ∈ S 0 s.t. ∀e ∈ S, ∀e0 ∈ S 0 , d(e, e0 ) 6≺ d(e0 , e00 ) }. In [6], given a collection Ψ = {ϕ1 , . . . , ϕn } of propositional formulas, and some ECs expressed as a propositional theory µ, the result of merging ϕ1 , . . . , ϕn w.r.t. µ is the theory whose models are exactly σ(mod(µ), mod(Ψ )), i.e., those models satisfying µ and having minimal distance to Ψ . Inspired by classical model-based merging, we introduce a merging opera- tor in terms of T-models. For an ontology profile Π and an EC Te , we could define the T-models of the merging to be a subset of TM(Te ) (so that Te is entailed) consisting of those T-models which have minimal distance to Π, i.e., σ(TM(Te ), TM(Π)). However, this straightforward adoption does not take the CC into consideration, and the merging result obtained in this way may not be coherent. For example, let T1 = {A v ¬B}, T2 = {> v B}, Te = ∅, and Ac = {A(a), B(a)}. Then, σ(TM(Te ), TM(hT1 , T2 i)) consists of only one type {B}. Clearly, the corresponding TBox {A v ⊥, > v B} does not satisfy the CC, and it is not coherent. Note that in the above example, once the merging result satisfies the CC, then it is also coherent, because both concepts A and B are satisfiable. In general, it is also the case that coherency can be achieved by applying certain CC to merging. We introduce an auxiliary ABox A† in addition to the initial CC Ac , in which each concept and each role is explicitly asserted with a member. That is, A† = {A(a) | A ∈ SC , a ∈ SI is a fresh individual for A} ∪ {P (b, c) | P ∈ SR , b, c ∈ SI are fresh individuals for P }. As assumed, SI is large enough for us to take these auxiliary individuals. From the definition of CCs, the merged TBox T must be consistent with all the assertions in A† , which assures all the concepts and roles in T to be satisfiable. Based on this observation, we have the following lemma. Lemma 1. T is coherent iff T ∪ A† is consistent for any TBox T . To ensure the coherence of merging, we only need to include A† into the CC. For the merging to be consistent with the CC Ac , from Proposition 2, the T-model set M of the merging needs to satisfy Ac . That is, M needs to contain a type of a for each individual a in Ac . However, σ(TM(Te ), TM(Π)) does not nec- essarily satisfy this condition, as can be seen from the above example: TMa (Ac ) consists of a single type {A, B} and σ(TM(Te ), TM(Π)) ∩ TMa (Ac ) = ∅. Intu- itively, for the merging to satisfy the CC, type {A, B} need to be added to the T-models of merging. In general, the T-models of merging can be obtained by extending (if necessary) the set σ(TM(Te ), TM(Π)) with at least one type of a w.r.t. Ac for each individual a in Ac , and if there are multiple such types, choose those with minimal distances. Based on the above intuitions, the definition of TBox merging is presented as follows. Definition 1. Let Π be an ontology profile, Te be a TBox, and Ac be an ABox. Denote A∗ = Ac ∪ A† . The result of merging Π w.r.t. the EC Te and the CC Ac , denoted ∇(Π, Te , Ac ), is defined as follows TM(∇(Π, Te , Ac )) = σ(TM(Te ), TM(Π))∪ [ σ(TM(Te ) ∩ TMa (A∗ ), TM(Π)). a occurs in A∗ From the definition, the T-models of the merging are constituted with two parts. The first part consists of those T-models of Te (for the satisfaction of the EC) with minimal distances to Π. The second part consists of types of a, for each individual a in A∗ , which are added to the first part for the satisfaction of the CC. These types are also required to be T-models of Te and have minimal distances to Π. It is clear from Proposition 1 that the result of merging is unique up to TBox equivalence. 4 QBF Reduction In this section, we consider a standard reasoning problem for ontology merg- ing, namely the subclass queries: whether or not the result of merging entails a subclass relation C v D. We present a QBF reduction for this problem, which allows us to make use of the off-the-shelf QBF solvers [8]. We assume that every TBox in the ontology profile is coherent, and in this case, the T-models of a TBox T are exactly those satisfying T . We achieve the reduction in three steps. Firstly, we introduce a novel proposi- tional transformation for DL-Lite TBoxes. The transformation is inspired by [1], which contains a transformation from a DL-Lite TBox into a theory in the one variable fragment of first order logic. Considering T-models instead of classical DL models allows us to obtain a simpler transformation to propositional logic than theirs to first order logic. Function φ(·) maps a basic concept to a propositional variable, and a general concept (resp., a TBox axiom) to a propositional formula. φ(⊥) = ⊥, φ(A) = pA , φ(> n R) = pnR , φ(¬C) = ¬φ(C), φ(C1 u C2 ) = φ(C1 ) ∧ φ(C2 ), φ(C1 v C2 ) = φ(C1 ) → φ(C2 ). Here, pA and pnR are propositional variables. We use VS to denote the set of propositional variables corresponding to the basic concepts over S, and we omit the subscript S in what follows for simplicity. We can see that φ(·) is a bijective mapping between the set of DL-LiteN bool general concepts and the set of propositional formulas only referring to symbols in VS and boolean operators ¬, ∧ and →. Naturally, given the mapping φ(·), an arbitrary propositional model may not correspond to a type. We define a formula η whose models are exactly the set of types. Let ^ ^ η= pnR → pmR . R+ ∈SR m,n∈SN with m