Description Logics over Multisets Yuncheng Jiang1,2 1 School of Computer Science, South China Normal University, Guangzhou 510631, P.R. China 2 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, P.R. China ycjiang@scnu.edu.cn Abstract. Description Logics (DLs) are a family of knowledge representation languages that have gained considerable attention the last 20 years. It is well- known that the interpretation domain of classical DLs is a classical set. However, in Science and in the ordinary life the situation is not at all like this. In order to handle these types of knowledge in DLs, in this paper we present a DL framework based on multiset theory. Concretely, we present the DL over multisets ALCmsets which is a semantic extension of the classical DL ALC. The syntax and semantics of ALCmsets are presented. Moreover, we investigate the logical properties of ALCmsets and provide a sound and terminating reasoning algorithm for satisfiability problem of ALCmsets. Keywords: Classical Description Logics; Extended Description Logics; Multisets; Satisfiability 1 Introduction In the last 20 years a substantial amount of work has been carried out in the context of Description Logics (DLs for short) [1][20]. DLs are a family of logic-based knowledge representation formalisms that are tailored towards representing the terminological knowledge of an application domain in a structured and formally well- understood way. DLs have been applied to numerous problems in computer science such as information integration, databases, software engineering and soft sets. Recent interest in DLs has been spurred by their application in the Semantic Web [2]: the DL SHOIN(D) provides the logical underpinning for the Web Ontology Language (OWL), and the DL SROIQ(D) is used in OWL 2 [6][11][15][16]. A main point is that DLs are considered as to be attractive logics in knowledge based applications as they are a good compromise between expressive power and computational complexity. From the semantics of DLs [1] we know that the interpretation domain of classical DLs is a classical set (Zermelo-Fraenkel set) [12]. That is to say, the interpretation of classical DLs is based on classical set theory from a semantics point of view. It is well-known that classical set theory states that a given element can appear only once in a set, it assumes that all mathematical objects occur without repetition. However, in Science and in the ordinary life the situation is not at all like this. In the physical world it is observed that there is an enormous repetition [7]. As a matter of fact, in order to process the collections with repetition, multi-set theory (MST for short) has been presented and several operations as the addition, the union and the intersection of multisets have been defined and their properties investigated in several papers [3][27][28]. Intuitively, multisets (sometimes also called bags[13][28]) are set-like structures where an element can appear more than once [3]. Thus, a multiset differs from a set in that each element has a multiplicity, which is a natural number indicating (lossely speaking) how many times it is a member of the multiset [7]. We must note that the word multiset was coined by N. G. de Bruijin [18], but the first person that actually used multisets was Richard Dedekind in his well-known paper “Was sind und was sollen die Zahlen?” (“The nature and meaning of numbers”) [4]. This paper was published in 1888 [27]. More concretely, a multiset is a collection of objects in which repetition of elements is significant [9]. We confront a number of situations in life when we have to deal with collections of elements in which duplicates are significant. An example may be cited to prove this point. While handling a collection of employees’ ages or details of salary in a company, we need to handle entries bearing repetitions and consequently our interest may be diverted to the distribution of elements. In such situations the classical definition of set proves inadequate for the situation presented [9]. Thus, from a practical point of view multisets are very useful structures as they arise in many areas of mathematics and computer science [8][9][19][22][23][27]. A complete survey of the development of multi-set theory can be found in [3]. Naturally, a problem is raised: how we can interpret the concepts and the roles of DLs using multi-set theory? Furthermore, what are the benefits of doing so? After careful thought, we find that it is feasible to interpret the concepts and the roles of DLs using multi-set theory. Moreover, it is a more accurate interpretation for the concepts and the roles of DLs. For example, when we interpret the concept commended-students (students who are commended), we can say that Zhangsan, Lisi and Wangwu are the instances of the concept commended-students. More formally, we can say commended-studentsI={Zhangsan, Lisi, Wangwu} in classical DLs. However, if we consider more accurate situation, e.g., Zhangsan is commended three times, Lisi is commended twice, and Wangwu is commended once, obviously, the classical interpretation of DLs cannot process this situation. Here we can interpret the concept commended-students using multi-set theory. Formally, commended- studentsMI={Zhangsan, Zhangsan, Zhangsan, Lisi, Lisi, Wangwu}, where {Zhangsan, Zhangsan, Zhangsan, Lisi, Lisi, Wangwu} is a multiset. In this paper we extend DLs allow to express that interpretation of a concept (resp., a role) is not a subset of classical set (traditional interpretation domain DI) (resp., a subset of DI´DI) like in classical DLs, but a subset of multisets (resp., a subset of Cartesian product of multisets). That is, we will extend the interpretation domain of DLs to multisets. More concretely, we will present the DL ALCmsets, which is a semantic extension of the DL ALC [10][14][17][24][26] based on multiset theoretic operations presented in [5][9][13]. Moreover, we will provide a sound and incomplete reasoning algorithm for the satisfiability reasoning problem of the DL ALCmsets. It is worth noting that classical set is a special case of multisets [9], hence, the DL ALC [10] 2 [14][17][24][26] is a special case of the DL ALCmsets presented in this paper from a semantics point of view. 2 Multisets The current section provides some background on multisets. A naive concept of multiset was formalized by Blizard [5]. It has the following properties: (i) a multiset is a collection of elements in which certain elements may occur more than once; (ii) occurrences of a particular element in a multiset are indistinguishable; (iii) each occurrence of an element in a multiset contributes to the cardinality of the multiset; (iv) the number of occurrences of a particular element in a multiset is a (finite) positive integer; (v) the number of distinguishable (distinct) elements in a multiset need not be finite; and (vi) a multiset is completely determined if we know the elements that belong to it and the number of times each element belongs to it [9]. In the following, we introduce the basic definitions and notations of multisets [5][9][13]. A collection of elements containing duplicates is called a multiset. Formally, if X is a set of elements, a multiset M drawn from the set X is represented by a function count M or CM: X®N where N represents the set of non-negative integers. For each xÎX, CM(x) is the characteristic value of x in M and indicates the number of occurrences of the element x in M. A multiset M is a set if "xÎX, CM(x)=0 or 1. The word “multiset” often shortened to “mset” abbreviates the term “multiple membership set”. Let M1 and M2 be two msets drawn from a set X. M1 is a sub mset of M2 (M1ÍM2) if "xÎX, CM1(x)£CM2(x). M1 is a proper sub mset of M2 (M1ÌM2) if CM1(x)£CM2(x) "xÎX and there exists at least one "xÎX such that CM1(x)0 and individual a such that {ám/a:Cñ} is consistent w.r.t. T; (5) A is consistent w.r.t. T iff A⊭Tám/a:⊥ñ for any m>0 and individual a. Note 2. It needs to be noted that the polynomial time reductions for instance problem to (in)consistency (i.e., A⊨Tám/a:Cñ iff AÈ{ám/a:ØCñ} is inconsistent w.r.t. T) and subsumption problem to (un)satisfiability (i.e., áC⊑TDñ iff C⊓ØD is unsatisfiable w.r.t. T), are satisfied in ALC, however, these reductions are not satisfied in ALCmsets. Lastly, we have to point out that in the rest of this paper we only consider unfoldable TBoxes. More concretely, a concept definition is of the form áAºCñ where A is a concept name and C is a concept description. Given a set T of concept definitions, we say that the concept name A directly uses the concept name B if T contains a concept definition áAºCñ such that B occurs in C. Let uses be the transitive closure of the relation “directly uses”. We say that T is cyclic if there is a concept name A that uses itself, and acyclic otherwise. A TBox T is a finite, possibly empty, set of terminological axioms of the form áA⊑Cñ, called inclusion introductions, and of the form áAºCñ, called equivalence introductions. A TBox is unfoldable if it contains no cycles and contains only unique introductions, i.e., terminological axioms with only concept names appearing on the left hand side and, for each concept name A, there is at most one axiom in T of which A appears on the left side. In classical DLs [1], a knowledge base with an unfoldable TBox can be transformed into an equivalent one with an empty TBox by a transformation called unfolding or expansion [21][25]: Concept inclusion introductions áA⊑Cñ are replaced by concept equivalence introductions áAºA¢⊓Cñ, where A¢ is a new concept name, which stands for the qualities that distinguish the elements of A from the other elements of C. Subsequently, if C is a complex concept expression, which is defined in terms of concept names, defined in the TBox, we replace their definitions in C. It has been proved that the initial TBox with the expanded one are equivalent. In DLs over msets such as ALCmsets presented in this paper, we also can prove that a knowledge base with an unfoldable TBox can be transformed into an equivalent one with an empty TBox. Firstly, we can transform an ALCmsets-TBox T into a regular ALCmsets-TBox T¢, containing equivalence introductions only, such that T¢ is equivalent to T in a sense that will be specified below. We obtain T¢ from T by choosing for every concept inclusion introduction áA⊑Cñ in T a new concept name A¢ and by replacing the inclusion introduction áA⊑Cñ with the equivalence introduction áAºA¢⊓Cñ. The TBox T¢ is the normalization of T. Now we show that T and T¢ are equivalent. Proposition 3. Let T be a TBox and T¢ its normalization. Then 7 (1) Every model of T¢ is a model of T. (2) For every model MI of T there is a model MI¢ of T¢ that has the same domain as MI and agrees with MI on the concept names and roles in T. Thus, in theory, inclusion introductions do not add to the expressivity of TBoxes. However, in practice, they are a convenient means to introduce concepts into a TBox that cannot be defined completely. In fact, this case is the same as classical DLs [1]. Now we show that, if T is an unfoldable TBox, we can always reduce reasoning problems w.r.t. T to problems w.r.t. the empty TBox. Instead of saying “w.r.t. f” one usually says “without a TBox”, and omits the index T for subsumption, equivalence, and instance, i.e., writes º, ⊑, ⊨ instead of ºT, ⊑T, and ⊨T. As we have seen in Proposition 3, T is equivalent to its expansion T¢. Recall that in the expansion every equivalence introduction áAºDñ such that D contains only concept names, but no concept descriptions. Now, for each concept description C we define the expansion of C w.r.t. T as the concept description C² that is obtained from C by replacing each occurrence of a concept name A in C by the concept description D, where áAºDñ is the equivalence introduction of A in T¢, the expansion of T. Proposition 4. Let T be an unfoldable TBox, C, D concept descriptions, C² expansion of C, and D² expansion of D. Then (1) áCºTC²ñ; (2) C is satisfiable w.r.t. T iff C² is satisfiable; (3) áC⊑TDñ iff áC²⊑D²ñ; (4) áCºTDñ iff áC²ºD²ñ. 4 Reasoning in ALCmsets In this section, we will provide a detailed presentation of the reasoning algorithm for the ALCmsets-satisfiability problem and the properties for the termination and soundness of the procedure. There is one point we have to point out here. Since we restrict the maximal number of occurrences of an element (i.e., an individual) in a multiset (i.e., subset of interpretation domain), it is obvious to know that the satisfiability reasoning algorithm (see below) is incomplete. In the following, we will present a tableau algorithm for testing satisfiability of an ALCmsets-concept. Before we can describe the tableau-based satisfiability algorithm for ALCmsets more formally, we need to introduce some basic notions firstly. A constraint (denoted by a) is an expression of the form ám/a:Cñ, or á(m/a, n/b):Rñ, where a and b are individuals, C is a concept, and R is a role. Our calculus, determining whether a finite set S of constraints or not, is based on a set of constraint propagation rules transforming a set S of constraints into “simpler” satisfiability preserving sets Si until either all Si contain a clash (indicating that from all the Si no model of S can be build) or some Si is completed and clash-free, that is, no rule can 8 further be applied to Si and Si contains no clash (indicating that from Si a model of S can be build). A set of constraints S contains a clash iff {ám/a:Cñ, á0/a:Cñ}ÍS for some m>0, individual a, and concept description C. The ®Ø-rule Condition: Si contains ám/a:ØCñ, but it does not contain á1/a:Cñ, á2/a:Cñ, …, or ánmax-m/a:Cñ. Action: Si,1=SiÈ{á1/a:Cñ}, Si,2=SiÈ{á2/a:Cñ}, …, Si,nmax-m=SiÈ{ánmax-m/a:Cñ}. The ®⊓-rule Condition: Si contains ám/a:C1⊓C2ñ, but neither {ám/a:C1ñ, áj/a:C2ñ} nor {ám/a:C2ñ, áj/a:C1ñ}, where m£j£nmax. Action: Si,1¢=SiÈ{ám/a:C1ñ, ám/a:C2ñ}, Si,2¢=SiÈ{ám/a:C1ñ, ám+1/a:C2ñ}, ..., Si,nmax+1¢ =SiÈ{ám/a:C1ñ, ánmax/a:C2ñ}, Si,1²=SiÈ{ám/a:C2ñ, ám/a:C1ñ}, Si,2²=SiÈ {ám/a:C2ñ, ám+1/a: C1ñ}, ..., Si,nmax+1²=SiÈ{ám/a:C2ñ, ánmax/a:C1ñ}. The ®⊔-rule Condition: Si contains ám/a:C1⊔C2ñ, but neither {ám/a:C1ñ, áj/a:C2ñ} nor {ám/a:C2ñ, áj/a:C1ñ}, where 1£j£m. Action: Si,1¢=SiÈ{ám/a:C1ñ, ám/a:C2ñ}, Si,2¢=SiÈ{ám/a:C1ñ, ám-1/a:C2ñ}, ..., Si,m¢= SiÈ{ám/a:C1ñ, á1/a:C2ñ}, Si,1²=SiÈ{ám/a:C2ñ, ám/a:C1ñ}, Si,2²=SiÈ{ám/a: C2ñ, ám-1/a:C1ñ}, ..., Si,m²= SiÈ{ám/a:C2ñ, á1/a:C1ñ}. The ®$-rule Condition: Si contains ám/a:$R.Cñ, but there are no individuals 1/b, 2/b, …, nmax/b such that á1/b:Cñ and á(m/a, 1/b):Rñ, á2/b:Cñ and á(m/a, 2/b): Rñ, …, or ánmax/b:Cñ and á(m/a, nmax/b):Rñ are in Si. Action: Si,1=SiÈ{á1/b:Cñ, á(m/a, 1/b):Rñ}, Si,2=SiÈ{á2/b:Cñ, á(m/a, 2/b):Rñ}, …, Si,nmax=SiÈ {ánmax/b:Cñ, á(m/a, nmax/b):Rñ}, where 1/b, 2/b, ..., nmax/b are individuals not occurring in Si. The ®"-rule Condition: Si contains ám/a:"R.Cñ and á(m/a, n/b):Rñ, but it does not contain án/b:Cñ. Action: Si¢=SiÈ{án/b:Cñ}. Fig. 1. Transformation rules of the satisfiability algorithm The tableau-based satisfiability algorithm for ALCmsets works as follows. Let C by an ALCmsets-concept. In order to test satisfiability of C, the algorithm starts with a finite set of constraints {S1, S2, …, Snmax}, and applies satisfiability preserving transformation rules (see Figure 1) (in arbitrary order) to the set of constraints Si (1£i£nmax) until no more rules apply, where S1={á1/a:Cñ}, S2={á2/a:Cñ}, …, Snmax= {nmax/a:C}. If the “complete” constraint obtained this way does not contain an 9 obvious contradiction (called clash), then S is consistent (and thus C is satisfiable), and inconsistent (unsatisfiable) otherwise. The transformation rules that handle negation, conjunction, disjunction, and exists restrictions are non-deterministic in the sense that a given set of constraints is transformed into finitely many new sets of constraints such that the original set of constraints is consistent iff one of the new sets of constraints is so. For this reason we will consider finite sets of constraints S={S1, S2, …, Sk} instead of the original set of constraints {S1, S2, …, Snmax}, where k³nmax. Such a set is consistent iff there is some i, 1£i£k, such that Si is consistent. A rule of Figure 1 is applied to a given finite set of constraints S as follows: it takes an element Si of S, and replaces it by one set of constraints Si¢, by two sets of constraints Si¢ and Si², or by finitely many sets of constraints Si,j. Termination and soundness of the procedures can be shown. Proposition 5 (Termination). Let C be an ALCmsets-concept. There cannot be some infinite sequences of rule applications {áj/a:Cñ}®S1®S2®…, where 1£j£nmax. Proof. The main reasons for this proposition to hold are the following. (1) The original sets of constraints {áj/a:Cñ} is finite. Namely, there exist nmax original sets of constraints {á1/a:Cñ}, {á2/a:Cñ}, …, {ánmax/a:Cñ}. (2) Without loss of generality, we consider the original set of constraints {áj/a:Cñ}. Let S¢ be a set of constraints contained in Si for some i³1. For every individual m/b¹j/a occurring in S¢, there is a unique sequence R1, …, Rk (k³1) of role names and a unique sequence of individuals of the form 1/b1, 1/b2, …, 1/bk-1, or 1/b1, 1/b2, …, 2/bk-1, …, or 1/b1, 1/b2, …, nmax/bk-1, …, or nmax/b1, nmax/b2, …, nmax/bk-1, such that {á(j/a, 1/b1):R1ñ, á(1/b1, 1/b2):R2ñ, …, á(1/bk-1, m/b):Rkñ}ÍS¢, {á(j/a, 1/b1):R1ñ, á(1/b1, 1/b2):R2ñ, …, á(2/bk-1, m/b):Rkñ}ÍS¢, …, or {á(j/a, nmax/b1):R1ñ, á(nmax/b1, nmax/b2):R2ñ, …, á(nmax/bk-1, m/b):Rkñ}ÍS¢. In this case, we say that m/b occurs on the level k in S¢. (3) If ám/b:C¢ñÎS¢ for an individual m/b on level k, then the maximal role depth of C¢ (i.e., the maximal nesting of constructors involving roles) is bounded by the maximal role depth of C minus k. Consequently, the level of any individual in S¢ is bounded by the maximal role depth of C. (4) If ám/b:C¢ñÎS¢, then C¢ is a subdescription of C. Consequently, the number of different concept assertions on m/b is bounded by the size of C. (5) The number of different role successors of m/b in S¢ (i.e., individuals l/c such that á(m/b, l/c):RñÎS¢ for a role name R) is bounded by the number of different existential restrictions in C. Proposition 6 (Soundness). Assume that S¢ is obtained from the finite set of constraints S by application of a transformation rule. If S is consistent, then S¢ is consistent. 10 Proof. [Sketch] Given the termination property (see Proposition 5), it is easily verified, by case analysis, that the transformation rules of the satisfiability algorithm are sound. For example, the ®Ø-rule: Assume that MI is an mset interpretation satisfying ám/a:ØCñ, where 0