Terminological Systems Revisited: Terminology = Schema + Views M. Buchheit1 and F. M. Donini2 and W. Nutt1 and A. Schaerf2 1. German Research Center for Arti cial Intelligence (DFKI), Saarbrucken, Germany fbuchheit,nuttg@dfki.uni-sb.de 2. Dipartimento di Informatica e Sistemistica, Universita di Roma \La Sapienza", Italy fdonini,aschaerfg@assi.dis.uniroma1.it Abstract relationship. This abstract architecture has been the Traditionally, the core of a Termino- basis for the design of systems, the development of logical Knowledge Representation System algorithms, and the investigation of the computa- (TKRS) consists of a so-called TBox, where tional properties of inferences. concepts are introduced, and an ABox, Given this setting, there are three parameters that where facts about individuals are stated characterize a terminological system: (i) the lan- in terms of these concepts. This design guage for concept descriptions, (ii) the form of the has a drawback because in most applica- statements allowed, and (iii) the semantics given to tions the TBox has to meet two functions concepts and statements. Research tried to improve at a time: on the one hand, similar to a systems by modifying these three parameters. But in database schema, framelike structures with all existing systems and almost all theoretical studies typing information are introduced through language and semantics have been kept uniform.1 primitive concepts and primitive roles; on The results of these studies were unsatisfactory in the other hand, views on the objects in the at least two respects. First, it seems that tractable knowledge base are provided through de- inferences are only possible for languages with lit- ned concepts. tle expressivity. Second, no consensus has been reached about the semantics of terminological cycles, We propose to account for this conceptual although in applications the need to model cyclic separation by partitioning the TBox into dependencies between classes of objects arises con- two components for primitive and de ned stantly. concepts, which we call the schema and the Based on an ongoing study of applications of ter- view part. We envision the two parts to minological systems, we suggest to re ne the two- di er with respect to the language for con- layered architecture consisting of TBox and ABox. cepts, the statements allowed, and the se- Our goal is twofold: on the one hand we want to mantics. achieve more conceptual clarity about the role of We argue that by this separation we primitive and de ned concepts and the semantics of achieve more conceptual clarity about the terminological cycles; on the other hand, we want to role of primitive and de ned concepts improve the tradeo between expressivity and worst and the semantics of terminological cycles. case complexity. Since our changes are not primar- Moreover, three case studies show the com- ily motivated by mathematical considerations but by putational bene ts to be gained from the the way systems are used, we expect to come up with re ned architecture. a more practical system design. In the applications studied we found that the 1 Introduction TBox has to meet two functions at a time. One is to declare frame-like structures by introducing primi- Research on terminological reasoning usually pre- tive concepts and roles together with typing infor- supposes the following abstract architecture, which mation like isa-relationships between concepts, or re ects quite well the structure of existing systems. range restrictions and number restrictions of roles. There is a logical representation language that allows E.g., suppose we want to model a company environ- for two kinds of statements: in the TBox or termi- ment. Then we may introduce the concept Employee nology , concept descriptions are introduced, and in as a specialization of Person, having exactly one the ABox or world description , individuals are char- name of type Name and at least one aliation of acterized in terms of concept membership and role type Department. This is similar to class declara-  This work was partly supported by the Commis- tions in object-oriented systems. For this purpose, a sion of the European Union under ESPRIT BRA 6810 simple language is sucient. Cycles occur naturally (Compulog 2), by the German Ministry of Research and in modeling tasks, e.g., the boss of an Employee is Technology under grant ITW 92-01 (TACOS), and by 1 In [Lenzerini and Schaerf, 1991] a combination of the CNR (Italian Research Council) under Progetto Fi- nalizzato Sistemi Informatici e Calcolo Parallelo, LdR a weak language for ABoxes and a strong language for \Ibridi." queries has been investigated. also an Employee. Such declarations have no de ni- ing. tional import, they just restrict the set of possible interpretations. 2 The Re ned Architecture The second function of a TBox is to de ne new We start this section by a short reminder on concept concepts in terms of primitive ones by specifying languages. Then we discuss the form of statements necessary and sucient conditions for concept mem- and their semantics in the di erent components of bership. This can be seen as de ning abstractions or a TKRS. Finally, we specify the reasoning services views on the objects in the knowledge base. De ned provided by each component and introduce di erent concepts are important for querying the knowledge complexity measures for analyzing them. base and as left-hand sides of trigger rules. For this purpose we need more expressive languages. If cy- 2.1 Concept Languages cles occur in this part they must have de nitional In concept languages, complex concepts (ranged over import. by C , D) and complex roles (ranged over by Q, R) As a consequence of our analysis we propose to can be built up from simpler ones using concept and split the TBox into two components: one for declar- role forming constructs (see Tables 1 and 2 a set ing frame structures and one for de ning views. By of common constructs). The basic syntactic sym- analogy to the structure of databases we call the bols are (i) concept names, which are divided into rst component the schema and the second the view schema names (ranged over by A) and view names part. We envision the two parts to di er with re- (ranged over by V ), (ii) role names (ranged over by spect to the language, the form of statements, and P ), and (iii) individual names (ranged over by a, b). the semantics of cycles. An interpretation I = (I ; I ) consists of the do- The schema consists of a set of primitive concept main I and the interpretation function I , which introductions, formulated in the schema language, maps every concept to a subset of I , every role and the view part by a set of concept de nitions, for- to a subset of I  I , and every individual to an mulated in the view language. In general, the schema element of I such that aI 6= bI for di erent indi- language will be less expressive than the view lan- viduals a, b (Unique Name Assumption). Complex guage. Since the role of statements in the schema concepts and roles are interpreted according to the is to restrict the interpretations we want to admit, semantics given in Tables 1 and 2, respectively. rst order semantics, which is also called descriptive In our architecture, there are two di erent con- semantics in this context (see Nebel 1991), is ade- cept languages in a TKRS, a schema language for quate for cycles occurring in the schema. For cycles expressing schema statements and a view language in the view part, we propose to choose a semantics for formulating views and queries to the system. that de nes concepts uniquely, e.g., least or greatest xpoint semantics. 2.2 The Three Components The purpose of this work is not to present the We rst focus our attention to the schema. The full- edged design of a new system but to explore schema introduces concept and role names and states the options that arise from the separation of TBoxes elementary type constraints. This can be achieved into schema and views. Among the bene ts to be by inclusion axioms having one of the forms: gained from this re nement are the following three. A v D; P v A1  A2 ; First, the new architecture has more parameters for improving systems, since language, form of state- where A, A1 , A2 are schema names, P is a role name, ments, and semantics can be speci ed di erently for and D is a concept of the schema language. Intu- schema and views. So we found a combination of itively, the rst axiom states that all instances of A schema and view language with polynomialinference are also instances of D. The second axiom states procedures whereas merging the two languages into that the role P has domain A1 and range A2 . A one would have led to intractability. Second, we be- schema S consists of a nite set of schema axioms. lieve that one of the obstacles to a consensus about Inclusion axioms impose only necessary conditions the semantics of terminological cycles has been pre- for being an instance of the schema name on the cisely the fact that no distinction has been made left-hand side. For example, the axiom \Employee v between primitive and de ned concepts. Moreover, Person" declares that every employee is a person, intractability results for cycles mostly refer to infer- but does not give a sucient condition for being an ences with de ned concepts. We proved that rea- employee. soning with cycles is easier when only primitive con- A schema may contain cycles through inclusion cepts are considered. Third, the re ned architecture axioms (see Nebel 1991 for a formal de nition). allows for more di erentiated complexity measures, So one may state that the bosses of an employee as shown later in the paper. are themselves employees, writing \Employee v In the following section we outline our re ned ar- 8boss.Employee." In general, existing systems do chitecture for a TKRS, which comprises three parts: not allow for terminological cycles, which is a seri- the schema, the view taxonomy, and the world de- ous restriction, since cycles are ubiquitous in domain scription, which comprise primitive concepts, de- models. ned concepts and assertions in traditional systems. There are two questions related to cycles: the rst In the third section we show by three case studies is to x the semantics and the second, based on that adding a simple schema with cycles to existing this, to come up with a proper inference procedure. systems does not increase the complexity of reason- As to the semantics, we argue that axioms in the Construct Name Syntax Semantics top > I singleton set fag faI g intersection C uD C \ DI I union C tD C I [ DI negation :C I n C I universal quanti cation 8R.C fd1 j 8d2 : (d1 ; d2) 2 RI ! d2 2 C I g existential quanti cation 9R.:C fd1 j 9d2 : (d1 ; d2) 2 RI ^ d2 2 C I g existential agreement 9Q = R fd1 j 9d2.(d1 ; d2) 2 QI ^ (d1; d2) 2 RI g number restrictions ( n R) fd1 j ]fd2 j (d1; d2) 2 RI g  ng ( n R) fd1 j ]fd2 j (d1; d2) 2 RI g  ng Table 1: Syntax and semantics of concept forming constructs. Construct Name Syntax Semantics inverse role P 1 f(d1; d2) j (d2 ; d1) 2 P I g role restriction (R: C ) f(d1; d2) j (d1 ; d2) 2 RI ^ d2 2 C I g role chain Q  R f(d1; d3) j 9d2 .(d1; d2) 2 QI ^ (d2; d3) 2 RI g self  f(d1; d1) j d1 2 I g Table 2: Syntax and semantics of role forming constructs. schema have the role of narrowing down the mod- 9boss-of.BillsSuperBosses." But note that this does els we consider possible. Therefore, they should be not yield a de nition if we assume descriptive se- interpreted under descriptive semantics, i.e., like in mantics because for a xed interpretation of BILL rst order logic: an interpretation I satis es an ax- and of the role boss-of there may be several ways iom A v D if AI  DI , and it satis es P v A1  A2 to interpret BillsSuperBosses in such a way that the if P I  AI1  AI2 . The interpretation I is a model above equality holds. In this example, we only ob- of the schema S if it satis es all axioms in S . The tain the intended meaning if we assume least xpoint problem of inferences will be dealt with in the next semantics. This observation holds more generally: if section. cycles are intended to uniquely de ne concepts then The view part contains view de nitions of the form descriptive semantics is not suitable. However, least or greatest xpoint semantics or, more generally, a V =: C; semantics based on the -calculus yield unique de - where V is a view name and C is a concept in the nitions (see Schild 1994). Unfortunately, algorithms view language. Views provide abstractions by de n- for subsumption of views under such semantics are ing new classes of objects in terms of the concept known only for fragments of the concept language and role: names introduced in the schema. We refer de ned in Tables 1 and 2. to \V = C " as the de nition of V . The distinc- In this paper, we only deal with acyclic view tax- tion between schema and view names is crucial for onomies. In this case, the semantics of view de ni- tions is straightforward. An interpretation I satis es our architecture. It ensures the separation between schema and views. the de nition V =: C if V I = C I , and it is a model A view taxonomy V is a nite set of view de ni- for a view taxonomy V if I satis es all de nitions in tions such that (i) for each view name there is at V. most one de nition, and (ii) each view name oc- A state of a airs in the world is described by as- curring on the right hand side of a de nition has a sertions of the form de nition in V . C (a); R(a; b); Di erently from schema axioms, view de nitions give necessary and sucient conditions. As an ex- where C and R are concept and role descriptions in ample of a view, one can describe the bosses of the view language. Assertions of the form A(a) or the employee Bill as the instances of \BillsBosses =: P (a; b), where A and P are names in the schema, 9boss-of.fBILLg." resemble basic facts in a database. Assertions in- Whether or not to allow cycles in view de ni- volving complex concepts are comparable to view tions is a delicate design decision. Di erently from updates. the schema, the role of cycles in the view part A world description W is a nite set of asser- is to state recursive de nitions. For example, if tions. The semantics is as usual: an interpretation we want to describe the group of individuals that I IsatisI es CI(a) if aI 2 AI and it satis es R(a; b) if are above Bill in the hierarchy of bosses we can (a ; b ) 2 R ; it is a model of W if it satis es every use the de nition \BillsSuperBosses =: BillsBosses t assertion in W . Summarizing, a knowledge base is a triple  = because usually the schema is much bigger than the hS ; V ; Wi, where S is a schema, V a view taxonomy, two views which are compared. Similarly, one might and W a world description. An interpretation I is be interested in the world description complexity of a model of a knowledge base if it is a model of all instance checking whenever one can expect W to be three components. much larger than the schema and the view part. It is worth noticing that for every problem com- 2.3 Reasoning Services bined complexity, taking into account the whole in- For each component, there is a prototypical reason- put, is at least as high as the other three. For exam- ing service to which the other services can be re- ple, if the complexity of a problem is O(jSjjVjjWj), duced. its combined complexity is cubic, whereas the other Schema Validation : Given a schema S , check ones are linear. Similarly, if the complexity of a given whether there exists a model of S that inter- problem is O(jSjjVj), both its combined complexity prets every schema name as a nonempty set. and its view complexity are exponential, its schema complexity is polynomial, and its world description View Subsumption : Given a schema S , a view tax- complexity is constant. onomy V , and view names V1 and V2 , check In this paper, we use combined complexity to com- whether V1I  V2I for every model I of S and pare the complexity of reasoning in our architec- V; ture with the traditional one. Moreover, we use Instance Checking : Given a knowledge base , an schema complexity to show how the presence of a individual a, and a view name V , check whether large schema a ects the complexity of the reason- aI 2 V I holds in every model I of . ing services previously de ned. View and world de- Schema validation supports the knowledge engineer scription complexity have been investigated (under by checking whether the skeleton of his domain di erent names) in [Nebel, 1990, Baader, 1990] and [Schaerf, 1993, Donini et al., 1994], respectively. model is consistent. Instance checking is the basic operation in querying a knowledge base. View sub- sumption helps in organizing and optimizing queries 3 The Case Studies (see e.g. Buchheit et al. 1994). Note that the schema We studied some illustrative examples that show the S has to be taken into account in all three services advantages of the architecture we propose. We ex- and that the view taxonomy V is relevant not only tended three systems by a simple cyclic schema lan- for view subsumption, but also for instance checking. guage and analyzed their computational properties. In systems that forbid cycles, one can get rid of S As argued before, a schema language should at and V by expanding de nitions. This is not possible least be expressive enough for declaring subconcept when S and V are cyclic. relationships, restricting the range of roles, and spec- 2.4 Complexity Measures ifying roles to be necessary (at least one value) or sin- gle valued (at most one value). These requirements The separation of the core of a TKRS into three are met by the language SL, which was introduced components allows us to introduce re ned complex- in [Buchheit et al., 1994] and that is de ned by the ity measures for analyzing the diculty of inferences. following syntax rule: The complexity of a problem is generally measured with respect to the size of the whole input. However, C; D ! A j 8P .A j ( 1 P ) j ( 1 P ): with regard to our setting, three di erent pieces of input are given, namely the schema, the view tax- Obviously, it is impossible to express in SL that a onomy, and the world description. For this reason, concept is empty. Therefore, schema validation in di erent kinds of complexity measures may be de- SL is trivial. Also, subsumption of concept names ned, similarly to what has been suggested in [Vardi, is polynomially decidable. 1982] for queries over relational databases. We con- We proved that inferences become harder for ex- sider the following measures (where jX j denotes the tensions of SL. If we add inverse roles, schema val- size of X ): idation remains trivial, but subsumption of schema Schema Complexity : the complexity as a function names becomes NP-hard. If we add any construct by of jSj; which one can express the empty concept|like dis- jointness axioms|schema validation becomes NP- View Complexity : the complexity as a function of hard. However, in our opinion this does not mean jVj; that extensions of SL are not feasible. For some ex- World Description Complexity : the complexity as a tensions, there are natural restrictions on the form function of jWj; of schemas that decrease the complexity. Also, it is not clear whether realistic schemas will contain Combined Complexity : the complexity as a function structures that require complex computations. of jSj + jVj + jWj. In all the three cases studied, the schema lan- Combined complexity takes into account the guage is SL. For the view language, we propose whole input. The other three instead consider only a three di erent languages derived from three actual part of the input, so they are meaningful only when systems described in the literature, namely the de- it is reasonable to suppose that the size of the other ductive object-oriented database system Concept- parts is negligible. For instance, it is sensible to an- Base [Jarke, 1992], and the terminological systems alyze the schema complexity of view subsumption kris [Baader and Hollunder, 1991] and classic [Borgida et al., 1989]. We investigated the com- in the knowledge base. The second alternative can putational properties of the reasoning services with be given a logical semantics in terms of epistemic respect to SL-schemas. We aimed at showing two operators (see Donini et al. 1992). results: (i) reasoning w.r.t. schema complexity is al- ways tractable, (ii) combined complexity is not in- creased by the presence of terminological cycles in References the schema. [Baader and Hollunder, 1991] Franz Baader and In all three cases, we assume that view names Bernhard Hollunder. A terminological knowledge are allowed in membership assertions and that the representation system with complete inference al- view taxonomy is acyclic. In this setting, every view gorithm. In Proc. PDK-91, LNAI, pages 67{86, name can be substituted with its de nition. For this 1991. reason, from this point on, we suppose that view [Baader, 1990] Franz Baader. Terminological cycles concepts are completely expanded. Therefore, when in KL-ONE-based knowledge representation lan- evaluating the complexity, we replace the size of the guages. In Proc. AAAI-90, pages 621{626, 1990. view part by the size of the concept representing the [Borgida et al., 1989] Alexander Borgida, Ronald J. view. Brachman, Deborah L. McGuinness, and Lori We have found the following results for the three Alperin Resnick. CLASSIC: A structural data systems in which SL is the schema language and the model for objects. In Proc. ACM SIGMOD, pages concept language the abstraction of the query lan- 59{67, 1989. guage of ConceptBase introduced in [Buchheit et [Buchheit et al., 1994] Martin al., 1994], or the language o ered by kris or clas- Buchheit, Manfred A. Jeusfeld, Werner Nutt, and sic, respectively. Martin Staudt. Subsumption between queries to ConceptBase: instance checking is in PTIME object-oriented databases. Information Systems, w.r.t. combined complexity (view subsumption 19(1):33{54, 1994. has been proved in PTIME in [Buchheit et al., [Donini et al., 1992] Francesco M. Donini, Maurizio 1994]). Lenzerini, Daniele Nardi, Werner Nutt, and An- kris: view subsumption and instance checking are drea Schaerf. Adding epistemic operators to con- PSPACE-complete problems w.r.t. combined cept languages. In Proc. KR-92, pages 342{353, complexity and PTIME problems w.r.t. schema 1992. complexity. [Donini et al., 1994] Francesco M. Donini, Maurizio classic: view subsumption and instance checking Lenzerini, Daniele Nardi, and Andrea Schaerf. De- are problems in PTIME w.r.t. combined com- duction in concept languages: From subsumption plexity. to instance checking. Journal of Logic and Com- We conclude that adding (possibly cyclic) schema putation, 4(92{93):1{30, 1994. information does not change the complexity of rea- [Jarke, 1992] M. Jarke. ConceptBase V3.1 User soning within the systems taken into account. Manual. Aachener Informatik-Berichte 92-17, RWTH Aachen, 1992. 4 Conclusion [Lenzerini and Schaerf, 1991] Maurizio Lenzerini and Andrea Schaerf. Concept We have proposed to replace the traditional TBox languages as query languages. In Proc. AAAI-91, in a terminological system by two components: a pages 471{476, 1991. schema, where primitive concepts describing frame- like structures are introduced, and a view part that [Nebel, 1990] Bernhard Nebel. Terminological rea- contains de ned concepts. We feel that this architec- soning is inherently intractable. Arti cial Intelli- ture re ects adequately the way terminological sys- gence, 43:235{249, 1990. tems are used in most applications. [Nebel, 1991] Bernhard Nebel. Terminological cy- We also think that this distinction can clarify the cles: Semantics and computational properties. In discussion about the semantics of cycles. Given the John F. Sowa, editor, Principles of Semantic Net- di erent functionalities of the schema and view part, works, pages 331{361. Morgan Kaufmann, Los Al- we propose that cycles in the schema are interpreted tos, 1991. with descriptive semantics while for cycles in the [Schaerf, 1993] Andrea Schaerf. On the complexity view part a de nitional semantics should be adopted. of the instance checking problem in concept lan- In three case studies we have shown that the re- guages with existential quanti cation. Journal of vised architecture yields a better tradeo between Intelligent Information Systems, 2:265{278, 1993. expressivity and the complexity of reasoning. The schema language we have introduced might [Schild, 1994] Klaus Schild. Terminological cycles be sucient in many cases. Sometimes, however, and the propositional -calculus. In Proc. KR-94, one might want to impose more integrity constraints 1994. on primitive concepts than those which can be ex- [Vardi, 1982] M. Vardi. The complexity of relational pressed in it. We see two solutions to this problem: query languages. In Proc. STOC-82, pages 137{ either enrich the language and have to pay by a more 146, 1982. costly reasoning process, or treat such constraints in a passive way by only verifying them for the objects