1. Introduction

Knowledge Representation and Automated Formal Reasoning in Description Logic ALC

Vasyl Lenko

vs.lenko@gmail.com 1

Volodymyr Pasichnyk

vpasichnyk@gmail.com 1

Natalia Kunanets

Yurii Shcherbyna

yshcherbyna@yahoo.com 0 0 Ivan Franko National University of Lviv , 1 Universytetska str., Lviv, 79000 , Ukraine 1 Lviv Polytechnic National University , 12 S. Bandery str., Lviv, 79013 , Ukraine

The paper presents a systematic analysis of the description logic formalism that has proved to be the distinguished approach for automated formal reasoning. The focus is on the principles of knowledge representation and reasoning within the framework of basic description logic ALC. An application of the specified formalism towards the domain of mass media reveals the benefits of automated reasoning for decision making.

1 Description logic automated formal reasoning tableaux algorithm ALC

1. Introduction

Description logics (DLs) play a fundamental role in modern ontologies that are based on OWL 2. They form the basis of direct model-theoretic semantics [1] for OWL 2 ontologies, which are de facto the standard in knowledge bases engineering. The historical evolution of ontologies from a frame model towards logic-based knowledge representation models is explained by the issues of an ambiguous interpretation of syntactic constructions in frames and limited generality of the associated reasoning methods [2]. Description logics, as decidable fragments of first-order logic, provide formal modeltheoretic semantics and sound methods (forms) of reasoning over syntactic expressions.

For OWL 2, the most appropriate semantics among description logics is SROIQ, while for OWL it is description logic SHOIN. Description logics are named using abbreviations of sets of constructors (axioms), which form the basis for their grammar. For instance, SROIQ consists of the following constructors: S – denotes the set of constructors of the basic description logic ALC, extended with the transitive roles; R – denotes an extended set of role axioms, also called RBox; O – denotes nominals; I – denotes the inverse roles; Q – denotes qualified cardinality restrictions. The variety of description logics is explained by the combinatorial possibilities of the constructors’ set selection but obtaining the description logic with attractive expressiveness and computability is a difficult research task. The reason is a fundamental trade-off between expressiveness and decidability of formal language – higher expressiveness leads to a higher probability of undecidability of reasoning problems. Adoption of a new description logic requires formal proofs of its properties as a decision procedure, as well as careful selection of a set of constructors.

We will conduct a systematic analysis of 1) the structure of description logics, based on the example of the description logic ALC; 2) the main reasoning problems that can be stated and solved in ALC; 3) the concept of a knowledge base in the context of description logics; 4) fundamental properties of description logics; 5) model-theoretic semantics and tableaux-based method for the inference of logical conclusion.

2. Description logics

Description logics are a family of logic-based formal languages for knowledge representation and reasoning that date back to the late 1980s. The notion “description logic” corresponds to the ontological commitments of this formalism: the main aspects of the domain of interest are presented in the form of a “description” of its entities (concepts) and relationships (roles), which are created by the constructors (axioms) of the selected description logic. For the sake of knowledge reuse, syntactic expressions are classified by their generality and divided into two parts – a set of terminological axioms, TBox, and a set of statements about individuals, ABox, which together form the knowledge base, KB, also known as an ontology.

A fundamental property of description logics is the presence of formal semantics, which allows to interpret correctly syntactic expressions, axioms, and operators, thus creating an unambiguous, shared understanding of when the statement itself is a logical consequence from the knowledge base. Since description logics are decidable fragments of predicate logic, the logical inference is executed using automated reasoning methods, which are based on the principles of semantic admissibility and deductive derivability of a statement. The peculiarity of reasoning in descriptive logics is that computation is always carried out over the entire knowledge base. The conceptual trade-off between the expressiveness of the representation language and the computational complexity of reasoning problems, as well as the number of statements in the knowledge base, give rise to the need for a variety of description logics. In particular, the family of description logics EL is characterized by limited expressiveness of the representation language but guarantees a polynomial time solution for some reasoning problems in large ontologies.

The expressiveness of description logics is always limited. This property is forced by the need to ensure the decidability of reasoning problems and can be tightened to ensure the decidability of reasoning problems in polynomial time with regards to the size of the knowledge base. Decidability, especially in polynomial time, is a major factor that promotes the use of description logics in knowledge base engineering, therefore the design of varieties of specialized description logics requires careful analysis and formal proofs of their properties. This need is caused by the fact that a synthesis of decidable description logic extensions can lead to undecidability. If the expressiveness of description logic is insufficient to represent the domain of interest, then instead of using undecidable description logics, it is better to build a custom application around the decidable DL or use alternative knowledge representation models. The definition of semantic structures that can be represented in description logic is carried out within the framework of formal model theory, which in particular defines the semantics of first-order logic.

Some kinds of description logics go beyond first-order logic and can use operators of modality, the concepts from fuzzy logic, or probability theory. Syntactically, the statements of description logics are similar by form to the expressions of modal logics, but their truth values are established according to A. Tarski’s semantic truth theory [3], which is the paradigm of truth definition in model theory. Despite the differences that are present in the varieties of description logics, the process of ontology engineering is mostly the same. At first, we define the dictionary of the domain of interest, which is later formalized into the TBox statements. Some applications may use only TBox, while others may the whole ontology and even databases. The logical consistency of an ontology structure, as well as processing of user queries to a knowledge base, are ensured by a module of automatic or semi-automatic logical reasoning, which is usually provided as part of an ontology development environment (i.e., HermiT reasoner [4] in Protégé IDE [5]).

3. Description logic ALC

The use of description logics for the conceptual modeling of the domain of interest involves the description of important abstractions that are inherent to it, as well as the individuals who are the instances of these abstractions. Description logics have four major components that participate in the process of creating a formal representation of the domain [2]: 1. Concepts are abstractions that are built from the concept names and role names using constructors of description logic. In model theory, the concept is interpreted as a set of elements, and it corresponds to a unary predicate in first-order logic. 2. Roles (role names) are the relationships between two concepts or individuals. In model theory, the role is interpreted as a set of pairs of elements that belong to the appropriate concept extensions, and it corresponds to a binary predicate in first-order logic. 3. Individuals (instances) are the embodiments of concept abstractions in the form of objects and are designed to name the object of a particular abstraction. In model theory, the individual is interpreted as an element of a concept’s extension set, and it corresponds to a constant in first-order logic. 4. Concept language is a formal language that creates concepts and roles descriptions from primitives like concept names and role names, by the means of description logic operators. An interpretation of the concept language operators in a model theory puts them in line with the operators on sets, while in first-order logic they are compared with the operators on logical statements.

An attributive language with complements, ALC, belongs to the family of description logics AL – attributive languages that provide operators of atomic negation, concepts intersection, universal and existential quantifiers. Unlike the AL family, where only the complements of atomic concepts are available, the ALC language allows the complement of any well-formed concept [3]. The set of atomic concepts consists of the concept names, “top” ⊤ and “bottom” ⊥ concepts; the rest are compound (complex) concepts because they are formed using the concept language operators. Thus, a set of ALC concept descriptions consists of concept names, role names, individual names, ⊤ and ⊥, concepts intersection and union, concept negation (complement), existential and value restrictions. The syntax of a formal language allows distinguishing well-formed expressions from arbitrary ones. An inductive definition of ALC concepts’ syntax is as follows [2]: • Every concept name is an ALC concept description; • ⊤ and ⊥ are ALC concepts descriptions; • If C and D are ALC concept descriptions and r is a role name, then the following are also ALC concept descriptions: o C ⊓ D (conjunction or intersection of concepts); o C ⊔ D (disjunction or union of concepts); o ¬C (negation or complement of concept); o ∃r.C (existential restriction); o ∀r.C (values restriction).

The formal semantics of a language allows to establish unambiguously the meaning of its wellformed expressions and is also required for the introduction of a truth predicate, which assigns a truth value to the logical expressions in the language. Semantics can be considered as a metalanguage, which contains the structures that are described by the syntax of the corresponding formal language. Since metalanguage is also a language, it is possible to construct its own semantics, and so on. A. Tarski’s semantic truth theory [3] states: to avoid the Russell’s or liar’s paradoxes in a formal language, the truth predicate must be defined exclusively in a metalanguage. This paradigm preserves the consistency of the logical systems, which allow forming the statements like “This sentence is a lie”; the truth value of this statement is determined outside the language that was used for its formulation.

The semantics of description logics is an “interpretation” – a structure that is characterized by the following properties: 1) consists of a non-empty set of elements, which is called interpretation domain; 2) for each concept name defines its extension, i.e., a subset of elements from the interpretation domain; 3) for each role name defines a set of pairs of elements from the interpretation domain, which are related to each other by this role. Formally, an interpretation = (∆ , · ) consists of a non-empty set ∆ , which is called the interpretation domain, and a mapping · , which maps each concept name ∈ to a set ⊆ ∆ , and each role name ∈ to a binary relation ⊆ ∆ × ∆ . Interpretation of the other concept descriptions is carried out by extending the mapping · with the following rules [2]: • ⊤ = ∆ (entire interpretation domain), • ⊥ = ∅ (empty set); • ( ⊓ ) = C ∩ (intersection of concepts extensions); • ( ⊔ ) = C ∪ (union of concepts extensions); • (¬С) = ∆ \C (interpretation domain without concept extension); • (∃ . ) = { ∈ ∆ | exists ∈ ∆ with ( , ) ∈ and ∈ С }; • (∀ . ) = { ∈ ∆ | for all ∈ ∆ , if ( , ) ∈ , then ∈ С }.

The properties of an interpretation are limited only by the rules that are explicitly specified in its definition, namely: the interpretation domain should not be empty, but can be of arbitrary power, even infinite; concept extension may contain any number of elements, from zero to the entire interpretation domain; the role name extension may contain any number of pairs of elements, from zero to all possible pairs. Modification of the interpretation , by adding or removing elements from interpretation domain, concepts extensions, or roles extensions, gives rise to the alternative interpretations. For the sake of interpretation structure analysis, it is convenient to visualize it in the form of an oriented graph, where the arcs and vertices are labeled. In this graph, a vertex represents an element from the interpretation domain, while its labels denote the concepts that contain the specified element in their extensions; an arc represents a pair of elements contained in the extension of some role r, and the arc label corresponds to the name of that role.

Example 1. Create a model of the domain “IT project structure” using the description logic ALC and define its interpretation in the context of the “Smart City” project.

Solution. The first step of modeling is to determine the concept and role names that are important for us in the domain of interest. Let the set of concept names = {Engineer, Technology, Project}, the set of role names = {is_part_of, has}. The second step of modeling is to define an interpretation for the elements of these sets. Concepts and roles extensions usually adhere to the desired context and purposes of modeling: ∆ = {Andy, Mary, Javascript, UML, Smart_City}; Engineer = {Andy, Mary}; Technology = {JavaScript, UML, Smart_City}; Project = {Smart_City}; is_part_of = {(Andy, Smart_City), (UML, Smart_City)}; has = {(Mary, UML), (Andy, Javascript)}.

The third step of modeling consists in the formalization of interpretation elements using the syntax of description logic ALC. It should be noted that there might be multiple options for interpretation elements representation and each of them might be valid, but their efficiency for achieving the goals of modeling might vary. For example, the instance “Smart_city” belongs to the extensions of the atomic concept Project and the compound concept Project ⊓ Technology, as well as the extensions of the concepts like ¬Engineer and ⊤. The instance “Andy” belongs to the extensions of the concepts Engineer, ∀has. Technology, ∃is_part_of.Project and many others. Moreover, all elements of the interpretation domain always belong to the extension of the top concept “⊤” and at the same time, none of them belong to the extension of the bottom concept “⊥”, which corresponds to the state of absurdity in the systems of logic.

The specified interpretation of the conceptual model of the domain of interest can be visualized in the form of a directed graph, which facilitates the analysis of the relationships between the elements of concepts extensions and simplifies the task of clusters identification: • • • • • • • • • • • properties of interpretation [2]: ⊤

= ( ⊔ ¬С) (the law of excluded middle, LEM); ⊥ = ( ⊓ ¬С) (absurdity); (¬¬С) = C (double negation elimination); ¬( ⊓ ) = (¬ ⊔ ¬ ) (de Morgan’s law); ¬( ⊔ ) = (¬ ⊓ ¬ ) (de Morgan’s law); (¬(∃ . )) = (∀ . ¬ ) (variation of de Morgan’s law); (¬(∀ . )) = (∃ . ¬ ) (variation of de Morgan’s law).

4. Knowledge bases in ALC

The ability to construct formal descriptions of the domain concepts and to combine them using concept language operators is an important, but not sufficient achievement. The sequences of syntactic transformations and combinations of formal descriptions can be infinite, so having limited time and memory resources, it makes sense to remember useful (by naming) and connected (by relations) descriptions, to avoid repetitive calculations. As a result, the following syntactic constructions are introduced to ALC description logic: ⊑

≡ : ( , ): = C

⊆ = C ⊆ and ⊆ C = ∈ С = ( , ) ∈ (concept inclusion); (concepts equivalence); (concept assertion); (role assertion).

These relations allow structuring the elements of the knowledge base in accordance with the classical approaches to systematization and reuse of concepts and their properties. For example, the equivalence operator ≡ establishes the meaning of the concept through the description of the concept , which corresponds to the encyclopedic format of the concept’s definition. The representation of background knowledge is conveniently carried out through the operator of concept inclusion ⊑, which marks that concept belongs to the more general concept , and thus implicitly inherits from some properties and operational restrictions. Representation of concrete contexts requires the specification of concept and role instances in the form of assumptions. The operator “:” establishes the affiliation of individual to the concept and a pair of individuals ( , ) to the role , respectively. The usage of the specified operators is presented in the following examples:

Technology ⊑ Methods ⊔ Tools Engineer ≡ Human ⊓ ∃has. Technology Mary ∶ Engineer UML ∶ Technology

(Mary, UML) ∶ has

In description logics, the knowledge base is usually divided into two structural parts: terminological, TBox, and declarative, ABox. In general-purpose knowledge bases or terminologies, the declarative part may be missing. The terminological part contains statements that describe the concepts and relationships between them (⊑, ≡), while declarative consists of statements that describe individuals of concepts and relationships between them in the form of roles (:). Comparing the structure of the knowledge base and the database, it is possible to establish a significant similarity between the TBox and the database schema, as both contain general limitations: the form of the conceptual model of the domain and the form of the data, respectively. The ABox is similar to the database content because it contains expressions related to the concrete instances of concepts and roles.

Formally, TBox consists of a finite set of statements of the form ⊑ , where and can be descriptions of both atomic and compound concepts. If ⊑ and ⊑ , the notation ≡ is used for the ease of operating. Each of these statements is called a general concept inclusion (GCI) or an axiom [2]. An interpretation satisfies GCI if C ⊆ is satisfied. If interpretation satisfies every GCI containing TBox , then it is called a model of . The larger number of GCI TBox contains, the fewer models exist for it, because for two , ′, if ⊆ ′ then each model ′ is a model . TBox divides a set of interpretations into two parts: a set of models and a set of interpretations that do not satisfy all GCIs in . Thus, focuses the modeling efforts exclusively on interpretations that meet the designer’s intuitive expectations about the structure of the domain.

The declarative part of the knowledge base, ABox, consists of a finite set of assertions, which are represented in the form : , the concept assertion, or ( , ): , the role assertion. It is assumed that an interpretation mapping · maps every instance to the element ∈ ∆ . An interpretation satisfies the concept assertion : , if ∈ С , and satisfies the role assertion ( , ): , if ( , ) ∈ . If interpretation satisfies all concept and role assertions contained in the ABox , then it is called a model of .

The combination of TBox and ABox forms an ALC knowledge base = ( , ), in which concepts and their definitions are specified in the terminological part, and then used in the declarative part. An interpretation , which is both a model of and a model of , is called a model of . A knowledge base is similar to a database by structure, but they significantly differ in the approach of ensuring data consistency. If the data is not present in the database, then its validity is considered as “false” and can be interpreted as a violation of data integrity. For the knowledge base, it is quite acceptable to have a description of a concept in TBox without a description of the concept instances in ABox. This paradigm is called the “open world assumption” [6].

The syntax of description logic ALC allows the creation of cyclic concept descriptions. For instance, the concept ≡ ℎ ⊓ ∀ . is cyclic since the concept name appears in the description of its own definition. Such expressions can be satisfied by multiple interpretations and can lead to ambiguity of solutions for some reasoning problems. This issue can be mitigated by the introduction of a syntactic restriction, which increases the defining power of TBox and introduces the notion of acyclic TBox. It is worth noting that the form of cyclicity of the concept is simple and obvious, but there are hidden forms of cyclicity that arise due to the nesting of concept descriptions.

Let the expression ≡ be a definition of the concept , where is a possibly compound concept, and is a finite set of concept descriptions. If ≡ ∈ and the concept name appears in , then directly uses . The concept uses if directly uses , or there is a concept name ′, such that uses ′, and ′directly uses . A finite set of concepts descriptions is called an acyclic TBox, if: 1) there are no concept names in that use themselves; 2) there are no concept names that appear more than once in the left part of the definition of the concept in . If is an acyclic TBox with ≡ ∈ , then is called strictly defined in , and is called the definition of in .

The importance of an acyclic TBox is that it can be unfolded into ABox by interpreting concept descriptions as macros. The mechanism of the process is to recursively replace all concept names in ABox with their definitions from TBox. Let = ( , ) be a knowledge base with acyclic TBox, which has the form = { ≡ | 1 ≤ ≤ }. Let 0 = and +1 be the result of the following substitutions: 1) find some : ∈ , in which directly uses , 1 ≤ ≤ ; 2) replace in all with . If no more substitutions can be applied in , then is the result of unfolding into . Unfolding is an important step in the process of solving many reasoning problems, but it can cause an exponential explosion of ABox size. Therefore, it is important to remember the strategies of the lazy and greedy concept descriptions unfolding.

5. Reasoning problems in ALC

The structure of a knowledge base in description logics allows to model concepts, roles, individuals, and relations between them. The number of statements in the knowledge base, the nesting of concept descriptions, model discovery, establish the need for reliable and effective methods of logical reasoning, which would allow obtaining answers about the important properties of a knowledge base. Let = ( , ) is an ALC knowledge base, and are possibly compound ALC concepts, and is an individual name. Then, the basic reasoning problems inherent to the ALC knowledge base can be defined as follows [2]: • is satisfied with respect to if there exists a model of and some element ∈ ∆ such that ∈ ; • is included in with respect to , and is written as ⊨ ⊑ if C ⊆ in all models of ; • is equivalent to with respect to , and is written as ⊨ ≡ if C = in all models of ; • is consistent if there is a model of ; • is an instance of with respect to , and is written as ⊨ : if ∈ С in all models of .

The sign of the semantic logical consequence ⊨, in contrast to the sign of the deductive logical consequence ⊢, is based on the inclusion of models of statements that it connects. When is fixed, inclusion and equivalence with respect to are binary relations of possibly compound concepts, so to denote this fact, the notations ⊑ and ≡ are used, respectively. The problems of satisfaction, inclusion and equivalence are defined with respect to TBox only, but the problems of consistency between the knowledge base and concept’s instance are defined with respect to both, TBox and ABox. The consistency property of TBox only, or ABox only, is considered as a case of consistency of the knowledge base of the specific form: = ( , ∅) for TBox and = (∅, ) for ABox.

The relation of concept inclusion ⊑ is transitive, and the ALC description logic is monotonic, i.e., the more statements the knowledge base contains, the more logical consequences it allows. Equivalences that can be derived from every TBox are called tautologies, for example, ⊨ ⊑ if and only if ⊨ ⊤ ⊑ (¬ ⊔ ). It is important to note that there are also hidden relationships between the basic problems of reasoning, which contribute to the process of their solution. Let = ( , ) is an ALC knowledge base, and are possibly compound ALC concepts, and is an individual name. Then the following statements hold: • C ≡T D if and only if C ⊑T D and D ⊑T C; • C ⊑T D if and only if C ⊓ ¬D is not satisfied with respect to T; • C is satisfied with respect to if and only if C ⋢ ⊥; • C is satisfied with respect to if and only if (T, {b: C}) is consistent; • (T, A) ⊨ b: C if and only if (T, A ∪ {b: ¬C}) is inconsistent; • If T is acyclic and A′ is the result of the unfolding of T into A, then K is consistent if and only if (∅, A′) is consistent.

The outcome of the specified relationships between the reasoning problems is the fact that all the basic problems can be reduced to a common one – the consistency of a knowledge base. Thus, the presence of an efficient algorithm for calculating the consistency property of a knowledge base allows solving the other basic reasoning problems. It is worth noting that there are also reasoning problems that can’t be reduced to the problem of knowledge base consistency or cause an exponential increase of the problem size during the reduction, i.e., query answering using conjunction.

Knowledge base engineering usually involves gradual filling of the TBox and ABox with statements about the domain of interest. Since the formal syntax, and concepts nesting, introduce some complexity for an engineer on establishing the properties of a knowledge base, there is a need for the auxiliary tools that will evaluate the consequences of knowledge base extension or modification. In description logics, this task is performed by the reasoning services, which injectively correspond to the basic problems of reasoning: • Given a TBox and a concept , determine whether is satisfied with respect to ; • Given a TBox and the concepts and D, determine whether is included in with respect to ; • Given a TBox and the concepts and , determine whether is equivalent to with respect to ; • Given a knowledge base K = ( , ), determine whether is consistent; • Given a knowledge base K = ( , ), a concept , and an individual name , determine whether is an instance of with respect to .

These descriptions are specifications of the reasoning services, which might be implemented in multiple ways. Additionally, the basic reasoning services can be reused to create more intelligent services that reveal the potential of an automated retrieval of a set of homogeneous objects and solving reasoning problems about them. The “generalized” reasoning services include the following ones [2]: • Classification of a TBox: given a TBox , determine the hierarchy of inclusions of all concept names that are present in with respect to . That is, for each pair of concept names and that are defined in , determine whether ⊑ and whether ⊑ . The result of the service is an inclusion hierarchy, which can be conveniently presented in the form of a graph or Hesse diagram for partial order; • Checking the satisfiability of concepts in T: given a TBox , for each concept name , determine whether is satisfied with respect to . The dissatisfaction of a concept is the sign of a modeling error; • Instance retrieval: given a concept and a knowledge base , determine all the individual names such that is an instance of with respect to . The result of the service is a set of individual names that affiliates with the concept ; • Realisation of an individual name: given an individual name and a knowledge base , determine all the concept names that are defined in , for which is an instance of with respect to . The result of the service is a set of concept names that are the realisation of an individual name.

6. Reasoning in ALC with tableaux algorithm

Reasoning over the statements in a knowledge base is a process that facilitates the achievement of several goals: 1) substantiation or refutation of the existing knowledge; 2) generation of the new knowledge; 3) finding solutions to decision-making problems; 4) shared ontologies engineering [7]. In logic, the reasoning process is considered as a “mechanized” syntactic calculus over the statements of a formal system and consists in the application of deductive derivation rules or constructive properties of the associated semantics. A reasoning method should possess three qualitative properties, decidability, completeness, and soundness, to be considered a decision-making procedure in a system of formal logic. Decidability requires the presence of proof that a reasoning method terminates for all admissible kinds of input data, and potentially proof of the existence of an “effective” algorithm, which is characterized by acceptable time complexity. The completeness requires proof that a “valid” result will be returned for all admissible input data with the desired property, while soundness requires proof that a “valid” result will be returned only for the admissible input data with the desired property.

There are a variety of reasoning methods associated with the description logic ALC, which are based on different calculus approaches. The most widespread reasoning methods are the resolution method, sequent calculus, automata-based calculus, tableaux methods, query rewriting [8]. For the expressive description logics, the tableaux methods are a default choice, i.e., the hyper-tableaux method is a core of HermiT reasoner, which integrates as a module into the Protégé ontology development environment. Since most of the basic reasoning problems in the description logic ALC can be reduced to the problem of knowledge base consistency, the properties of the tableaux method as a decision-making procedure will be formulated relative to the input data of the knowledge base consistency problem. The proofs of the tableaux method properties are based mainly on the semantics of the description logic ALC, namely the structural features of those models that can be expressed in the ALC concept language.

By definition, a knowledge base is consistent if there is a model of ; this model is called a witness to the consistency of K and is denoted by ⊨ . The core idea of tableaux methods is to prove the consistency of a knowledge base = ( , ) by constructing a model of . A knowledge base is inconsistent, if during the construction of a witness to the consistency of some obvious contradictions are revealed that refute the possibility of the existence of model of . The process of witness construction starts with an and, if necessary, extends it with the outcomes of restrictions that are imposed by the semantics and the axioms contained in or . The efficiency and validity of tableaux methods are ensured by the structure of a witness they are designed to build, namely the tree model. The tree model significantly limits the number of potential witnesses (efficiency), facilitates detection of structural loops in tree branches (decidability), and allows proving the existence of possibly infinite models using the method of structural induction (soundness). Moreover, the proof of absence of a tree model for a knowledge base indicates the absence of any model for (completeness).

Tableaux method’s principles can be conveniently analyzed in the context of three situations: for = (∅, ), for = ( , ), where is acyclic, and in the general case for = ( , ). Since all the concepts contained in or can be transformed to a negated normal form using de Morgan’s laws or duality between the quantifiers of existence and universality, the expression ¬̇ С will correspond to the negated normal form of ¬ . An ABox is considered normalized if the following conditions are met: • all concepts contained in or are in the negated normal form, so the negation operator is applied only to the concept names and is not preceded by parentheses, quantifiers, or operators; • each individual name contained in is present in at least one statement of the form : ; • is not empty, thus contains statements.

For the case = (∅, ), when TBox is empty, the expansion rules mirror the semantics of compound concepts and apply it to the statements in . Since the main principle of the tableaux method is to decompose compound concepts into sub-concepts, there always exists an iteration, where all compound concepts are completely decomposed into atomic ones, and the algorithm terminates. Each satisfied ALC concept has a tree-like model, and since the concept instances can be related by roles, a consistent ALC knowledge base has a forest-like model. If is consistent, then it has a model that consists of a nonempty set of disjointed trees, where the root of a tree represents some individual name in and can be connected by edges with the roots of other trees. The tableaux method builds a forestlike model of an ABox using syntactic expansion rules, which extend with the new statements only if their conditions are met.

The tableaux method applies the expansion rules to the ABox statements until is complete because it is easier to detect logical contradictions or clashes in the complete ABox. An ABox contains a clash if for some individual name and for some concept , { : , : ¬ } ⊆ ; if does not contain a clash, then is called clash-free. is called complete if it contains a clash or none of the expansion rules can be applied due to unmet conditions.

Since any ALC ABox can be transformed to a normalized form, without reducing the generality, the algorithm for determining the consistency of an ABox will take as an input a normalized ABox. Then, the tableaux algorithm for ALC ABox consistency can be presented via the following pseudo-code [2]: Algorithm ( ):

If ( ) ≠ ∅ Then return «consistent»

Else return «inconsistent» Procedure ( ):

If is not complete Then select expansion rule , whose conditions are satisfied in , and select few statements from , which are eligible for application If exists ′ ∈ exp( , , ), such that ( ′) ≠ ∅

Then return ( ′)

Else return ∅ Else

If contains a clash

Then return ∅

Else return

Here, the auxiliary function exp( , , ) returns a set of ABox, which are formed by the application of rule to the statement in . All the expansion rules, except the ⊔-rule, are deterministic and return a set with one ABox. Depending on the way of its application, the ⊔-rule can generate two different ABox, and it is mandatory to verify their consistency. The way of selecting the sequence of expansion rules and statements does not affect the result of the tableaux method but can drastically affect its performance.

The tableaux method for establishing the consistency of a knowledge base = ( , ), where is an acyclic TBox, is largely based on the expansion rules and procedures used for = (∅, ). This property is due to the fact that an acyclic TBox can be entirely unfolded into ABox , and it actually reduces the current consistency problem to the previous case. The greedy strategies of unfolding into can lead to an exponential increase of the knowledge base size, so it is recommended to use the “lazy” unfolding strategies, which unfold concept descriptions only when necessary [9]. Below are the lazy unfolding rules, which together with the previous expansion rules form the basis of a tableaux method for establishing the consistency of a knowledge base with an acyclic TBox.

Establishing the consistency in the general case of a knowledge base = ( , ), where may contain cyclic concept descriptions, requires some auxiliary constructions that will ensure decidability. Without reducing the generality, the input parameter of a tableaux algorithm is a normalized knowledge base = ( , ) that consists of a normalized TBox and a normalized ABox . A TBox is called normalized if all its axioms have the form ⊤ ⊑ , where is reduced to the negated normal form. Reduction of axioms to the expected form can be carried out using the following properties: • satisfies ⊑ ⟺ satisfies ⊤ ⊑ ⊔ ¬С; • satisfies ≡ ⟺ satisfies ⊤ ⊑ ( ⊔ ¬С) ⊓ ( ⊔ ¬ ).

The set of expansion rules for a knowledge base consistency consists of five elements, and three of them, namely ⊓-rule, ⊔-rule, ∀-rule, are completely identical to the ABox expansion rules. The other two rules are also similar to their counterparts but have some important changes: ⊑-rule uses a normalized form of TBox axioms, while ∃-rule contains an additional requirement for a “non-blocking” individual, which is a property that ensures decidability or algorithm termination.

A termination problem of the tableaux algorithm for general knowledge base consistency is caused by the possibility to combine ⊑-rule and ∃-rule, which without the blocking mechanism can generate the infinite repetitive sequences of individuals [10]. If an ABox does not contain clashes, then its model can describe the infinite sequences using loops. During the execution of the tableaux algorithm, only the ∃-rule adds new vertices (individuals) and edges (roles) to the ABox tree model. If the ∃-rule adds a new individual and a new role assertion ( , ): , then is the successor of and is the predecessor of . The transitive closure of roles between successor and predecessor creates the notions of descendant and ancestor, respectively. Let ( ) = { | : ∈ } is a set of concept names that are present in the concept assertions of the form : . The individual name is blocked in ALC ABox if the following conditions are met: 1) is the ancestor of ; 2) ( ) ⊇ ( ). Thus, if an individual name is blocked, then all its descendants are also blocked. Since root individuals don’t have ancestors, they are never blocked.

Example 2. Verify the consistency of an ALC knowledge base = ( , ) of the “Mass media” domain using the tableaux method. The structure of the knowledge base is as follows: = {

News ⊑ ∀has. Author; Author ≡ Human ⊔ Organization ⊔ Bot; Bot ⊑ ¬Human ⊓ ∀has. Author;

Organization ⊑ ∃has. Human; }; = { a: News; (a, b): has; b: Bot; (b, c): has; c: Organization; }.

Solution. The first step consists in a normalization of the knowledge base, which serves as an input argument to the tableaux method. Note that the ABox is already normalized, but the statements from TBox still require normalization, which can be performed using the properties stated above and de Morgan’s laws: = { ⊤ ⊑ ∀has. Author ⊔ ¬News; ⊤ ⊑ (Human ⊔ Organization ⊔ Bot) ⊔ ¬Author; ⊤ ⊑ (¬Human ⊓ ∀has. Author) ⊔ ¬Bot; ⊤ ⊑ ∃has. Human ⊔ ¬Organization;

Upon providing the normalized knowledge base to the tableaux algorithm, its execution begins. At each iteration, the algorithm selects an expanding rule and a statement (or a set of statements) that satisfy its conditions. Let’s select ⊑-rule and apply it to the statement ⊤ ⊑ ∀has. Author ⊔ ¬News. The specified statement belongs to T, a: News ∈ A, and a: ∀has. Author ∉ A, thus a new concept assertion is added to ABox A = A ∪ {a: ∀has. Author}.

Let’s apply the ⊑-rule to the rest of the statements contained in the TBox. The statement ⊤ ⊑ (Human ⊔ Organization ⊔ Bot) ⊔ ¬Author does not satisfy all conditions of the ⊑-rule, namely d: Author ∉ A. The other statements do satisfy the rule conditions, so they can be expanded. After an application of the ⊑-rule to the statements in T, the ABox A is as follows: = { a: News; (a, b): has; b: Bot; (b, c): has; c: Organization; a: ∀has. Author; b: ¬Human ⊓ ∀has. Author; c: ∃has. Human; }.

Next, let’s apply the ∃-rule to the statement c: ∃has. Human. Since the individual name c ∈ A and corresponds to the root in a tree model, then c is not blocked. Because ∄d, {(c, d): has, d: Human} ⊆ A, the statement c: ∃has. Human can be expanded by ∃-rule. As a result, the following statements are added to A: (c, d): has and d: Human, where is the new individual name in A.

= { a: News; (a, b): has; b: Bot; (b, c): has; c: Organization; a: ∀has. Author; b: ¬Human ⊓ ∀has. Author; c: ∃has. Human; (c, d): has; d: Human; }.

At the next iteration, let’s select the ⊓-rule and apply it to the statement b: ¬Human ⊓ ∀has. Author. Both conditions of the ⊓-rule are satisfied, b: ¬Human ∉ A and b: ∀has. Author ∉ A, therefore A = A ∪ {b: ¬Human, b: ∀has. Author}. Then, let’s apply the ∀-rule to the statements a: ∀has. Author and b: ∀has. Author. Both of them satisfy the conditions of ∀-rule, (a, b): has ∈ A, b: Author ∉ A and (b, c): has ∈ A, c: Author ∉ A, thus A = A ∪ {b: Author; c: Author}. After the application of ⊓-rule and ∀-rule, the ABox A is as follows: = { a: News; (a, b): has; b: Bot; (b, c): has; c: Organization; a: ∀has. Author; b: ¬Human ⊓ ∀has. Author; c: ∃has. Human; (c, d): has; d: Human; b: ¬Human; b: ∀has. Author; b: Author; c: Author; }.

Now let’s apply the ⊑-rule to the statement ⊤ ⊑ (Human ⊔ Organization ⊔ Bot) ⊔ ¬Author twice, considering that {b: Author; c: Author} ∈ A. Therefore, A = A ∪ {b: Human ⊔ Organization ⊔

Bot; c: Human ⊔ Organization ⊔ Bot}. Conditions of the ⊔-rule are not satisfied because b: Bot ∈ A and c: Organization ∈ A. The other rules are not applicable to the statements in A or T, so A is complete. Moreover, A does not have any explicit clashes of the form {x: D; x: ¬D} ⊆ A, so A is clash-free.

Answer. The ALC knowledge base = ( , ) of the “Mass media” domain is consistent. The tableaux method has created a witness to the consistency of the knowledge base , namely a model of . Its visual representation contains all individuals and the atomic concepts that they belong to, as well as the roles that connect them.

7. Conclusion

System analysis of the description logic ALC, which acts as a basis for more expressive description logics like SROIQ(D), identified the main principles of knowledge representation in the description logics, their model-theoretic semantics, the notion and structure of knowledge bases, core reasoning problems and reasoning over them with the tableaux method. An application of the ALC formalism to the representation of the “Mass media” domain highlighted the peculiarities of knowledge specification and the inference of implicit knowledge from the existing one. The benefits of automated reasoning are especially noticeable in large and complex knowledge bases.

8. Acknowledgements

The authors would like to thank Franz Baader, Ian Horrocks, Carsten Lutz, and Uli Sattler for writing “An Introduction to Description Logic” [2]. Many of the formal definitions and notations associated with the described logic are given in this excellent textbook.

9. References

[1] W3C, OWL 2 Web Ontology Language Direct Semantics (Second Edition), 2012. URL: https://www.w3.org/TR/owl2-direct-semantics. [2] F. Baader, I. Horrocks, C. Lutz, U. Sattler, An introduction to description logic, Cambridge

University Press, Cambridge, 2017. doi:10.1017/9781139025355. [3] A. Tarski, Pojęcie prawdy w językach nauk dedukcyjnych, Prace Towarzystwa Naukowego

Warszawskiego, Wydział III Nauk Matematyczno-Fizycznych, volume 34, 1933. [4] The Data and Knowledge Group at the Oxford University, HermiT OWL Reasoner. URL: http://www.hermit-reasoner.com. [5] Stanford Center for Biomedical Informatics Research at the Stanford University School of Medicine, Protégé: a free, open-source ontology editor & framework for building intelligent systems. URL: https://protege.stanford.edu. [6] C. M. Keet, Open World Assumption, in: W. Dubitzky, O. Wolkenhauer, K. H. Cho, H. Yokota (Eds.), Encyclopedia of Systems Biology, Springer, New York, NY, 2013, p. 1567. doi:10.1007/978-1-4419-9863-7_734. [7] R. Brachman, H. Levesque, Knowledge Representation and Reasoning, Morgan Kaufmann

Publishers Inc, San Francisco, CA, 2004. [8] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, P. F. Patel-Schneider, The description logic handbook: theory, implementation, and applications, 2nd. ed., Cambridge University Press, Cambridge, 2007. doi:10.1017/CBO9780511711787. [9] D. Scott, The lattice of flow diagrams, in: E. Engeler (Ed.), Symposium on Semantics of Algorithmic Languages, volume 188 of Lecture Notes in Mathematics, Springer, Berlin, Heidelberg, 1971, pp. 311–366. doi:10.1007/BFb0059703. [10] H. Levesque, R. Brachman, Expressiveness and tractability in knowledge representation and reasoning, Computational Intelligence, volume 3, issue 1, (1987) 78–93. doi:10.1111/j.14678640.1987.tb00176.x.