=Paper=
{{Paper
|id=Vol-258/paper-33
|storemode=property
|title=Completing Description Logic Knowledge Bases using Formal Concept Analysis
|pdfUrl=https://ceur-ws.org/Vol-258/paper16.pdf
|volume=Vol-258
|dblpUrl=https://dblp.org/rec/conf/owled/BaaderGSS07
}}
==Completing Description Logic Knowledge Bases using Formal Concept Analysis==
Completing Description Logic Knowledge Bases
using Formal Concept Analysis?
Franz Baader1 , Bernhard Ganter1 , Ulrike Sattler2 and Barış Sertkaya1
1
TU Dresden, Germany
2
The University of Manchester, UK
Abstract. We propose an approach for extending both the terminolog-
ical and the assertional part of a Description Logic knowledge base by
using information provided by the knowledge base and by a domain ex-
pert. The use of techniques from Formal Concept Analysis ensures that,
on the one hand, the interaction with the expert is kept to a minimum,
and, on the other hand, we can show that the extended knowledge base
is complete in a certain, well-defined sense.
1 Introduction
Description Logics (DLs) [1] are employed in various application domains, such
as natural language processing, configuration, databases, and bio-medical ontolo-
gies, but their most notable success so far is due to the fact that DLs provide
the logical underpinning of OWL, the standard ontology language for the seman-
tic web [6]. As a consequence of this standardization, several ontology editors
support OWL [9, 10, 8], and ontologies written in OWL are employed in more
and more applications. As the size of these ontologies grows, tools that sup-
port improving their quality become more important. The tools available until
now use DL reasoning to detect inconsistencies and to infer consequences, i.e.,
implicit knowledge that can be deduced from the explicitly represented knowl-
edge. There are also promising approaches that allow to pinpoint the reasons
for inconsistencies and for certain consequences, and that help the ontology en-
gineer to resolve inconsistencies and to remove unwanted consequences [13, 7].
These approaches address the quality dimension of soundness of an ontology,
both within itself (consistency) and w.r.t. the intended application domain (no
unwanted consequences). In the present paper, we are concerned with a different
quality dimension: completeness. We provide a basis for formally well-founded
techniques and tools that support the ontology engineer in checking whether an
ontology contains all the relevant information about the application domain, and
to extend the ontology appropriately if this is not the case.
A DL knowledge base (nowadays often called ontology) usually consists of
two parts, the terminological part (TBox), which defines concepts and also states
?
Supported by DFG (GRK 334/3) and the EU (IST-2005-7603 FET project TONES
and NoE 507505 Semantic Mining).
additional constraints (so-called general concept inclusions, GCIs) on the inter-
pretation of these concepts, and the assertional part (ABox), which describes
individuals and their relationship to each other and to concepts. Given an appli-
cation domain and a DL knowledge base (KB) describing it, we can ask whether
the KB contains all the relevant information about the domain: Are all the rel-
evant constraints that hold between concepts in the domain captured by the
TBox? Are all the relevant individuals existing in the domain represented in the
ABox?
As an example, consider the OWL ontology for human protein phosphatases
that has been described and used in [16]. This ontology was developed based on
information from peer-reviewed publications. The human protein phosphatase
family has been well characterised experimentally, and detailed knowledge about
different classes of such proteins is available. This knowledge is represented in the
terminological part of the ontology. Moreover, a large set of human phosphatases
has been identified and documented by expert biologists. These are described as
individuals in the assertional part of the ontology. One can now ask whether the
information about protein phosphatases contained in this ontology is complete.
Are all the relationships that hold among the introduced classes of phosphatases
captured by the constraints in the TBox, or are there relationships that hold
in the domain, but do not follow from the TBox? Are all possible kinds of
human protein phosphatases represented by individuals in the ABox, or are
there phosphatases that have not yet been included in the ontology or even not
yet been identified?
Such questions cannot be answered by an automated tool alone. Clearly,
to check whether a given relationship between concepts—which does not follow
from the TBox—holds in the domain, one needs to ask a domain expert, and the
same is true for questions regarding the existence of individuals not described in
the ABox. The rôle of the automated tool is to ensure that the expert is asked as
few questions as possible; in particular, she should not be asked trivial questions,
i.e., questions that could actually be answered based on the represented knowl-
edge. In the above example, answering a non-trivial question regarding human
protein phosphatases may require the biologist to study the relevant literature,
query existing protein databases, or even carry out new experiments. Thus, the
expert may be prompted to acquire new biological knowledge.
Attribute exploration is an approach developed in Formal Concept Analysis
(FCA) [5] that is used to acquire knowledge about an application domain by
querying an expert. One of the earliest applications of this approach is described
in [15], where the domain is lattice theory, and the goal of the exploration process
is to find, on the one hand, all valid relationships between properties of lattices
(like being distributive), and, on the other hand, to find counterexamples to all
the relationships that do not hold. To answer a query whether a certain rela-
tionship holds, the lattice theory expert must either confirm the relationship (by
using results from the literature or carrying out a new proof for this fact), or give
a counterexample (again, by either finding one in the literature or constructing
a new one).
Although this sounds very similar to what is needed in our context, we can-
not directly use this approach. The main reason is the open-world semantics of
description logic knowledge bases. In DLs, if we cannot deduce from the knowl-
edge base that an individual i is an instance of the concept C, we do not assume
that i belongs to ¬C. In contrast classical FCA assumes complete knowledge in
the sense that, whenever it is not explicitly mentioned that an object o has a
property p, we assume that o does not have p. There has been some work on how
to extend FCA and attribute exploration from complete knowledge to the case
of partial knowledge [11, 4]. However, this work is based on assumptions that
are different from ours. In particular, it assumes that the expert cannot answer
all queries and, as a consequence, the knowledge obtained after the exploration
process may still be incomplete. In contrast, our intention is to complete the
KB, i.e., in the end we want to have complete knowledge about these relation-
ships. What may be incomplete is the description of individuals used during the
exploration process.
Due to space limitation, we introduce our variant of FCA that can deal
with the open-world semantics of DL knowledge bases only briefly. For a more
comprehensive description, we refer the reader to the long version of this paper
[3] or to the technical report [2], which also contains full proofs of our results.
2 Exploring Partial Contexts
In this section, we extend the classical approach to Formal Concept Analysis
(FCA), as described in detail in [5], to the case of individuals (called objects
in FCA) that have only a partial description in the sense that, for some prop-
erties (called attributes in FCA), it is not known whether they are satisfied by
the individual or not. The connection between this approach and the classical
approach is explained in [2]. In the following, we assume that we have a finite
set M of attributes and a (possibly infinite) set of objects.
Definition 1. A partial object description (pod) is a tuple (A, S) where A, S ⊆
M are such that A ∩ S = ∅. We call such a pod a full object description (fod)
if A ∪ S = M . A set of pods is called a partial context and a set of fods a full
context.
Intuitively, the pod (A, S) says that the object it describes satisfies all attributes
from A and does not satisfy any attribute from S. For the attributes not con-
tained in A ∪ S, nothing is known w.r.t. this object.
Definition 2. We say that the pod (A0 , S 0 ) extends the pod (A, S), and write
this as (A, S) ≤ (A0 , S 0 ), if A ⊆ A0 and S ⊆ S 0 . Similarly, we say that the partial
context K0 extends the partial context K, and write this as K ≤ K0 , if every pod
in K is extended by some pod in K0 . If K is a full context and K ≤ K, then K
is called a realizer of K. If (A, S) is a fod and (A, S) ≤ (A, S), then we also say
that (A, S) realizes (A, S).
The following notion of implication formalizes the informal notion “relation-
ship between properties” used in the introduction.
Definition 3. An implication is of the form L → R where L, R ⊆ M . This
implication is refuted by the pod (A, S) if L ⊆ A and R ∩ S 6= ∅. It is refuted
by the partial context K if it is refuted by at least one element of K. The set
of implications that are not refuted by a given partial context K is denoted by
Imp(K). The set of all fods that do not refute a given set of implications L is
denoted by Mod (L).
For a set of implications L and a set P ⊆ M , the implicational closure of P with
respect to L, denoted by L(P ), is the smallest subset Q of M such that
– P ⊆ Q, and
– L → R ∈ L and L ⊆ Q imply R ⊆ Q.
A set P ⊆ M is called L-closed if L(P ) = P .
Definition 4. The implication L → R is said to follow from a set J of impli-
cations if R ⊆ J (L). The set of implications J is called complete for a set of
implications L if every implication in L follows from J . It is called sound for L
if every implication that follows from J is contained in L. A set of implications
J is called a base for a set of implications L if it is both sound and complete
for L, and no strict subset of J satisfies this property.
The following is a trivial fact regarding the connection between partial con-
texts and the implications they do not refute.
Proposition
S 1. For a given set P ⊆ M and a partial context K, K(P ) :=
M \ {S | (A, S) ∈ K, P ⊆ A} is the largest subset of M such that P → K(P )
is not refuted by K.
Attribute exploration with partial contexts The classical attribute ex-
ploration algorithm of FCA assumes that there is a domain expert that can
answer questions regarding the validity of implications in the application do-
main. Accordingly, our approach requires an expert that can decide whether an
implication is refuted in the application domain or not. In contrast to existing
work on extending FCA to the case of partial knowledge [11, 4], we do not as-
sume that the expert has only partial knowledge and thus cannot answer all
implication questions.
To be more precise, we consider the following setting. We are given an initial
(possibly empty) partial context K, an initially empty set of implications L, and
a full context K that is a realizer of K. The expert answers implication questions
“L → R?” w.r.t. the full context K. More precisely, if the answer is “yes,” then K
does not refute L → R. The implication L → R is then added to L. Otherwise,
the expert extends the current context K such that the extended context refutes
L → R and still has K as a realizer. Consequently, the following invariant will be
satisfied by K, K, L: K ≤ K ⊆ Mod (L). Our aim is to enrich K and L such that
eventually L is not only sound, but also complete for Imp(K), and K refutes all
other implications (i.e., all the implications refuted by K). As in the classical
case, we want to do this by asking as few as possible questions to the expert.
Definition 5. Let L be a set of implications and K a partial context. An im-
plication is called undecided w.r.t. K and L if it neither follows from L nor is
refuted by K. It is decided w.r.t. K and L if it is not undecided w.r.t. K and L.
In principle, our attribute exploration algorithm tries to decide each undecided
implications by either adding it to L or extending K such that it refutes the
implication. If all implications are decided, then our goal is achieved.
Proposition 2. Assume that K ≤ K ⊆ Mod (L) and that all implications are
decided w.r.t. K and L. Then L is complete for Imp(K) and K refutes all impli-
cations not belonging to Imp(K).
How can we find—and let the expert decide—all undecided implications with-
out considering all implications? The following proposition motivates why it is
sufficient to consider implications whose left-hand sides are L-closed.
Proposition 3. Let L be a set of implications and L → R an implication. Then,
L → R follows from L iff L(L) → R follows from L.
Concerning right-hand sides, Proposition 1 says that the largest right-hand
side R such that L → R is not refuted by K is R = K(L). Putting these
two observations together, we only need to consider implications of the form
L → K(L) where L is L-closed. In order to enumerate all relevant left-hand sides,
we can thus use the well-known approach from FCA for enumerating closed sets
in the lectic order [5].
Definition 6. Assume that M = {m1 , . . . , mn } and fix some linear order m1 <
m2 < · · · mn on M . The lectic order < is defined as follows: for mi ∈ M
and A, B ⊆ M we define A