Bisimulation-Based Concept Learning in Description Logics

Thanh-Luong Tran¹,³, Quang-Thuy Ha², Thi-Lan-Giao Hoang¹, Linh Anh Nguyen³,², and Hung Son Nguyen³,²

¹ Department of Information Technology, Hue University of Sciences, 77 Nguyen Hue, Hue city, Vietnam, {ttluong,hlgiao}@hueuni.edu.vn
² Faculty of Information Technology, VNU University of Engineering and Technology, 144 Xuan Thuy, Hanoi, Vietnam, thuyhq@vnu.edu.vn
³ Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland, {nguyen,son}@mimuw.edu.pl

Abstract. Concept learning in description logics (DLs) is similar to binary classification in traditional machine learning. The difference is that in DLs objects are described not only by attributes but also by binary relationships between objects. In this paper, we develop the first bisimulation-based method of concept learning in DLs for the following setting: given a knowledge base KB in a DL, a set of objects standing for positive examples and a set of objects standing for negative examples, learn a concept C in that DL such that the positive examples are instances of C w.r.t. KB, while the negative examples are not instances of C w.r.t. KB.

1 Introduction

In this paper we continue our study [12, 15, 7, 4] on concept learning in description logics (DLs). This problem is similar to binary classification in traditional machine learning. The difference is that in DLs objects are described not only by attributes but also by binary relationships between objects. The major settings of concept learning in DLs are as follows:

1. Given a knowledge base KB in a DL L and sets E⁺, E⁻ of individuals, learn a concept C in L such that: (a) KB |= C(a) for all a ∈ E⁺, and (b) KB |= ¬C(a) for all a ∈ E⁻. The set E⁺ (resp. E⁻) contains positive (resp. negative) examples of C.

2.
The second setting differs from the previous one only in that condition (b) is replaced by the weaker one: KB ⊭ C(a) for all a ∈ E⁻.

3. Given an interpretation I and sets E⁺, E⁻ of individuals, learn a concept C in L such that: (a) I |= C(a) for all a ∈ E⁺, and (b) I |= ¬C(a) for all a ∈ E⁻. Note that I ⊭ C(a) is the same as I |= ¬C(a).

As an early work on concept learning in DLs, Cohen and Hirsh [3] studied PAC-learnability of the CLASSIC description logic (an early DL formalism) and its sublogic C-CLASSIC. They proposed a concept learning algorithm based on "least common subsumers". In [9] Lambrix and Larocchia proposed a simple concept learning algorithm based on concept normalization.

Badea and Nienhuys-Cheng [1], Iannone et al. [8], Fanizzi et al. [6], and Lehmann and Hitzler [10] studied concept learning in DLs by using refinement operators as in inductive logic programming. The works [1, 8] use the first mentioned setting, while the works [6, 10] use the second mentioned setting. Apart from refinement operators, scoring functions and search strategies also play important roles in the algorithms proposed in those works. The algorithm DL-Learner [10] exploits genetic programming techniques, while DL-FOIL [6] also considers unlabeled data as in semi-supervised learning.

Nguyen and Szalas [12] applied bisimulation in DLs [5] to model indiscernibility of objects. Their work is pioneering in using bisimulation for concept learning in DLs. It also concerns concept approximation by using bisimulation and Pawlak's rough set theory [13, 14]. In [15] Tran et al. generalized and extended the concept learning method of [12] for DL-based information systems. They took attributes as basic elements of the language. An information system in a DL is a finite interpretation in that logic. Thus, both the works [12, 15] use the third mentioned setting. In [7] Ha et al.
developed the first bisimulation-based method, called BBCL, for concept learning in DLs using the first mentioned setting. Their method uses models of KB and bisimulation in those models to guide the search for the concept to be learned. It is formulated for a large class of useful DLs, including well-known DLs such as ALC, SHIQ, SHOIQ and SROIQ. The work [7] also introduced dual-BBCL, a variant of BBCL, for concept learning in DLs using the first mentioned setting.

In this paper, we develop the first bisimulation-based method, called BBCL2, for concept learning in DLs using the second mentioned setting, i.e., for learning a concept C such that: KB |= C(a) for all a ∈ E⁺, and KB ⊭ C(a) for all a ∈ E⁻, where KB is a given knowledge base in the considered DL, and E⁺, E⁻ are given sets of examples of C. This method is based on the dual-BBCL method (of concept learning in DLs using the first mentioned setting) from our joint work [7]. We make appropriate changes for dealing with the condition "KB ⊭ C(a) for all a ∈ E⁻" instead of "KB |= ¬C(a) for all a ∈ E⁻".

The rest of this paper is structured as follows. In Section 2, we recall notation and semantics of DLs. We present our BBCL2 method in Section 3 and illustrate it by examples in Section 4. We conclude in Section 5. Due to the lack of space, we do not recall the notion of bisimulation in DLs [5, 7], but just mention the use of the largest auto-bisimulation relations and list the bisimulation-based selectors [7].

2 Notation and Semantics of Description Logics

A DL-signature is a finite set Σ = Σ_I ∪ Σ_dA ∪ Σ_nA ∪ Σ_oR ∪ Σ_dR, where Σ_I is a set of individuals, Σ_dA is a set of discrete attributes, Σ_nA is a set of numeric attributes, Σ_oR is a set of object role names, and Σ_dR is a set of data roles.⁴ All the sets Σ_I, Σ_dA, Σ_nA, Σ_oR, Σ_dR are pairwise disjoint. Let Σ_A = Σ_dA ∪ Σ_nA.
Each attribute A ∈ Σ_A has a domain dom(A), which is a non-empty set that is countable if A is discrete, and partially ordered by ≤ otherwise.⁵ (For simplicity we do not subscript ≤ by A.) A discrete attribute A is called a Boolean attribute if dom(A) = {true, false}. We refer to Boolean attributes also as concept names. Let Σ_C ⊆ Σ_dA be the set of all concept names of Σ.

An object role name stands for a binary predicate between individuals. A data role σ stands for a binary predicate relating individuals to elements of a set range(σ). We denote individuals by letters like a and b, attributes by letters like A and B, object role names by letters like r and s, data roles by letters like σ and ϱ, and elements of sets of the form dom(A) or range(σ) by letters like c and d.

We will consider some (additional) DL-features denoted by I (inverse), O (nominal), F (functionality), N (unquantified number restriction), Q (quantified number restriction), U (universal role), and Self (local reflexivity of an object role). A set of DL-features is a set consisting of some (possibly none) of these names.

Let Σ be a DL-signature and Φ be a set of DL-features. Let L stand for ALC, the name of a basic DL. (We treat L as a language, not as a logic.)
The DL language L_Σ,Φ allows object roles and concepts defined as follows:

– if r ∈ Σ_oR then r is an object role of L_Σ,Φ
– if A ∈ Σ_C then A is a concept of L_Σ,Φ
– if A ∈ Σ_A \ Σ_C and d ∈ dom(A) then A = d and A ≠ d are concepts of L_Σ,Φ
– if A ∈ Σ_nA and d ∈ dom(A) then A ≤ d, A < d, A ≥ d and A > d are concepts of L_Σ,Φ
– if C and D are concepts of L_Σ,Φ, R is an object role of L_Σ,Φ, r ∈ Σ_oR, σ ∈ Σ_dR, a ∈ Σ_I, and n is a natural number then
  • ⊤, ⊥, ¬C, C ⊓ D, C ⊔ D, ∀R.C and ∃R.C are concepts of L_Σ,Φ
  • if d ∈ range(σ) then ∃σ.{d} is a concept of L_Σ,Φ
  • if I ∈ Φ then r⁻ is an object role of L_Σ,Φ
  • if O ∈ Φ then {a} is a concept of L_Σ,Φ
  • if F ∈ Φ then ≤ 1 r is a concept of L_Σ,Φ
  • if {F, I} ⊆ Φ then ≤ 1 r⁻ is a concept of L_Σ,Φ
  • if N ∈ Φ then ≥ n r and ≤ n r are concepts of L_Σ,Φ
  • if {N, I} ⊆ Φ then ≥ n r⁻ and ≤ n r⁻ are concepts of L_Σ,Φ
  • if Q ∈ Φ then ≥ n r.C and ≤ n r.C are concepts of L_Σ,Φ
  • if {Q, I} ⊆ Φ then ≥ n r⁻.C and ≤ n r⁻.C are concepts of L_Σ,Φ
  • if U ∈ Φ then U is an object role of L_Σ,Φ
  • if Self ∈ Φ then ∃r.Self is a concept of L_Σ,Φ.

⁴ Object role names are atomic object roles.
⁵ One can assume that, if A is a numeric attribute, then dom(A) is the set of real numbers and ≤ is the usual linear order between real numbers.

Fig. 1. Interpretation of complex object roles and complex concepts:

(r⁻)^I = (r^I)⁻¹    U^I = ∆^I × ∆^I    ⊤^I = ∆^I    ⊥^I = ∅
(A = d)^I = {x ∈ ∆^I | A^I(x) = d}    (A ≠ d)^I = (¬(A = d))^I
(A ≤ d)^I = {x ∈ ∆^I | A^I(x) is defined and A^I(x) ≤ d}
(A ≥ d)^I = {x ∈ ∆^I | A^I(x) is defined and d ≤ A^I(x)}
(A < d)^I = ((A ≤ d) ⊓ (A ≠ d))^I    (A > d)^I = ((A ≥ d) ⊓ (A ≠ d))^I
(¬C)^I = ∆^I \ C^I    (C ⊓ D)^I = C^I ∩ D^I    (C ⊔ D)^I = C^I ∪ D^I
{a}^I = {a^I}    (∃r.Self)^I = {x ∈ ∆^I | r^I(x, x)}    (∃σ.{d})^I = {x ∈ ∆^I | σ^I(x, d)}
(∀R.C)^I = {x ∈ ∆^I | ∀y [R^I(x, y) ⇒ C^I(y)]}
(∃R.C)^I = {x ∈ ∆^I | ∃y [R^I(x, y) ∧ C^I(y)]}
(≥ n R.C)^I = {x ∈ ∆^I | #{y | R^I(x, y) ∧ C^I(y)} ≥ n}
(≤ n R.C)^I = {x ∈ ∆^I | #{y | R^I(x, y) ∧ C^I(y)} ≤ n}
(≥ n R)^I = (≥ n R.⊤)^I    (≤ n R)^I = (≤ n R.⊤)^I
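The semantics in Figure 1 can be implemented directly as a recursive evaluator over a finite interpretation. The following sketch is illustrative and not part of the paper: it covers only the ALC constructors plus attribute concepts, concepts are encoded as nested tuples such as ("and", C, D) or ("top",), and the shape of the interpretation record is an assumption of ours.

```python
def extension(concept, interp):
    """Return the set C^I for a tuple-encoded concept over a finite interpretation.

    `interp` is assumed to provide: "domain" (a set), "concept_names"
    (name -> set of objects), "attrs" (name -> dict mapping objects to values,
    i.e. a partial function), and "roles" (name -> set of pairs).
    """
    delta = interp["domain"]
    op = concept[0]
    if op == "top":
        return set(delta)
    if op == "bot":
        return set()
    if op == "name":                         # concept name A
        return set(interp["concept_names"][concept[1]])
    if op == "attr=":                        # A = d
        _, a, d = concept
        return {x for x in delta if interp["attrs"][a].get(x) == d}
    if op == "attr<=":                       # A <= d (numeric attribute)
        _, a, d = concept
        vals = interp["attrs"][a]
        return {x for x in delta if x in vals and vals[x] <= d}
    if op == "not":                          # ¬C = ∆^I \ C^I
        return delta - extension(concept[1], interp)
    if op == "and":                          # C ⊓ D
        return extension(concept[1], interp) & extension(concept[2], interp)
    if op == "or":                           # C ⊔ D
        return extension(concept[1], interp) | extension(concept[2], interp)
    if op == "exists":                       # ∃r.C
        _, r, c = concept
        cext = extension(c, interp)
        return {x for (x, y) in interp["roles"][r] if y in cext}
    if op == "forall":                       # ∀r.C
        _, r, c = concept
        cext = extension(c, interp)
        return {x for x in delta
                if all(y in cext for (x2, y) in interp["roles"][r] if x2 == x)}
    raise ValueError(f"unknown constructor {op}")
```

Number restrictions, nominals and inverse roles can be added in the same style (an inverse role amounts to swapping the pair components before the comprehension).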
An interpretation in L_Σ,Φ is a pair I = ⟨∆^I, ·^I⟩, where ∆^I is a non-empty set called the domain of I and ·^I is a mapping called the interpretation function of I that associates each individual a ∈ Σ_I with an element a^I ∈ ∆^I, each concept name A ∈ Σ_C with a set A^I ⊆ ∆^I, each attribute A ∈ Σ_A \ Σ_C with a partial function A^I : ∆^I → dom(A), each object role name r ∈ Σ_oR with a binary relation r^I ⊆ ∆^I × ∆^I, and each data role σ ∈ Σ_dR with a binary relation σ^I ⊆ ∆^I × range(σ). The interpretation function ·^I is extended to complex object roles and complex concepts as shown in Figure 1, where #Γ stands for the cardinality of the set Γ.

Given an interpretation I = ⟨∆^I, ·^I⟩ in L_Σ,Φ, we say that an object x ∈ ∆^I has depth k if k is the maximal natural number such that there are pairwise different objects x₀, ..., x_k of ∆^I with the properties that:

– x_k = x and x₀ = a^I for some a ∈ Σ_I,
– x_i ≠ b^I for all 1 ≤ i ≤ k and all b ∈ Σ_I,
– for each 1 ≤ i ≤ k, there exists an object role R_i of L_Σ,Φ such that ⟨x_{i−1}, x_i⟩ ∈ R_i^I.

By I|_k we denote the interpretation obtained from I by restricting the domain to the set of objects with depth not greater than k and restricting the interpretation function accordingly.

A role inclusion axiom in L_Σ,Φ is an expression of the form R₁ ∘ ... ∘ R_k ⊑ r, where k ≥ 1, r ∈ Σ_oR and R₁, ..., R_k are object roles of L_Σ,Φ different from U. A role assertion in L_Σ,Φ is an expression of the form Ref(r), Irr(r), Sym(r), Tra(r), or Dis(R, S), where r ∈ Σ_oR and R, S are object roles of L_Σ,Φ different from U. Given an interpretation I, define that:

I |= R₁ ∘ ... ∘ R_k ⊑ r if R₁^I ∘ ... ∘ R_k^I ⊆ r^I
I |= Ref(r) if r^I is reflexive
I |= Irr(r) if r^I is irreflexive
I |= Sym(r) if r^I is symmetric
I |= Tra(r) if r^I is transitive
I |= Dis(R, S) if R^I and S^I are disjoint,

where the operator ∘ stands for the composition of binary relations.
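Since the depth of an object is the length of a *longest* simple chain leading to it from a named individual through unnamed objects, a small finite interpretation allows computing depths (and hence I|_k) by exhaustive depth-first search. The sketch below is illustrative and not from the paper; `edges` is assumed to already contain one pair per applicable object role (including inverse pairs when I ∈ Φ), the exhaustive search is exponential in the worst case, and objects whose depth is undefined (unreachable as required) are simply dropped from the restricted domain.

```python
def successors(y, edges):
    """Objects reachable from y in one step via any object role."""
    return {t for (s, t) in edges if s == y}

def depths(domain, named, edges):
    """Map each object to its depth (named objects have depth 0),
    or to None when no chain as in the definition exists."""
    best = {x: (0 if x in named else None) for x in domain}

    def dfs(path):
        x = path[-1]
        k = len(path) - 1
        if k > 0 and (best[x] is None or k > best[x]):
            best[x] = k                      # longer simple chain found
        for y in successors(x, edges):
            # intermediate/final objects must be unnamed and pairwise different
            if y not in named and y not in path:
                dfs(path + [y])

    for a in named:                          # chains start at a^I for a in Sigma_I
        dfs([a])
    return best

def restricted_domain(domain, named, edges, k):
    """The domain of I|_k: objects of depth at most k."""
    d = depths(domain, named, edges)
    return {x for x in domain if d[x] is not None and d[x] <= k}
```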
By a role axiom in L_Σ,Φ we mean either a role inclusion axiom or a role assertion in L_Σ,Φ. We say that a role axiom ϕ is valid in I (or I validates ϕ) if I |= ϕ.

A terminological axiom in L_Σ,Φ, also called a general concept inclusion (GCI) in L_Σ,Φ, is an expression of the form C ⊑ D, where C and D are concepts in L_Σ,Φ. An interpretation I validates an axiom C ⊑ D, denoted by I |= C ⊑ D, if C^I ⊆ D^I.

An individual assertion in L_Σ,Φ is an expression of one of the forms C(a) (concept assertion), r(a, b) (positive role assertion), ¬r(a, b) (negative role assertion), a = b, and a ≠ b, where r ∈ Σ_oR and C is a concept of L_Σ,Φ. We will write, for example, A(a) = d instead of (A = d)(a). Given an interpretation I, define that:

I |= a = b if a^I = b^I
I |= a ≠ b if a^I ≠ b^I
I |= C(a) if a^I ∈ C^I
I |= r(a, b) if ⟨a^I, b^I⟩ ∈ r^I
I |= ¬r(a, b) if ⟨a^I, b^I⟩ ∉ r^I.

We say that I validates an individual assertion ϕ if I |= ϕ.

An RBox (resp. TBox, ABox) in L_Σ,Φ is a finite set of role axioms (resp. terminological axioms, individual assertions) in L_Σ,Φ. A knowledge base in L_Σ,Φ is a triple ⟨R, T, A⟩, where R (resp. T, A) is an RBox (resp. a TBox, an ABox) in L_Σ,Φ. An interpretation I is a model of a "box" if it validates all the axioms/assertions of that "box". It is a model of a knowledge base ⟨R, T, A⟩ if it is a model of R, T and A. A knowledge base is satisfiable if it has a model. An individual a is said to be an instance of a concept C w.r.t. a knowledge base KB, denoted by KB |= C(a), if, for every model I of KB, a^I ∈ C^I.

[Fig. 2. An illustration for the knowledge base given in Example 2.1: the nodes are the publications P1:2010 (Awarded), P2:2009 (¬Awarded), P3:2008 (¬Awarded), P4:2007 (Awarded), P5:2006 (¬Awarded), P6:2006 (Awarded); the edges denote citations.]

Example 2.1. This example is based on an example of [15, 7].
Let Φ = {I, O, N, Q}, Σ_I = {P1, P2, P3, P4, P5, P6}, Σ_C = {Pub, Awarded, Ad}, Σ_dA = Σ_C, Σ_nA = {Year}, Σ_oR = {cites, cited_by}, Σ_dR = ∅,

R = {cites ⊑ cited_by⁻, cited_by ⊑ cites⁻}, T = {⊤ ⊑ Pub},

A₀ = {Awarded(P1), ¬Awarded(P2), ¬Awarded(P3), Awarded(P4), ¬Awarded(P5), Awarded(P6), Year(P1) = 2010, Year(P2) = 2009, Year(P3) = 2008, Year(P4) = 2007, Year(P5) = 2006, Year(P6) = 2006, cites(P1, P2), cites(P1, P3), cites(P1, P4), cites(P1, P6), cites(P2, P3), cites(P2, P4), cites(P2, P5), cites(P3, P4), cites(P3, P5), cites(P3, P6), cites(P4, P5), cites(P4, P6)},

where the concept Pub stands for "publication". Then KB₀ = ⟨R, T, A₀⟩ is a knowledge base in L_Σ,Φ. The axiom ⊤ ⊑ Pub states that the domain of any model of KB₀ consists of only publications. The knowledge base KB₀ is illustrated in Figure 2. In this figure, nodes denote publications, edges denote citations (i.e., assertions of the role cites), and we display only information concerning assertions about Year, Awarded and cites. ∎

An L_Σ,Φ logic is specified by a number of restrictions adopted for the language L_Σ,Φ. We say that a logic L is decidable if the problem of checking satisfiability of a given knowledge base in L is decidable. A logic L has the finite model property if every satisfiable knowledge base in L has a finite model. We say that a logic L has the semi-finite model property if every satisfiable knowledge base in L has a model I such that, for any natural number k, I|_k is finite and constructible.

As the general satisfiability problem of context-free grammar logics is undecidable [2], the most general L_Σ,Φ logics (without restrictions) are also undecidable. The considered class of DLs contains, however, many decidable and useful logics. One of them is SROIQ, the logical base of the Web Ontology Language OWL 2. This logic has the semi-finite model property.
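The instance check KB |= C(a) quantifies over *all* models of KB, so evaluating C in a finite sample of models can only refute entailment (one countermodel suffices) or conjecture it. The following sketch is our illustration of this asymmetry and of the two conditions of the second setting; it is not the paper's decision procedure (the paper assumes a tableau-style instance checker), and the model record shape and the evaluator `ext` are assumptions.

```python
def holds_in_all(models, ext, concept, a):
    """Check a^I in C^I in every sampled model of KB.

    False definitely refutes KB |= C(a); True only conjectures it,
    since the sample need not contain all models.
    `ext(concept, model)` is assumed to return C^I as a set, and
    model["ind"] maps individual names to domain elements.
    """
    return all(model["ind"][a] in ext(concept, model) for model in models)

def consistent_with_examples(models, ext, concept, e_pos, e_neg):
    """Second-setting check relative to the sampled models:
    KB |= C(a) for a in E+ (conjectured), KB |/= C(a) for a in E- (witnessed)."""
    pos_ok = all(holds_in_all(models, ext, concept, a) for a in e_pos)
    neg_ok = all(not holds_in_all(models, ext, concept, a) for a in e_neg)
    return pos_ok and neg_ok
```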
3 Concept Learning for Knowledge Bases in DLs

Let L be a decidable L_Σ,Φ logic with the semi-finite model property, Ad ∈ Σ_C be a special concept name standing for the "decision attribute", and KB₀ = ⟨R, T, A₀⟩ be a knowledge base in L without using Ad. Let E⁺ and E⁻ be disjoint subsets of Σ_I such that the knowledge base KB = ⟨R, T, A⟩ with A = A₀ ∪ {Ad(a) | a ∈ E⁺} ∪ {¬Ad(a) | a ∈ E⁻} is satisfiable. The set E⁺ (resp. E⁻) is called the set of positive (resp. negative) examples of Ad. Let E = ⟨E⁺, E⁻⟩. The problem is to learn a concept C as a definition of Ad in the logic L restricted to a given sublanguage L_Σ†,Φ† with Σ† ⊆ Σ \ {Ad} and Φ† ⊆ Φ such that: KB |= C(a) for all a ∈ E⁺, and KB ⊭ C(a) for all a ∈ E⁻.

Given an interpretation I in L_Σ,Φ, by ≡_Σ†,Φ†,I we denote the equivalence relation on ∆^I with the property that x ≡_Σ†,Φ†,I x′ iff x is L_Σ†,Φ†-equivalent to x′ (i.e., for every concept D of L_Σ†,Φ†, x ∈ D^I iff x′ ∈ D^I). By [7, Theorem 3], this equivalence relation coincides with the largest L_Σ†,Φ†-auto-bisimulation ∼_Σ†,Φ†,I of I (see [7] for the definition of this notion).

Let I be an interpretation. We say that a set Y ⊆ ∆^I is divided by E if there exist a ∈ E⁺ and b ∈ E⁻ such that {a^I, b^I} ⊆ Y. A partition P = {Y₁, ..., Y_k} of ∆^I is said to be consistent with E if, for every 1 ≤ i ≤ k, Y_i is not divided by E.

Observe that if I is a model of KB then:

– since C is a concept of L_Σ†,Φ†, by [7, Theorems 2 and 3], C^I should be the union of a number of equivalence classes of ∆^I w.r.t. ≡_Σ†,Φ†,I;
– we should have that a^I ∈ C^I for all a ∈ E⁺, and a^I ∉ C^I for all a ∈ E⁻.

The idea is to use models of KB and bisimulation in those models to guide the search for C. We now describe our method BBCL2 (Bisimulation-Based Concept Learning for knowledge bases in DLs using the second setting).
It constructs a set E₀⁻ of individuals and two sets C, C₀ of concepts. E₀⁻ will cover more and more individuals from E⁻. The role of C is to collect concepts D such that KB |= D(a) for all a ∈ E⁺. The set C₀ is auxiliary for the construction of C: when a concept D does not satisfy the mentioned condition but is a "good" candidate for it, we put it into C₀. Later, when necessary, we take disjunctions of some concepts from C₀ and check whether they are good for adding to C. During the learning process, we will always have that:

– KB |= (⨅C)(a) for all a ∈ E⁺,
– KB ⊭ (⨅C)(a) for all a ∈ E₀⁻,

where ⨅{D₁, ..., D_n} = D₁ ⊓ ... ⊓ D_n and ⨅∅ = ⊤. We try to extend C to satisfy KB ⊭ (⨅C)(a) for more and more a ∈ E⁻. Extending C enables extension of E₀⁻. When E₀⁻ reaches E⁻, we return the concept ⨅C after normalization and simplification.

Our method is not a detailed algorithm, as we leave some steps at an abstract level, open to implementation heuristics. In particular, we assume that it is known whether L has the finite model property, how to construct models of KB, and how to do instance checking KB |= D(a) for arbitrary D and a. The steps of our method are as follows.

1. Initialize E₀⁻ := ∅, C := ∅, C₀ := ∅.

2. (This is the beginning of a loop controlled by "go to" at Step 6.) If L has the finite model property then construct a (next) finite model I of KB. Otherwise, construct a (next) interpretation I such that either I is a finite model of KB or I = I′|_K, where I′ is an infinite model of KB and K is a parameter of the learning method (e.g., with value 5).

3. Starting from the partition {∆^I}, make subsequent granulations to reach the partition {Y_i1, ..., Y_ik} corresponding to ≡_Σ†,Φ†,I, where each Y_ij is characterized by an appropriate concept C_ij (such that Y_ij = C_ij^I).

4.
For each 1 ≤ j ≤ k, if Y_ij contains some a^I with a ∈ E⁻ and no a^I with a ∈ E⁺ then:
   – if KB |= ¬C_ij(a) for all a ∈ E⁺ then, if C is not subsumed by ¬C_ij w.r.t. KB (i.e., KB ⊭ ⨅C ⊑ ¬C_ij), add ¬C_ij to C and add to E₀⁻ all a ∈ E⁻ such that a^I ∈ Y_ij;
   – else add ¬C_ij to C₀.

5. If E₀⁻ = E⁻ then go to Step 8.

6. If it was hard to extend C during a considerable number of iterations of the loop (with different interpretations I) then go to Step 7, else go to Step 2.

7. Repeat the following:
   (a) Randomly select some concepts D₁, ..., D_l from C₀ and let D = (D₁ ⊔ ... ⊔ D_l).
   (b) If KB |= D(a) for all a ∈ E⁺, C is not subsumed by D w.r.t. KB (i.e., KB ⊭ ⨅C ⊑ D), and E⁻ \ E₀⁻ contains some a such that KB ⊭ (⨅C ⊓ D)(a), then:
       i. add D to C,
       ii. add to E₀⁻ all a ∈ E⁻ \ E₀⁻ such that KB ⊭ (⨅C)(a),
       iii. if E₀⁻ = E⁻ then go to Step 8.
   (c) If it was still too hard to extend C during a considerable number of iterations of the current loop, or C is already too big, then stop the process with failure.

8. For each D ∈ C, if KB ⊭ (⨅(C \ {D}))(a) for all a ∈ E⁻ then delete D from C.

9. Let C be a normalized form of ⨅C (normalizing concepts can be done, e.g., as in [11]). Observe that KB |= C(a) for all a ∈ E⁺, and KB ⊭ C(a) for all a ∈ E⁻. Try to simplify C while preserving this property, and then return it.

For Step 2, if L is one of the well-known DLs, then I can be constructed by using a tableau algorithm (see [7] for references). During the construction, randomization is used to a certain extent to make I different from the interpretations generated in previous iterations of the loop.

We describe Step 3 in more detail:

– In the granulation process, we denote the blocks created so far in all steps by Y₁, ..., Y_n, where the current partition {Y_i1, ..., Y_ik} consists of only some of them. We do not use the same subscript to denote blocks of different
Fig. 3. Selectors:

– A, where A ∈ Σ†_C
– A = d, where A ∈ Σ†_A \ Σ†_C and d ∈ dom(A)
– A ≤ d and A < d, where A ∈ Σ†_nA, d ∈ dom(A) and d is not a minimal element of dom(A)
– A ≥ d and A > d, where A ∈ Σ†_nA, d ∈ dom(A) and d is not a maximal element of dom(A)
– ∃σ.{d}, where σ ∈ Σ†_dR and d ∈ range(σ)
– ∃r.C_i, ∃r.⊤ and ∀r.C_i, where r ∈ Σ†_oR and 1 ≤ i ≤ n
– ∃r⁻.C_i, ∃r⁻.⊤ and ∀r⁻.C_i, if I ∈ Φ†, r ∈ Σ†_oR and 1 ≤ i ≤ n
– {a}, if O ∈ Φ† and a ∈ Σ†_I
– ≤ 1 r, if F ∈ Φ† and r ∈ Σ†_oR
– ≤ 1 r⁻, if {F, I} ⊆ Φ† and r ∈ Σ†_oR
– ≥ l r and ≤ m r, if N ∈ Φ†, r ∈ Σ†_oR, 0 < l ≤ #∆^I and 0 ≤ m < #∆^I
– ≥ l r⁻ and ≤ m r⁻, if {N, I} ⊆ Φ†, r ∈ Σ†_oR, 0 < l ≤ #∆^I and 0 ≤ m < #∆^I
– ≥ l r.C_i and ≤ m r.C_i, if Q ∈ Φ†, r ∈ Σ†_oR, 1 ≤ i ≤ n, 0 < l ≤ #C_i^I and 0 ≤ m < #C_i^I
– ≥ l r⁻.C_i and ≤ m r⁻.C_i, if {Q, I} ⊆ Φ†, r ∈ Σ†_oR, 1 ≤ i ≤ n, 0 < l ≤ #C_i^I and 0 ≤ m < #C_i^I
– ∃r.Self, if Self ∈ Φ† and r ∈ Σ†_oR

Here, n is the number of blocks created so far when granulating ∆^I, and C_i is the concept characterizing the block Y_i. It was proved in [15] that using these selectors is sufficient to granulate ∆^I to obtain the partition corresponding to ≡_Σ†,Φ†,I.

contents (i.e., we always use new subscripts, obtained by increasing n, for new blocks). We take care that, for each 1 ≤ i ≤ n, Y_i is characterized by an appropriate concept C_i (such that Y_i = C_i^I).
– Following [12, 15] we use the concepts listed in Figure 3 as selectors for the granulation process. If a block Y_ij (1 ≤ j ≤ k) is divided by D^I, where D is a selector, then partitioning Y_ij by D is done as follows:
   • s := n + 1, t := n + 2, n := n + 2,
   • Y_s := Y_ij ∩ D^I, C_s := C_ij ⊓ D,
   • Y_t := Y_ij ∩ (¬D)^I, C_t := C_ij ⊓ ¬D,
   • the new partition of ∆^I becomes ({Y_i1, ..., Y_ik} \ {Y_ij}) ∪ {Y_s, Y_t}.
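The splitting of a block by a selector described above can be sketched as follows. This is our illustrative bookkeeping, not the paper's implementation: concepts are encoded as nested tuples, `selector_ext` stands for D^I computed by any evaluator of the Figure 1 semantics, and fresh indices play the role of increasing n.

```python
def split_block(blocks, concepts, partition, j, selector, selector_ext):
    """Split block Y_j by a selector D with extension D^I = selector_ext.

    `blocks`/`concepts` map block indices to block sets / characterizing
    concepts; `partition` is the set of indices of the current partition.
    Returns the new partition (unchanged if Y_j is not divided by D^I).
    """
    y = blocks[j]
    if not (y & selector_ext) or not (y - selector_ext):
        return partition                      # Y_j is not divided by D^I
    s = max(blocks) + 1                       # s := n + 1
    t = s + 1                                 # t := n + 2 (and n := n + 2)
    blocks[s] = y & selector_ext              # Y_s = Y_j ∩ D^I
    blocks[t] = y - selector_ext              # Y_t = Y_j ∩ (¬D)^I
    concepts[s] = ("and", concepts[j], selector)            # C_s = C_j ⊓ D
    concepts[t] = ("and", concepts[j], ("not", selector))   # C_t = C_j ⊓ ¬D
    return (partition - {j}) | {s, t}
```

Old indices stay in `blocks`/`concepts` even after leaving the partition, matching the convention that all blocks created so far keep their subscripts.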
– Which block from the current partition should be partitioned first and which selector should be used to partition it are left open for heuristics. For example, one can apply some gain function like the entropy gain measure, while taking into account also the simplicity of the selectors and of the concepts characterizing the blocks. Once again, randomization is used to a certain extent: for example, if some selectors give the same gain and are the best, then we randomly choose any one of them.

As a modification for Step 3, the granulation process can be stopped as soon as the current partition is consistent with E (or when some criteria are met). But, if it is hard to extend C during a considerable number of iterations of the loop (with different interpretations I), then this loosening should be discarded.

Observe that, when ¬C_ij is considered at Step 4, we have that a^I ∈ (¬C_ij)^I for all a ∈ E⁺ (since Y_ij contains no a^I with a ∈ E⁺). This gives good hope that KB |= ¬C_ij(a) for all a ∈ E⁺. We check this condition, for example, by using some appropriate tableau decision procedure; if it holds then we add ¬C_ij to the set C, and otherwise we add ¬C_ij to C₀. To increase the chance that C_ij satisfies the mentioned condition and ¬C_ij is added to C, we tend to make C_ij strong enough. For this reason, we do not use the technique with LargestContainer introduced in [12], and, when necessary, we do not apply the above mentioned loosening for Step 3.

Note that any single concept D from C₀ does not satisfy the condition KB |= D(a) for all a ∈ E⁺, but when we take a number of concepts D₁, ..., D_l from C₀ we may have that KB |= (D₁ ⊔ ... ⊔ D_l)(a) for all a ∈ E⁺. So, when it is really hard to extend C by directly using concepts ¬C_ij (where the C_ij are the concepts characterizing blocks of partitions of the domains of models of KB), we switch to using disjunctions D₁ ⊔ ... ⊔ D_l of concepts from C₀ as candidates for adding to C.

4 Illustrative Examples

Example 4.1.
Let KB₀ = ⟨R, T, A₀⟩ be the knowledge base given in Example 2.1. Let E⁺ = {P4, P6}, E⁻ = {P1, P2, P3, P5}, Σ† = {Awarded, cited_by} and Φ† = ∅. As usual, let KB = ⟨R, T, A⟩, where A = A₀ ∪ {Ad(a) | a ∈ E⁺} ∪ {¬Ad(a) | a ∈ E⁻}. Execution of our BBCL2 method on this example is as follows.

1. E₀⁻ := ∅, C := ∅, C₀ := ∅.
2. KB has infinitely many models, but the most natural one is I specified below, which will be used first:
   ∆^I = {P1, P2, P3, P4, P5, P6}, x^I = x for x ∈ {P1, P2, P3, P4, P5, P6},
   Pub^I = ∆^I, Awarded^I = {P1, P4, P6},
   cites^I = {⟨P1, P2⟩, ⟨P1, P3⟩, ⟨P1, P4⟩, ⟨P1, P6⟩, ⟨P2, P3⟩, ⟨P2, P4⟩, ⟨P2, P5⟩, ⟨P3, P4⟩, ⟨P3, P5⟩, ⟨P3, P6⟩, ⟨P4, P5⟩, ⟨P4, P6⟩},
   cited_by^I = (cites^I)⁻¹, and the function Year^I is specified as usual.
3. Y₁ := ∆^I, partition := {Y₁}.
4. Partitioning Y₁ by Awarded:
   – Y₂ := {P1, P4, P6}, C₂ := Awarded,
   – Y₃ := {P2, P3, P5}, C₃ := ¬Awarded,
   – partition := {Y₂, Y₃}.
5. Partitioning Y₂:
   – All the selectors ∃cited_by.⊤, ∃cited_by.C₂ and ∃cited_by.C₃ partition Y₂ in the same way. We choose ∃cited_by.⊤, as it is the simplest one.
   – Y₄ := {P4, P6}, C₄ := C₂ ⊓ ∃cited_by.⊤,
   – Y₅ := {P1}, C₅ := C₂ ⊓ ¬∃cited_by.⊤,
   – partition := {Y₃, Y₄, Y₅}.
6. The obtained partition is consistent with E, having Y₃ = {P2, P3, P5} ⊂ E⁻, Y₄ = {P4, P6} = E⁺ and Y₅ = {P1} ⊂ E⁻. (It is not yet the partition corresponding to ≡_Σ†,Φ†,I.)
7. Since Y₃ ⊂ E⁻ and KB |= ¬C₃(a) for all a ∈ E⁺, we add ¬C₃ to C and add the elements of Y₃ to E₀⁻. Thus, C = {¬C₃} and E₀⁻ = {P2, P3, P5}.
8. Since Y₅ ⊂ E⁻, KB |= ¬C₅(a) for all a ∈ E⁺ and C is not subsumed by ¬C₅ w.r.t. KB, we add ¬C₅ to C and add the elements of Y₅ to E₀⁻. Thus, C = {¬C₃, ¬C₅}, ⨅C = ¬¬Awarded ⊓ ¬(Awarded ⊓ ¬∃cited_by.⊤) and E₀⁻ = {P1, P2, P3, P5}.
9.
Since E₀⁻ = E⁻, we normalize ⨅C to Awarded ⊓ ∃cited_by.⊤ and return it as the result. (This concept denotes the set of publications which were awarded and cited.) ∎

Example 4.2. Let KB₀, E⁺, E⁻, KB and Φ† be as in Example 4.1, but let Σ† = {cited_by, Year}. Execution of the BBCL2 method on this new example has the same first two steps as in Example 4.1, and then continues as follows.

1. Granulating {∆^I} as in [15, Example 11] we reach the partition {Y₄, Y₆, Y₇, Y₈, Y₉} consistent with E and have that:
   – Y₄ = {P4}, Y₆ = {P1}, Y₇ = {P2, P3}, Y₈ = {P6}, Y₉ = {P5},
   – C₂ = (Year ≥ 2008), C₃ = (Year < 2008), C₅ = C₃ ⊓ (Year < 2007), C₆ = C₂ ⊓ (Year ≥ 2010), C₇ = C₂ ⊓ (Year < 2010), C₉ = C₅ ⊓ ¬∃cited_by.C₆.
2. We have C₆ = (Year ≥ 2008) ⊓ (Year ≥ 2010). Since Y₆ ⊂ E⁻ and KB |= ¬C₆(a) for all a ∈ E⁺, we add ¬C₆ to C and add the elements of Y₆ to E₀⁻. Thus, C = {¬C₆} and E₀⁻ = {P1}.
3. We have C₇ = (Year ≥ 2008) ⊓ (Year < 2010). Since Y₇ ⊂ E⁻, KB |= ¬C₇(a) for all a ∈ E⁺ and C is not subsumed by ¬C₇ w.r.t. KB, we add ¬C₇ to C and add the elements of Y₇ to E₀⁻. Thus, C = {¬C₆, ¬C₇} and E₀⁻ = {P1, P2, P3}.
4. We have C₉ = (Year < 2008) ⊓ (Year < 2007) ⊓ ¬∃cited_by.((Year ≥ 2008) ⊓ (Year ≥ 2010)). Since Y₉ ⊂ E⁻, KB |= ¬C₉(a) for all a ∈ E⁺ and C is not subsumed by ¬C₉ w.r.t. KB, we add ¬C₉ to C and add the elements of Y₉ to E₀⁻. Thus, C = {¬C₆, ¬C₇, ¬C₉} and E₀⁻ = {P1, P2, P3, P5}.
5. Since E₀⁻ = E⁻, we normalize and simplify ⨅C before returning it as the result. Without exploiting the fact that publication years are integers, ⨅C can be normalized to (Year < 2008) ⊓ [(Year ≥ 2007) ⊔ ∃cited_by.(Year ≥ 2010)]. The concept C = (Year < 2008) ⊓ ∃cited_by.(Year ≥ 2010) is a simplified form of the above concept, which still satisfies KB |= C(a) for all a ∈ E⁺ and KB ⊭ C(a) for all a ∈ E⁻. Thus, we return it as the result.
(The returned concept denotes the set of publications released before 2008 that are cited by some publication released since 2010.) ∎

5 Discussion and Conclusion

We first compare the BBCL2 method with the BBCL and dual-BBCL methods from our joint work [7]. First of all, BBCL2 is used for the second setting of concept learning in DLs, while BBCL and dual-BBCL are used for the first setting. BBCL2 is derived from dual-BBCL, but it contains substantial modifications needed for the change of setting. BBCL2 differs from BBCL at Steps 1, 4, 5, 7, 8, 9, and differs from dual-BBCL by the use of E₀⁻ at Steps 1, 4, 5 and 7.

Comparing the examples given in this paper and in [7], apart from detailed technical differences in concept learning, it can be seen that the first setting requires more knowledge⁷ in order to obtain effects similar to those of the second setting. In other words, the second setting has effects of a kind of closed world assumption, while the first setting does not. The overall impression is that the second setting is more convenient than the first one.

Recall that our BBCL2 method is the first bisimulation-based method for concept learning in DLs using the second setting. As in the case of BBCL and dual-BBCL, it is formulated for the class of decidable ALC_Σ,Φ DLs that have the finite or semi-finite model property, where Φ ⊆ {I, O, F, N, Q, U, Self}. This class contains many useful DLs. For example, SROIQ (the logical base of OWL 2) belongs to this class. Our method is also applicable to other decidable DLs with the finite or semi-finite model property. The only additional requirement is that those DLs have a good set of selectors (in the sense of [15, Theorem 10]). Like BBCL and dual-BBCL, the idea of BBCL2 is to use models of the considered knowledge base and bisimulation in those models to guide the search for the concept.
Thus, it is completely different from the previous works [6, 10] on concept learning in DLs using the second setting. As bisimulation is the notion for characterizing indiscernibility of objects in DLs, our BBCL2 method is natural and very promising. We intend to implement BBCL2 in the near future.

⁷ Like the assertions (¬∃cited_by.⊤)(P1) and (∀cited_by.{P2, P3, P4})(P5), which state that P1 is not cited by any publication and P5 is cited only by P2, P3 and P4.

Acknowledgments. This paper was written during the first author's visit at Warsaw Center of Mathematics and Computer Science (WCMCS). He would like to thank WCMCS for the support. This work was also supported by Polish National Science Centre (NCN) under Grant No. 2011/01/B/ST6/02759 as well as by Polish National Center for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program: "Interdisciplinary System for Interactive Scientific and Scientific-Technical Information".

References

1. L. Badea and S.-H. Nienhuys-Cheng. A refinement operator for description logics. In Proceedings of ILP'2000, volume 1866 of LNCS, pages 40–59. Springer, 2000.
2. M. Baldoni, L. Giordano, and A. Martelli. A tableau for multimodal logics and some (un)decidability results. In Proceedings of TABLEAUX'1998, volume 1397 of LNCS, pages 44–59. Springer, 1998.
3. W.W. Cohen and H. Hirsh. Learning the Classic description logic: Theoretical and experimental results. In Proceedings of KR'1994, pages 121–133.
4. A.R. Divroodi, Q.-T. Ha, L.A. Nguyen, and H.S. Nguyen. On C-learnability in description logics. In Proceedings of ICCCI'2012 (1), volume 7653 of LNCS, pages 230–238. Springer, 2012.
5. A.R. Divroodi and L.A. Nguyen. On bisimulations for description logics. In Proceedings of CS&P'2011, pages 99–110 (see also arXiv:1104.1964).
6. N. Fanizzi, C. d'Amato, and F. Esposito.
DL-FOIL concept learning in description logics. In Proceedings of ILP'2008, volume 5194 of LNCS, pages 107–121. Springer, 2008.
7. Q.-T. Ha, T.-L.-G. Hoang, L.A. Nguyen, H.S. Nguyen, A. Szalas, and T.-L. Tran. A bisimulation-based method of concept learning for knowledge bases in description logics. In Proceedings of SoICT'2012, pages 241–249. ACM, 2012.
8. L. Iannone, I. Palmisano, and N. Fanizzi. An algorithm based on counterfactuals for concept learning in the Semantic Web. Applied Intelligence, 26(2):139–159, 2007.
9. P. Lambrix and P. Larocchia. Learning composite concepts. In Proceedings of DL'1998.
10. J. Lehmann and P. Hitzler. Concept learning in description logics using refinement operators. Machine Learning, 78(1-2):203–250, 2010.
11. L.A. Nguyen. An efficient tableau prover using global caching for the description logic ALC. Fundamenta Informaticae, 93(1-3):273–288, 2009.
12. L.A. Nguyen and A. Szalas. Logic-based roughification. In A. Skowron and Z. Suraj, editors, Rough Sets and Intelligent Systems (To the Memory of Professor Zdzislaw Pawlak), Vol. 1, pages 529–556. Springer, 2012.
13. Z. Pawlak. Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht, 1991.
14. Z. Pawlak and A. Skowron. Rudiments of rough sets. Information Sciences, 177(1):3–27, 2007.
15. T.-L. Tran, Q.-T. Ha, T.-L.-G. Hoang, L.A. Nguyen, H.S. Nguyen, and A. Szalas. Concept learning for description logic-based information systems. In Proceedings of KSE'2012, pages 65–73. IEEE Computer Society, 2012.