=Paper= {{Paper |id=None |storemode=property |title=Optimizations for the Role-Depth Bounded Least Common Subsumer in EL+ |pdfUrl=https://ceur-ws.org/Vol-849/paper_30.pdf |volume=Vol-849 |dblpUrl=https://dblp.org/rec/conf/owled/EckeT12 }} ==Optimizations for the Role-Depth Bounded Least Common Subsumer in EL+== https://ceur-ws.org/Vol-849/paper_30.pdf
 Optimizations for the role-depth bounded least
           common subsumer in EL+

                      Andreas Ecke and Anni-Yasmin Turhan?

               TU Dresden, Institute for Theoretical Computer Science

        Abstract. Computing the least common subsumer (lcs) yields a gen-
        eralization of a collection of concepts, computing such generalizations is
        a useful reasoning task for many ontology-based applications. Since the
        lcs need not exist, if computed w.r.t. general TBoxes, an approximative
        approach, the role-depth bounded lcs, has been proposed. Recently, this
        approach has been extended to the Description logic EL+ , which covers
        most of the OWL 2 EL profile.
        In this paper we present two kinds of optimizations for the computation
        of such approximative lcs: one to obtain succinct rewritings of EL+ -
        concepts and the other to speed-up the k-lcs computation. The eval-
        uation of these optimizations give evidence that they can improve the
        computation of the role-depth bounded lcs by orders of magnitude.


1     Introduction
Intuitively, the lcs is a new (complex) concept description that captures the
commonalities of all of its input concept descriptions. This reasoning service has
turned out to be useful for the augmentation of TBoxes and as a subtask when
computing the (dis)similarity of concept descriptions [8] or other non-standard
inferences.
     In particular, bio-medical TBoxes are often written in extensions of EL that
allow to model roles in a more detailed way, such as SNOMED [13] which allows
to use role inclusions and is written in ELH or the Gene Ontology [6] and the
FMA ontology [12] which are both written in (sub-languages of) EL+ . For these
extensions of EL standard DL reasoning can still be done in polynomial time
[3]. These TBoxes are known to be very large and are mostly built by hand.
     If computed w.r.t. general or just cyclic EL-TBoxes, the lcs need not exist [1],
since resulting cyclic concept descriptions cannot be expressed in EL. We follow
the approach from [10] and compute as an approximative solution the role-depth
bounded lcs: k-lcs which has a maximal nesting of quantifiers limited to k. The
method to compute the k-lcs is to employ the completion method that is used to
classify the TBox. This method builds a graph structure, which is saturated by
completion rules [2, 3]. In case of EL the k-lcs is more or less the cross-product of
the tree-unravelings of different nodes of the saturated completion graph [10]. It
turns out that for EL+ the computation algorithm is the same as for EL [7]. Un-
fortunately, the concept descriptions obtained in this way turn out to be highly
?
    Partially supported by the German Research Foundation (DFG) in the Collaborative
    Research Center 912 “Highly Adaptive Energy-Efficient Computing”.
Name                       Syntax          Semantics
top                        >               ∆I
conjunction                C uD            C I ∩ DI
existential restriction    ∃r.C            {x ∈ ∆I | ∃y ∈ ∆I : (x, y) ∈ rI ∧ y ∈ C I }
general concept inclusion C v D               C I ⊆ DI
role inclusion axiom      r1 ◦ . . . ◦ rk v r r1I ◦ . . . ◦ rkI ⊆ rI
                     Table 1. Constructors and axioms for EL+ .


redundant and thus take a long time to compute. In order to obtain concise
and readable concept descriptions for the k-lcs, we devise a method to obtain
smaller, but equivalent rewritings for EL+ -concept descriptions. Furthermore,
we devise two methods that avoid the cross-product computation for those parts
of the unraveling that would yield a redundant part in the resulting concept de-
scription. These optimization methods are highly effective and expedient, when
dealing with ontologies with large role hierarchies such as Galen.
    This paper is organized as follows: After preliminaries on DLs, we briefly
sketch how completion is employed to compute the k-lcs in EL+ in Section 3. In
Section 4 we introduce the optimization methods and in Section 5 we report on
an evaluation of the optimization methods on (variants of) medical ontologies.


2    Preliminaries

We assume that the reader is familiar with the basic notions of DLs, for an
introduction see [4]. We introduce the DLs used in this paper formally. Concept
descriptions are inductively defined from a set of concepts names NC and a set
of role names NR by applying the constructors from the upper half of Table 1.
In particular, EL-concept descriptions only allow for conjunctions, existential
restrictions, and the top concept >. The concept constructors are interpreted in
the standard way. Their semantics is displayed in the upper half of Table 1.
    In EL there are general concept inclusions (GCIs), which are displayed in
the lower half of Table 1. EL+ additionally allows for complex role inclusion
axioms (RIAs). These role inclusions can express, in particular, inclusion of
roles (s v r) and transitive roles (r ◦ r v r). Now, an EL-TBox T is a set of
GCIs and an EL+ -TBox is a set of GCIs and RIAs. We denote by NC,T and
NR,T the set of concept names and the set of role names that occur in a TBox
T . The fundamental reasoning service for TBoxes is subsumption. We say that a
concept description C subsumes another concept description D w.r.t. the TBox
T (written vT ) iff C I ⊆ DI for all models of T .
    For a concept description C we denote by rd(C) its role-depth, i.e., its maxi-
mal nesting of quantifiers. We define the central reasoning services of this paper
for the Description Logic EL+ .
Definition 1 ((Role-depth bounded) least common subsumer). Let T
be an EL+ -TBox and C1 , . . . , Cn be EL+ -concept descriptions. Then the EL+ -
concept description D is the least common subsumer of C1 , . . . , Cn w.r.t. T iff
(1) Ci vT D for all i ∈ {1, . . . , n}, and (2) for all EL+ -concept descriptions E:
Ci vT E for all i ∈ {1, . . . , n} implies D vT E.
    Let k ∈ N. Then the EL+ -concept description D is the role-depth bounded
least common subsumer of C1 , . . . , Cn w.r.t. T and the role-depth k iff (1)
rd(D) ≤ k, (2) Ci vT D for all i ∈ {1, . . . , n}, and (3) for all EL+ -concept
descriptions E with rd(E) ≤ k: Ci vT E ∀i ∈ {1, . . . , n} implies D vT E.
For EL+ the (k-)lcs is unique up to equivalence, thus we speak of the (k-)lcs.


3     Computing the k-lcs in EL+ by Completion

The algorithms to compute the role-depth bounded lcs rely on completion graphs
produced by completion-based subsumption algorithms. Completion algorithms
work on normalized TBoxes for which they build a completion graph and ex-
haustively apply completion rules to it. After this step, the completion graph
contains all subsumption relations from the TBox explicitly.


3.1   Completion algorithm for EL+

An EL+ -TBox T is in normal form, if all GCIs in T are of the form A v B,
A1 u A2 v B, A v ∃r.B, or ∃r.A v B with A, A1 , A2 , B ∈ NC and r ∈ NR ;
and all role inclusions are of the form s v r or s ◦ t v r with {r, s, t} ⊆ NR . All
EL+ -TBoxes can be normalized by applying a set of normalization rules [2].
    The completion graph for a normalized TBox T 0 is of the form (V, E, S),
where V = NC,T 0 ∪ {>} is the set of nodes, E ⊆ V × NR,T × V is the set of role
name labeled edges and S : V → 2NC,T 0 ∪{>} is the node-labeling. The completion
algorithm starts with an initial graph (V, E, S) with E = ∅ and S(A) = {A, >}
for each A ∈ NC,T 0 ∪ {>} and exhaustively applies a set of completion rules
from [2] until no more rule applies.
    After rule-application all subsumption relations can be directly read-off the
completion graph. This completion algorithm is sound and complete as shown
in [2]. The intuition is that the completion rules make implicit subsumption
relationships explicit in the following sense:

 – B ∈ S(A) implies that A vT B,
 – (A, r, B) ∈ E implies that A vT ∃r.B.

The resulting completion graph is used to compute the role-depth bounded lcs.


3.2   Computation algorithm of the k-lcs in EL+

All RIAs from the EL+ -TBox are explicitly captured in the completion graph
in the following way: for each edge in the completion graph that is labeled with
some role r, the completion algorithm also creates edges for all of r’s super-roles.
This means that for computing the k-lcs w.r.t. an EL+ -TBox the same algorithm
can be used as for EL. Such an algorithm was introduced in [10] and, for the
Algorithm 1 Computation of a role-depth bounded EL+ k-lcs.
Procedure k-lcs (C, D, T , k)
Input: C, D: EL+ -concept descriptions; T : EL+ -TBox; k: natural number
Output: k-lcs(C, D): role-depth bounded EL+ k-lcs of C, D w.r.t. T and k
1: T 0 := normalize(T ∪ {A ≡ C, B ≡ D})
2: (V, E, S) := apply-completion-rules(T 0 )
3: L := k-lcs-r(A, B, (V, E, S), k)
4: return remove-normalization-names(L)

Procedure k-lcs-r(A, B, (V, E, S), k)
Input: A, B: concept names; (V, E, S): completion graph; k: natural number
Output: k-lcs(A, B): role-depth bounded EL+ k-lcs of A, B w.r.t. T and k
1: common-names := S(A) ∩ S(B)
2: if k = 0 then l
3:    return             P
               P ∈common-names
4: else              l
5:    return                     P u
               P ∈common-names
                     l                 l                                             
                                                ∃r.k-lcs-r(C, D, (V, E, S), k − 1)
                   r∈NR   (A,r,C)∈E,(B,r,D)∈E




convenience of the reader, is shown in Algorithm 1 (for the binary lcs).The idea
is to introduce new concept names for the concept descriptions of interest and
to apply the completion algorithm. Then, starting from the newly introduced
names A and B, traverse the completion graph simultaneously. More precisely,
for the tree unravelings of the completion graph of depth k for A and B, the
cross product is computed. In a post-processing step concept names that were
introduced by normalization have to be removed from the concept description.
Obviously, this method creates heavily redundant concept descriptions.
    In cases where the exact lcs exists, the algorithm computes the exact lcs, if
started with a big enough k.


4     Optimizations for the k-lcs algorithm
We present two types of optimizations: to obtain succinct rewritings of EL+ -
concept descriptions and to speed-up the k-lcs computation.

4.1   Simplifying EL+ -concept descriptions
The highly redundant EL+ -concept descriptions obtained from the k-lcs algo-
rithm need to be simplified, in order to make the resulting concept description
readable. The general idea for the simplification is to remove those subtrees
from the syntax tree which are subsumers of any of their sibling subtrees. For a
conjunction of concept names, this results in the least ones (w.r.t. vT ).
Algorithm 2 Simplification
Procedure simplify(C, (V, E, S), T )
Input: C: EL+ -concept description; (V, E, S): completion graph; T : EL+ -TBox
Output: simplify(C): simplified concept description
 1: Let C ≡ A1 u . . . u An u ∃r1 .D1 u . . . u ∃rm .Dm with Ai ∈ NC for 1 ≤ i ≤ n.
 2: Conj := {Ai | 1 ≤ i ≤ n} ∪ {∃rj .Dj | 1 ≤ j ≤ m}
 3: for all X ∈ Conj do
 4:   for all Y ∈ Conj do
 5:      if X 6= Y ∧ subsumes-H(X, Y, (V, E, S), T ) then
 6:         Conj := Conj \ {X}
 7:         break
 8: for all X ∈ Conj do
 9:   if X ≡ ∃rj .Dj then
10:      Conjd := (Conj \ {∃rj .Dj }) ∪ {∃rj .simplify(Dj , (V, E, S), T )}
11: return     X∈Conj X




    Algorithm 2 computes such simplifications of EL+ -concept descriptions. Note,
that it needs to be applied after the normalization names are removed. Otherwise
it might remove names from the original TBox that subsume a normalization
name, which gets removed later during denormalization. For the soundness of
the simplification procedure simplify, it is only necessary to ensure that the pro-
cedure ‘subsumes-H’ is sound. However, for our purpose this procedure does not
need to be complete. This might result in simplifications that are equivalent to
the input concept description, but that are still redundant. Such a heuristic is
displayed in Algorithm 3.
    It is easy to see that the procedure subsumes-H is sound by an inspection of
the different cases according to the structure of C and D. For instance, if C is a
conjunction (line 11), the procedure only returns true if D is a subsumer of Fi ,
for all 1 ≤ i ≤ n. If both C and D are existential restrictions (line 23), then the
procedure returns true if s vT r and F is a subsumer of G, or if there is a role
t with s ◦ t vT r and G = . . . u ∃t.H u . . . where F is a subsumer of H. In both
cases we clearly have D vT C. All other cases are proven similarly.


4.2   Speeding-up the k-lcs algorithm

A concept description C is fully expanded up to the role-depth k w.r.t. T , if (1)
for all A ∈ NC,T with C vT A we have that A is a conjunct of C and (2) if
k > 0 then for all F with C vT ∃r.F we have an F 0 with F 0 vT F s.t. ∃r.F 0 is
a conjunct of C and F 0 is fully expanded up to role-depth k − 1. Now, the k-lcs-
r procedure constructs result descriptions that are the fully expanded and thus
heavily redundant. Most of this redundancy is then removed by the simplification
procedure. For large ontologies with a deep role hierarchy, the fully expanded
result may grow quite large, which causes long run-times and incomprehensible,
large concept descriptions. Therefore, the idea for optimization is to avoid gener-
Algorithm 3 Subsumes-H
Procedure subsumes-H(C, D, (V, E, S), T )
Input: C, D: EL+ -concept descriptions; (V, E, S): completion graph; T : EL+ -TBox
Output: whether C subsumes D w.r.t. T
 1: if C ∈ NC then
 2:    if C = > then
 3:       return true
 4:    else if D ∈ NC then
 5:       return C ∈ S(D)
 6:    else if D ≡ F1 u . . . u Fn then
 7:       for all 1 ≤ i ≤ n do
 8:          if subsumes-H(C, Fi , (V, E, S), T ) then
 9:             return true
10:    return false
11: else if C ≡ F1 u . . . u Fn then
12:    for all 1 ≤ i ≤ n do
13:       if not subsumes-H(Fi , D, (V, E, S), T ) then
14:          return false
15:    return true
16: else if C ≡ ∃r.F then
17:    if D ∈ NC then
18:       return (C, r, F ) ∈ E
19:    else if D ≡ F1 u . . . u Fn then
20:       for all 1 ≤ i ≤ n do
21:          if subsumes-H(C, Fi , (V, E, S), T ) then
22:             return true
23:    else if D ≡ ∃s.G then
24:       if s vT r and subsumes-H(F, G, (V, E, S), T ) then
25:          return true
26:       else
27:          for all t ∈ NR do
28:             if s ◦ t vT r, G ≡ . . . u ∃t.H u . . .
                and subsumes-H(F, H, (V, E, S), T ) then
29:                return true
30:       return false



ating fully expanded descriptions and apply some of the simplifications already
during the construction of the result.


Optimization 1: avoid unnecessary role-depth. This simple optimization
applies, if one of the input concepts of the k-lcs-r procedure already subsumes
the other one, in which case it is the lcs of both. Therefore in k-lcs-r(A, B, S, k),
if A ∈ S(B) we can simply return A and if B ∈ S(A) we can return B.
    This is a well-known optimization for the computation of the lcs for concept
descriptions without reference to a TBox [5]. However, if applied in the context
of completion, this optimization interacts with the denormalization and the role-
depth bound in non-obvious ways. Consider the TBox

            T = {A v ∃r.∃r.K, B v ∃r.∃r.K, ∃r.∃r.K v ∃s.(L u M )}

which gets normalized to
       T 0 = {A v ∃r.X1 , X1 v ∃r.K, B v ∃r.X2 , X2 v ∃r.K, ∃r.K v X3 ,
                 ∃r.X3 v X4 , X4 v ∃s.X5 , X5 v L, X5 v M }.
k-lcs-r(A, B, (· · · ), 1) would yield X4 u ∃r.X3 u ∃s.(X5 u L u M ), which denormal-
izes to > u ∃r.> u ∃s.(L u M ), while the optimization would yield X4 u ∃r.X3 u
∃s.X5 , which denormalizes to > u ∃r.> u ∃s.>, which is not the lcs anymore.
Therefore, the optimization can only be applied to concept names from the orig-
inal TBox, never to normalization names. This optimization and the next one
are shown in Algorithm 4.

Optimization 2: avoid unnecessary branching. For this optimization con-
sider the TBox

       Tn = {A v D u ∃r.C1 , B v E u ∃r.C1 } ∪ {C1 v Ci | i ∈ {2 . . . n}}.

In this case, the lcs of A and B is just ∃r.C1 . However, the algorithm would
generate the complete product set {Ci | i ∈ {1, . . . , n}} × {Ci | i ∈ {1, . . . , n}}
and recursively call k-lcs-r for each pair, just to eliminate all ∃r.Ci for i > 1 and
all ∃r.> afterwards in the simplification step. Even for this simple example, the
algorithm would require time quadratic in the size of the input TBox. Clearly,
evaluating the Ci s for i > 1 is not necessary, as they all subsume C1 . The same
is true for role hierarchies, where for example

         Tn0 = {A v D u ∃r.C1 , B v E u ∃r.C1 } ∪ {r v ri | i ∈ {2 . . . n}}

would lead to the same unnecessary quadratic runtime.
    The idea to avoid this kind of branching is to explicitly create the sets SA and
SB of all role-successors (A, r, C) ∈ E and (B, r, C) ∈ E for input concepts A and
B and to remove all role-successors which are subsumers of other role-successors
in the same set. Recursive calls to k-lcs-r-o can then be made with one successor
from each set SA and SB. However, we have to be careful with role-successors
with different role-names. For example, for a role-successor (r, C) ∈ SA and a
role-successor (s, D) ∈ SB, a recursive call has to be made for all minimal (w.r.t.
vT ) role names t with r vT t and s vT t. The following lemma shows that this
optimization is correct.
Lemma 1. The results of the k-lcs-r and k-lcs-r-o procedures are equivalent.
Proof. Let T be a normalized TBox, (V, E, S) be its completion graph, A, B ∈
NC,T , k ∈ N and L = k-lcs-r(A, B, (V, E, S), k) and Lo = k-lcs-r-o(A, B, (V, E, S), k).
We have to show that L ≡T Lo . Note, that if optimization 1 is applicable, then
A ∈ S(B) or B ∈ S(A) and hence A (resp. B) is equivalent to the lcs. Otherwise,
we show that L ≡T Lo by induction on the role-depth bound k.
Algorithm 4 Computation of a role-depth bounded EL+ -lcs.
Procedure k-lcs-o(C, D, T , k)
Input: C, D: EL+ concept descriptions; T : EL+ TBox; k: natural number
Output: k-lcs(C, D): role-depth bounded EL+ -lcs of C, D w.r.t. T and k
1: T 0 := normalize(T ∪ {A ≡ C, B ≡ D})
2: (V, E, S) := apply-completion-rules(T 0 )
3: L := k-lcs-r-o(A, B, (V, E, S), k)
4: if L ≡T 0 A then
5:    return C
6: else if L ≡T 0 B then
7:    return D
8: else
9:    return remove-normalization-names(L)

Procedure k-lcs-r-o(A, B, (V, E, S), k)
Input: A, B: concept names; (V, E, S): completion graph; k: natural number
Output: k-lcs(A, B): role-depth bounded EL+ -lcs of A, B w.r.t. T and k
1: if A ∈ S(B) and A is not a normalization name then
2:    return A
3: else if B ∈ S(A) and B is not a normalization name then
4:    return B
5: common-names := S(A) ∩ S(B)
6: SA := remove-redundant({(r, C) | (A, r, C) ∈ E})
7: SB := remove-redundant({(s, D) | (B, s, D) ∈ E})
8: if k = 0 then   l
9:    return                P
                P ∈common-names
10: else              l                          l
11:    return                 P u                           ∃ t.k-lcs-r-o(C, D, (V, E, S), k-1)
                P ∈common-names         (r,C)∈SA, (s,D)∈SB,
                                    t∈NR minimal with rvT t∧svT t

Procedure remove-redundant(S)
Input: S: set of role-successors
Output: remove-redundant(S): simplified set
1: for all (r, C) ∈ S do
2:   for all (s, D) ∈ S do
3:      if (r, C) 6= (s, D) and r vT s and D ∈ S(C) then
4:         S := S \ {(s, D)}
5: return S



For k = 0 both procedures return the same concept description, i.e., L = Lo .
For k > 0, we first show that for each r ∈ NR,T and all (A, r, C) ∈ E, (B, r, D) ∈
E, we have Lo vT ∃r.k-lcs-r(C, D, (V, E, S), k − 1). If (A, r, C) ∈ E, then there
exists (s1 , C 0 ) ∈ SA with s1 vT r and C ∈ S(C 0 ). Similarly, there exists
(s2 , D0 ) ∈ SB with s2 vT r and D ∈ S(D0 ). Then ∃t.k-lcs-r-o(C 0 , D0 , (V, E, S), k−
1) is a conjunct of Lo for all minimal t ∈ NR,T with s1 vT t and s2 vT t. Since
s1 vT r and s2 vT r, there is at least one minimal t0 ∈ NR,T with t0 v r
for which ∃t0 .k-lcs-r-o(C 0 , D0 , (V, E, S), k − 1) is conjunct of Lo . Together with
k-lcs-r-o(C 0 , D0 , (V, E, S), k − 1) ≡T k-lcs-r(C 0 , D0 , (V, E, S), k − 1) by the induc-
tion hypothesis and k-lcs-r(C 0 , D0 , (V, E, S), k −1) vT k-lcs-r(C, D, (V, E, S), k −
1) for C 0 vT C and D0 vT D we have ∃t0 .k-lcs-r-o(C 0 , D0 , (V, E, S), k − 1) vT
∃r.k-lcs-r(C, D, (V, E, S), k − 1). Since Lo vT ∃r.k-lcs-r(C, D, (V, E, S), k − 1) for
each r ∈ NR,T and all edges (A, r, C) ∈ E, (B, r, D) ∈ E and also Lo vT C for
C ∈ common-names, we have Lo vT L. Obviously, k-lcs-r computes all of the
recursive concept descriptions (and possibly more) that k-lcs-r-o computes and
hence L vT Lo . Thus L ≡T Lo .

    Obviously, both optimizations also yield shorter, but equivalent concept de-
scriptions than the naive algorithm.


5     Implementation and Evaluation
We implemented the role-depth bounded lcs for EL+ in the new version of our
Protégé plug-in GEL1 . This program uses the reasoner jCEL2 , which imple-
ments the completion-based classification algorithms for EL+ (and ELHIfR+ ).
The earlier version of GEL implemented this reasoning service for EL [10, 9].
    GEL processes concept descriptions and ontologies in OWL API format.
According to Algorithm 4, it first adds new concept definitions for the input
EL+ -concept descriptions to the ontology and then uses jCEL to apply the
normalization and the completion rules. After all completion sets are computed,
the GEL-processor applies the recursive k-lcs-r-o procedure. The resulting k-lcs
is then (possibly simplified,) translated back to OWL API format and returned.
Although Algorithm 4 is formulated for the binary k-lcs, the implementation can
handle input tuples of concept descriptions by applying the binary k-lcs to the
first two concepts and then successively computing the k-lcs of the result with
the next concept in the tuple. In previous tests we found this to be faster then
computing the n-ary k-lcs directly.
    Our implementation includes a Protégé plug-in with a Protégé ontology
view, i.e., a component where the user can input concept descriptions and select
processing options such as the role-depth bound and whether to apply the sim-
plification or optimizations. Figure 1 shows a screen-shot of the GEL plug-in.



5.1    Evaluation
Test data. The real-world ontologies used in our tests were the Gene Ontology
[6] and Not-Galen, a version of the Galen ontology [11] pruned to EL+ . Since
most random tuples of concepts in these ontologies have nothing in common,
their k-lcs would trivially be >. To exclude such uninteresting cases, we selected
around 50 tuples of named concepts by hand, taking concepts classified as sibling
1
    http://sourceforge.net/p/gen-el
2
    http://jcel.sourceforge.net
                Fig. 1. Screen-shot of the Protégé plug-in GEL

concepts that had similar existential restrictions. We computed the k-lcs for
these input concepts with various role-depth bounds both with and without
optimizations. We computed the k-lcs for these input concept descriptions with
various role-depth bounds both with and without optimizations.

Evaluation. In the worst case the role-depth bounded lcs can have a size that
is exponential in the role-depth bound k. However, it largely depends on the
ontology whether such worst-case behavior occurs. For the Gene Ontology, the
role-depth bounded lcs was always constructed and simplified almost instantly.
The runtime was totally dominated by the classification time by jCEL.
     For Not-Galen some input concept pairs resulted in long runtimes, which
were mostly dominated by the runtime of the k-lcs-r-procedure and not by clas-
sification. Figure 2 shows the average k-lcs-r-construction runtime of GEL on
various input pairs for different values of k. Figure 2 also shows the effect of
the optimizations on the construction time. Optimization 1, which returns one
of the input concepts if it subsumes the other, is able to cut-off the k-lcs-r-o
recursion before the maximum role-depth is reached. This improves the runtime
by a factor of around 2 compared to the basic k-lcs-r procedure with no op-
timizations. Optimization 2, which removes redundant successor nodes before
product construction, reduces the branching factor of the recursion. In our tests,
it yielded even better runtime improvements than Optimization 1—on average
by factor 30. Each optimization yields a larger speed-up for increasing role-depth
bounds. Combining the optimizations yielded the best runtime for most cases,
which indicates independent of the optimizations.
                                                              10000



                                                               1000




                                     Construction time [ms]
                                                                100
                                                                                                                        no optimization
                                                                                                                        optimization 1
                                                                    10
                                                                                                                        optimization 2
                                                                                                                        optimization 1+2
                                                                     1



                                                                0,1
                                                                             1       2      3        4      5   6
                                                                                         Role-depth bound



Fig. 2. Average k-lcs-r-construction time for concept descriptions from Not-Galen
                                     10000




                                                1000
            Concept size reduction




                                                                                                                           no optimization
                                                              100
                                                                                                                           optimization 1
                                                                                                                           optimization 2
                                                                                                                           optimization 1+2
                                                               10




                                                                1
                                                                         1       2         3         4      5       6
                                                                                         Role-depth bound



Fig. 3. Average size-reduction by simplification for k-lcses computed w.r.t. Not-Galen


     In practice, the runtime with only Optimization 1 (or no optimization) was
sometimes too long to be useful. For some input concepts, like PepticUlcer and
AreaOfAtrophicGastritis, it did not return a result within an hour for a role-depth
bound for as low as 4. However, with both optimizations enabled, the result for
a role-depth bound of 4 was computed in 1.9 seconds (with a classification time
of 330ms). The reason that Optimization 2 is so effective on Not-Galen is its
a deep role hierarchy. Without the optimization for each role also all subsuming
roles would generate a recursive k-lcs-r call, which yields large branching factors.
     The runtime for the simplification is proportional to the size of the concept
description before simplification (which is roughly proportional to the runtime
of the k-lcs-r-construction, with a little bit of overhead for the second opti-
mization). However, Figure 3 shows that simplification reduces the concept sizes
dramatically – even with both optimizations enabled the size of the result is
still reduced by factor 16 on average. Of course, since both optimizations apply
simplification steps during the construction of the k-lcs, the size-reduction for
results computed without optimizations would be even larger.
6    Conclusions
In this paper we presented highly effective optimization methods for the comput-
ing approximative solutions for least common subsumers w.r.t. general TBoxes
written in EL+ . Our optimization methods address two aspects of the role-depth
bounded lcs. On the one hand we devised optimization methods for speeding-up
the computation and on the other, we devised a simplification procedure that
yields smaller, but equivalent rewritings of EL+ -concept descriptions. This pro-
cedure turned out to be extremely helpful, when reducing the concept size. For
the Not-Galen ontology the result concepts were reduced by several orders of
magnitude. With the here devised methods a large part of the constructors avail-
able in the OWL 2 EL profile can be covered when computing generalizations.

References
1. F. Baader. Least common subsumers and most specific concepts in a description
   logic with existential restrictions and terminological cycles. In Proc. of the 18th Int.
   Joint Conf. on Artificial Intelligence (IJCAI-03). Morgan Kaufm., 2003.
2. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proc. of the
   Nineteenth International Joint Conference on Artificial Intelligence IJCAI-05, 2005.
3. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope further. In In Proc.
   of the OWLED Workshop, 2008.
4. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P.F. Patel-Schneider, edi-
   tors. The Description Logic Handbook: Theory, Implementation, and Applications.
   Cambridge University Press, 2003.
5. F. Baader and A.-Y. Turhan. On the problem of computing small representations of
   least common subsumers. In Proc. of the 25th German Annual Conf. on Artificial
   Intelligence (KI’02), vol. 2479 of LNAI. Springer, 2002.
6. The Gene Ontology Consortium. Gene Ontology: Tool for the unification of biology.
   Nature Genetics, 25:25–29, 2000.
7. A. Ecke and A.-Y. Turhan. Role-depth bounded least common subsumers for EL+
   and ELI. To appear in Proc. of DL’12.
8. K. Janowicz. Computing Semantic Similarity Among Geographic Feature Types
   Represented in Expressive Description Logics. PhD thesis, Instit. f. Geoinformatics,
   Univ. of Münster, Germany, 2008.
9. J. Mendez, A. Ecke, and A.-Y. Turhan. Implementing completion-based inferences
   for the EL-family. In Proc. of the 2011 Description Logic Workshop (DL’11), 2011.
10. R. Peñaloza and A.-Y. Turhan. A practical approach for computing generalization
   inferences in EL. In Proc. of the 8th European Semantic Web Conf. (ESWC’11),
   LNCS. Springer, 2011.
11. A. Rector and I. Horrocks. Experience building a large, re-usable medical ontology
   using a description logic with transitivity and concept inclusions. In Proc. of the
   WS on Ontological Engineering, AAAI Spring Symp. (AAAI’97), 1997.
12. C. Rosse and J. L. V. Mejino Jr. A reference ontology for biomedical informatics:
   the foundational model of anatomy. Journal of Biomedical Informatics, 36(6).
13. K. Spackman. Managing clinical terminology hierarchies using algorithmic calcula-
   tion of subsumption: Experience with snomed-rt. Journal of the American Medical
   Informatics Assoc., 2000. Fall Symposium Special Issue.