=Paper=
{{Paper
|id=None
|storemode=property
|title=Concurrent Classification of OWL Ontologies - An Empirical Evaluation
|pdfUrl=https://ceur-ws.org/Vol-846/paper_31.pdf
|volume=Vol-846
|dblpUrl=https://dblp.org/rec/conf/dlog/AslaniH12
}}
==Concurrent Classification of OWL Ontologies - An Empirical Evaluation==
Mina Aslani and Volker Haarslev
Concordia University, Montreal, Quebec, Canada

Abstract. This paper describes our progress in developing algorithms for concurrent classification of OWL ontologies. We refactored the architecture of our research prototype and its employed algorithms by integrating lock-free data structures and adopting various optimizations to reduce overhead. In comparison to our earlier work we increased the size of classified ontologies by one order of magnitude, i.e., the size of processed ontologies is now beyond a quarter million OWL classes. The main focus of this paper is an empirical evaluation with huge ontologies that demonstrates an excellent speedup corresponding almost directly to the number of processors or cores used.

1 Introduction

Parallel algorithms for description logic (DL) reasoning were first explored in the FLEX system [3], where various distributed message-passing schemes for rule execution were evaluated. The reported results seemed promising, but the research suffered from severe limitations due to the hardware available for experiments at that time. The only other work on parallelizing DL reasoning [9] reported promising results using multi-core and multi-processor hardware; it explores the parallel treatment of disjunctions and of individual merging (due to number restrictions). In [11] an approach to distributed reasoning for ALCHIQ is presented that is based on resolution techniques but does not address optimizations for TBox classification.

Other work has studied concurrency in lightweight ontology languages. A distributed MapReduce algorithm for EL+ has been proposed, but no experiments on the proposed algorithms have been reported [10]. Further work focuses on distributed reasoning; these approaches differ from ours in that they manage large-scale data exceeding the memory of a single machine [11, 14, 6, 8, 12]. There also exists work on parallel distributed RDF inferencing (e.g., [13]) and on parallel reasoning in first-order theorem proving, but due to completely different proof techniques (resolution versus tableaux) and reasoning architectures it is not considered relevant here. Another approach presents an optimized consequence-based procedure for the classification of ontologies, but it only addresses the DL EL [7].

The work in this paper extends our work on parallel TBox classification [1]. Compared to our previous work, this paper reports on an enhanced lock-free version of the algorithms utilizing concurrency in a multi-core environment, on optimizations that increase performance, and on a performance evaluation with huge real-world ontologies in the range of 300K OWL classes (DL concepts), such as SNOMED. Our prototype not only addresses huge real-world ontologies but also does not compromise on DL complexity. It can process DLs that are much more complex than EL (at least SHIQ) and provides an excellent speedup, considering that no particular DL-related optimization technique is used. The implemented prototype performs concurrent TBox classification controlled by various parameters such as the number of threads, the size of the partitions assigned to threads, and the number of processors. Our evaluation demonstrates impressive performance improvements: owing to a small overhead, the processing time decreases almost linearly with the number of available processors.
It is important to note that the focus of this research is on exploring algorithms for concurrent TBox classification and not on developing a highly optimized DL reasoner. We are currently only interested in the speedup factor obtained from comparing sequential and parallel runs of our prototype.

2 The Concurrent TBox Classifier

This section describes the architecture of the implemented system and its underlying sound and complete algorithm for concurrent classification of DL ontologies. To compute the hierarchy in parallel, we developed a Java application with a multi-threaded architecture providing control parameters such as the number of threads, the number of concepts (also called partition size) to be inserted per thread, and the number of processors. As thoroughly explained in [1], the program reads an input file containing a list of concept names to be classified together with information about them, which is generated by the OWL reasoner Racer [4]. Racer is only used for generating the input files for our prototype. The per-concept information available in the file includes the concept name, its parents (in the complete taxonomy), so-called told subsumers and disjoints, and pseudo model [5] information. This architecture was deliberately designed to facilitate our experiments by using existing OWL reasoners to generate auxiliary information and to make the Concurrent TBox Classifier independent of particular DLs.

The preprocessing algorithm uses a topological sorting similar to [1], and the order for processing concepts is based on the topologically sorted list. To manage concurrency and multi-threading in our system, as described in [1], a single shared global tree approach is used. To classify the TBox, two symmetric tasks are employed, i.e., the so-called enhanced tree traversal method [2] using top (bottom) search to compute the parents (children) of a concept to be inserted into the taxonomy.

In [1], we first introduced our algorithms for parallel classification and reported considerable performance improvements, but we could only process relatively small ontologies. In this paper, we introduce the enhanced concurrent versions of these algorithms, i.e., Algorithms 2, 6 and 7. In order to make the paper self-contained we repeat Algorithms 1, 3, 4 and 5 from [1].

The procedure parallel_tbox_classification is sketched in Algorithm 1. It is called with a list of named concepts and sorts them in topological order with respect to the initial taxonomy created from the already known told ancestors and descendants of each concept (using the told subsumer information). The classifier assigns, in a round-robin manner, partitions of a fixed size from the concept list to idle threads and activates these threads with their assigned partition using the procedure insert_partition outlined in Algorithm 2. All threads work in parallel with the goal of constructing a global subsumption tree (taxonomy). They also share a global array located_concepts indexed by thread identifications.

 Algorithm 1 parallel_tbox_classification(concept_list)
   topological_order_list ← topological_order(concept_list)
   repeat
     wait until an idle thread t_i becomes available
     select a partition p_i from topological_order_list
     run thread t_i with insert_partition(p_i, t_i)
   until all concepts in topological_order_list are inserted

Using the Concurrency package in Java, synchronization on the nodes of the global tree as well as on the entries of the global array has now been eliminated.
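To make the dispatch in Algorithm 1 concrete, here is a minimal Java sketch of the round-robin partition hand-off using a fixed thread pool. The class and method names (ParallelTBoxClassification, insertPartition) and the parameter values are illustrative assumptions, not the prototype's actual code.

 import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.TimeUnit;

 // Hypothetical driver mirroring Algorithm 1: fixed-size partitions are taken
 // from the topologically ordered concept list and handed to idle worker
 // threads; the pool size corresponds to the number-of-threads parameter.
 final class ParallelTBoxClassification {
     static final int NUM_THREADS = 4;      // e.g., one of the evaluated settings
     static final int PARTITION_SIZE = 5;   // e.g., one of the evaluated settings

     static void classify(List<String> topologicalOrderList) throws InterruptedException {
         ExecutorService pool = Executors.newFixedThreadPool(NUM_THREADS);
         for (int i = 0; i < topologicalOrderList.size(); i += PARTITION_SIZE) {
             List<String> partition = topologicalOrderList.subList(
                 i, Math.min(i + PARTITION_SIZE, topologicalOrderList.size()));
             pool.execute(() -> insertPartition(partition));  // Algorithm 2
         }
         pool.shutdown();                           // no further partitions
         pool.awaitTermination(1, TimeUnit.HOURS);  // until all concepts are inserted
     }

     static void insertPartition(List<String> partition) {
         // top/bottom search per concept; see Algorithm 2 below
     }
 }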
The procedure insert_partition inserts all concepts of a given partition into the global taxonomy. We use concurrent collections from the java.util.concurrent package. This package supplies Collection implementations that are thread-safe and designed for use in multi-threaded contexts. Therefore, for updating a concept or its parents or children, no locking mechanism for the affected nodes of the global tree is needed anymore.

Algorithm 2 first performs, for each concept new, the top-search phase (starting from the top concept ⊤) and possibly repeats the top-search phase for new if other threads updated the list of children of its parents. Then, it sets the parents of new. Afterwards the bottom-search phase (starting from the bottom concept ⊥) is performed. Analogously to the top-search phase, the bottom search is possibly repeated, and the children of new are set. After finishing the top and bottom search for new, the node new is added to the entries in located_concepts of all other busy threads; it is also checked whether other threads updated the entry in located_concepts for this thread. If this was the case, the top and/or bottom search needs to be repeated correspondingly. To reduce the overhead of re-running the top or bottom search, we re-run only twice. If the concept new is still not ready to be inserted, e.g., if there is an interaction between new and a concept in located_concepts, it is added to the partition list of concepts (to be located later) and eliminated from the other busy threads' located_concepts lists; otherwise, new can be inserted into the taxonomy using Algorithm 7.

In order to avoid unnecessary tree traversals and tableau subsumption tests when computing the subsumption hierarchy, the parallel classifier adapted the enhanced traversal method [2], an algorithm that was designed for sequential execution. Algorithms 3 and 4 outline the traversal procedures for the top-search phase (the auxiliary algorithm found_in_ancestors(current,new) checks whether current is an ancestor of new).

The possible incompleteness caused by parallel classification [1] can be characterized by the following two scenarios. Scenario I: In top search, as the new concept is pushed downward, right after the children of the current concept have been processed, at least one new child is added by another thread. In this scenario, the top search for the concept new is not aware of the recent change, and this might cause missing subsumptions if there is any interaction between the concept new and the added children. The same might happen in bottom search if the bottom search for the concept new is not informed of a recent change to the list of parents of the current node. Scenario II: Between the time that top search has been started to find the location of the concept new in the taxonomy and the time that its location has been decided, another thread has placed at least one concept into the hierarchy with which the concept new has an interaction. Again, this might cause missing subsumptions; it applies analogously to bottom search as well.
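Before turning to the pseudocode of Algorithm 2, the following hedged Java sketch illustrates the kind of lock-free node representation described above: parents and children are stored in concurrent sets from java.util.concurrent, so searches and insertions by different threads need no node-level locks. The class TaxonomyNode and its members are assumptions for illustration, not the prototype's API.

 import java.util.Set;
 import java.util.concurrent.ConcurrentHashMap;

 // Illustrative taxonomy node: predecessors (parents) and successors
 // (children) are concurrent sets, so one thread can traverse them while
 // another thread links a freshly classified concept.
 final class TaxonomyNode {
     final String name;
     final Set<TaxonomyNode> predecessors = ConcurrentHashMap.newKeySet();
     final Set<TaxonomyNode> successors = ConcurrentHashMap.newKeySet();

     TaxonomyNode(String name) { this.name = name; }

     // In the spirit of insert_concept_in_tbox (Algorithm 7): linking only
     // mutates concurrent sets, so no locking of the global tree is required.
     void linkChild(TaxonomyNode child) {
         successors.add(child);
         child.predecessors.add(this);
     }
 }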
 Algorithm 2 insert_partition(partition, id)
   for all new ∈ partition do
     rerun ← 0
     finish_rerun ← false
     parents ← top_search(new, ⊤)
     while ¬consistent_in_top_search(parents, new) do
       parents ← top_search(new, ⊤)
     predecessors(new) ← parents
     children ← bottom_search(new, ⊥)
     while ¬consistent_in_bottom_search(children, new) do
       children ← bottom_search(new, ⊥)
     successors(new) ← children
     for all busy threads t_i ≠ id do
       located_concepts(t_i) ← located_concepts(t_i) ∪ {new}
     check ← check_if_concept_has_interaction(new, located_concepts(id))
     while (check ≠ 0) and ¬finish_rerun do
       if rerun < 3 then
         if check = 1 then
           new_predecessors ← top_search(new, ⊤)
           rerun ← rerun + 1
           predecessors(new) ← new_predecessors
         if check = 2 then
           new_successors ← bottom_search(new, ⊥)
           rerun ← rerun + 1
           successors(new) ← new_successors
         check ← check_if_concept_has_interaction(new, located_concepts(id))
       else
         finish_rerun ← true
     for all busy threads t_i ≠ id do
       located_concepts(t_i) ← located_concepts(t_i) \ {new}
     if ¬finish_rerun then
       insert_concept_in_tbox(new, predecessors(new), successors(new))

Both scenarios are properly addressed in Algorithm 2 to ensure completeness. Every time a thread locates a concept in the taxonomy, it notifies the other threads by adding this concept name to their located_concepts lists. Therefore, as soon as a thread has found the parents and children of the concept new by running top search and bottom search, it checks whether there is any interaction between the concept new and the concepts in its located_concepts list. Depending on the kind of interaction, top search or bottom search is repeated accordingly. If no possible situations for incompleteness are discovered anymore, Algorithm 7 is called.

 Algorithm 3 top_search(new, current)
   mark(current, 'visited')
   pos-succ ← ∅
   captured_successors(new)(current) ← successors(current)
   for all y ∈ successors(current) do
     if enhanced_top_subs(y, new) then
       pos-succ ← pos-succ ∪ {y}
   if pos-succ = ∅ then
     return {current}
   else
     result ← ∅
     for all y ∈ pos-succ do
       if y not marked as 'visited' then
         result ← result ∪ top_search(new, y)
     return result

 Algorithm 4 enhanced_top_subs(current, new)
   if current marked as 'positive' then
     return true
   else if current marked as 'negative' then
     return false
   else if (for all z ∈ predecessors(current): enhanced_top_subs(z, new))
       and found_in_ancestors(current, new) then
     mark(current, 'positive')
     return true
   else
     mark(current, 'negative')
     return false
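As a rough Java rendering of the memoization in Algorithm 4 (reusing the illustrative TaxonomyNode sketch above; the subsumption test found_in_ancestors is stubbed out, and all names are assumptions): each candidate node is marked positive or negative at most once per inserted concept, so the expensive subsumption test is not repeated.

 import java.util.HashMap;
 import java.util.Map;

 // Sketch of enhanced_top_subs (Algorithm 4). The marks map plays the role
 // of the 'positive'/'negative' marks; it is confined to the inserting
 // thread, so a plain HashMap suffices.
 final class EnhancedTopSubs {
     enum Mark { POSITIVE, NEGATIVE }
     private final Map<TaxonomyNode, Mark> marks = new HashMap<>();

     boolean subsumes(TaxonomyNode current, TaxonomyNode newConcept) {
         Mark m = marks.get(current);
         if (m != null) return m == Mark.POSITIVE;
         // current subsumes newConcept only if all its predecessors do and
         // the (pseudo-model assisted) ancestor test succeeds.
         boolean result = current.predecessors.stream()
                 .allMatch(z -> subsumes(z, newConcept))
             && foundInAncestors(current, newConcept);
         marks.put(current, result ? Mark.POSITIVE : Mark.NEGATIVE);
         return result;
     }

     // Placeholder for found_in_ancestors; the real test consults the
     // reasoner-generated subsumption information (assumption).
     private boolean foundInAncestors(TaxonomyNode current, TaxonomyNode newConcept) {
         return false;
     }
 }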
To resolve the possible incompleteness we utilize Algorithms 5 and 6 (the algorithm interaction_possible(new,concept) used there applies pseudo model merging [5] to decide whether a subsumption between new and concept is possible). The procedure consistent_in_bottom_search is not shown here because it mirrors consistent_in_top_search.

 Algorithm 5 consistent_in_top_search(parents, new)
   for all pred ∈ parents do
     if successors(pred) ≠ captured_successors(new)(pred) then
       diff ← successors(pred) \ captured_successors(new)(pred)
       for all child ∈ diff do
         if found_in_ancestors(child, new) then
           return false
   return true

 Algorithm 6 check_if_concept_has_interaction(new, located_concepts)
   The return value indicates whether and what type of re-run needs to be done:
   0: no re-run is needed
   1: re-run top search because a possible parent could have been overlooked
   2: re-run bottom search because a possible child could have been overlooked
   if located_concepts = ∅ then
     return 0
   else
     for all concept ∈ located_concepts do
       if interaction_possible(new, concept) then
         if found_in_ancestors(new, concept) then
           return 2
         else
           return 1
       else if interaction_possible(concept, new) then
         if found_in_ancestors(new, concept) then
           return 2
         else
           return 1
     return 0

Finally, Algorithm 7 inserts a fully located concept into the taxonomy by linking it to its parents and children.

 Algorithm 7 insert_concept_in_tbox(new, predecessors, successors)
   for all pred ∈ predecessors do
     successors(pred) ← successors(pred) ∪ {new}
   for all succ ∈ successors do
     predecessors(succ) ← predecessors(succ) ∪ {new}

3 Evaluation

In the previous section, we explained the algorithms used in our Concurrent TBox Classifier. In this section, we study the scalability and performance of our prototype. We explain the behavior of our system when run (i) sequentially or (ii) in a parallel multi-processor environment, and we describe how the prototype performs on huge real-world ontologies with different DL complexities. Therefore, in the remainder of this section, we report on the conducted experiments. We first provide a description of the platform used and the implemented prototype, then we describe the test cases used to evaluate the Concurrent TBox Classifier and provide an overview of the parameters used in the experiments. Finally, we show the results and discuss the performance of the classifier. The measured runtimes in the figures are shown in seconds using a logarithmic scale.

Platform and implementation
All the experiments were conducted on a high-performance parallel computing cluster. The nodes in the cluster run an HP version of RedHat Enterprise Linux for 64-bit processors, with HP's own XC cluster software stack. To evaluate our approach, the Concurrent TBox Classifier has been implemented in Java using lock-free data structures from the java.util.concurrent package with minimal synchronization.

Test cases
Table 1 shows a collection of 9 mostly publicly available real-world ontologies. Note that the chosen test cases exhibit different sizes, structures, and DL complexities. The benchmark ontologies are characterized by their name, their size in number of named concepts or classes, and the DL used.

 Table 1. Characteristics of the used test ontologies (e.g., LH denotes the DL allowing only conjunction and role hierarchies, and unfoldable TBoxes)

 Ontology            DL language   No. of named concepts
 Embassi-2           ALCHN                   657
 Galen1              ALCH                  2,730
 LargeTestOntology   ELHR+                 5,584
 Tambis-2a           ELH                  10,116
 Cyc                 LHF                  25,566
 EClass-51En-1       LH                   76,977
 Snomed-2            ELH                 182,869
 Snomed-1            ELH                 223,260
 Snomed              ELH                 379,691

Parameters used in experiments
The parameters used in our empirical evaluation and their meaning are described below.
– Number of Threads: To measure the scalability of our system, we performed our experiments using different numbers of threads (1, 2, 4).
– Partition Size: The number of concepts (5, 25) assigned to every thread, which the corresponding thread is expected to insert. Like the number of threads, this parameter is used to measure the scalability of our approach.
– Number of Processors: For the presented benchmarks we always had 8 processors or cores available (for ease of presentation we use the terms core and processor as synonyms here).
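The following hedged Java sketch shows how such a parameter sweep might be timed and the speedup factors derived; classify(...) is a placeholder for a full classification run (an assumption, not the prototype's interface), so the printed numbers are meaningless until a real run is plugged in.

 import java.util.List;

 // Measurement skeleton: time a sequential baseline (one thread, one
 // partition holding all concepts) and parallel runs, then report T1/Tp.
 final class SpeedupExperiment {
     static void classify(List<String> concepts, int threads, int partitionSize) {
         // placeholder for a real classification run
     }

     static double runSeconds(List<String> concepts, int threads, int partitionSize) {
         long start = System.nanoTime();
         classify(concepts, threads, partitionSize);
         return (System.nanoTime() - start) / 1e9;
     }

     public static void main(String[] args) {
         List<String> concepts = List.of();  // would be read from the input file
         double t1 = runSeconds(concepts, 1, Math.max(1, concepts.size()));  // T1
         for (int p : new int[] { 2, 4 }) {
             for (int size : new int[] { 5, 25 }) {
                 double tp = runSeconds(concepts, p, size);
                 System.out.printf("P_{%d,%d}: speedup = %.2f%n", p, size, t1 / tp);
             }
         }
     }
 }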
Performance
In order to test the effect of these parameters on our system, the benchmarks are run with different parameter values. The performance improvement is measured using the speedup factor, defined as Speedup_p = T_1 / T_p, where
– p is the number of threads; in the cluster environment we always had 8 cores available and never used more than 8 threads in our experiments, so each thread can be considered as mapped to one core exclusively;
– T_1 is the CPU time for the sequential run using only one thread and one single partition containing all concept names to be inserted;
– T_p is the CPU time for the parallel run with p threads.

Effect of changing only the number of threads
To measure the performance of the classifier in this case we selected EClass-51En-1 as our test case and ran the tests with a fixed partition size (5 or 25) but a different number of threads (2 and 4), as shown in Fig. 1. In the following we use P_{threads,partition size} to indicate a parallel multi-core setting, where the subscripts give (from left to right) the number of threads created and the partition size used.

 Fig. 1. Runtimes for eclass-51en-1 using 5 settings: S (sequential), P_{2,5}, P_{4,5}, P_{2,25}, P_{4,25} (runtimes in seconds, logarithmic scale).
 Fig. 2. Speedup for eclass-51en-1 from Fig. 1 (P_{threads,partition size}).

In the test cases P_{2,5} and P_{4,5} we get an ideal speedup proportional to the number of threads, as shown in Fig. 2. As we can see, doubling the number of threads from S to P_{2,5} and on to P_{4,5} doubles the speedup each time; in other words, it decreases the CPU time by the number of threads. This is the ideal speedup that we were expecting. Comparing the test cases S, P_{2,5}, and P_{4,25}, we get an even better speedup, also shown in Figs. 1 and 2. In this case, the CPU time decreases to almost 1/10 compared to the sequential case (S). This speedup is due to a combination of the partition size and the cache effect, and it results from the different memory hierarchies of a cluster with modern computers. When we increase the number of threads to 4, the speedup is again proportional to the number of threads, which is what we expected: by doubling the number of threads, the speedup doubles.

Effect of changing only partition sizes
The performance of the classifier in this case for EClass-51En-1 is also shown in Fig. 1, with a fixed number of threads (2 or 4) but different partition sizes (5 or 25). When using 2 threads, compared to case S, we get the ideal speedup for P_{2,5}, as shown in Fig. 2.
As we can see, doubling the number of threads doubles the speedup; in other words, it decreases the CPU time by half. This is the ideal speedup compared to case S, which is what we were expecting to happen. Again, if the partition size is increased to 25, the same speedup shows, as depicted in Fig. 2. In this case, the CPU time decreases to almost 1/5 compared to the previous case. In the scenario with 4 threads, we get a corresponding speedup, as shown in Fig. 2; in this case, the CPU time decreases to almost 1/10 compared to the sequential case. This speedup again is due to a combination of the number of threads and the cache effect. When we increase the partition size to 25, the speedup is what we expected: by multiplying the partition size by 5, the speedup is multiplied by five too. Increasing the partition size means that more concepts are assigned to one thread; therefore, all the related concepts are inserted into the taxonomy by one thread. Hence, increasing the partition size reduces the number of corrections.

Effect of increasing both the number of threads and the partition size
In this scenario, we measured the CPU time when increasing both the number of threads and the partition size. In Figs. 3 and 4, our test suite includes the ontologies Embassi-2, Galen1, LargeTestOntology, Tambis-2a, Cyc, and EClass-51En-1. The CPU time for each test case is shown in Fig. 3 and the speedup factor for each experiment is depicted in Fig. 4. As the results show, in the scenario with 2 threads and partition size 5, the speedup doubles compared to the sequential case and is around 2, which is what we were expecting.

 Fig. 3. Runtimes for ontologies using 3 settings: S (sequential), P_{2,5}, P_{4,25} (runtimes in seconds, logarithmic scale).
 Fig. 4. Speedup for ontologies from Fig. 3.
When we increase the number of threads as well as the partition size, i.e., for the scenario with 4 threads and partition size 25, the CPU time decreases dramatically and the speedup factor is therefore above 20 for most test cases. This is more than a linear speedup, and it is the result of increasing the number of threads and the partition size together with the cache effect. The highest speedup factor is observed for test case Galen1.

Experiment on very large ontologies
We selected 3 Snomed variants as very large ontologies with more than 150,000 concepts: Snomed-2 with 182,869 concepts, Snomed-1 with 223,260 concepts, and Snomed with 379,691 concepts were included in our tests. Fig. 7 shows an excellent improvement of CPU time for the parallel over the sequential case. In Fig. 8, the speedup factor is almost 2, which is the expected behavior. The best speedup factor is observed for test case Snomed.

 Fig. 7. Runtimes for snomed using 2 settings: S (sequential), P_{2,5} (runtimes in seconds, logarithmic scale).
 Fig. 8. Speedup for ontologies from Fig. 7.

Observation on the increase of size of ontologies
We chose Cyc, EClass-51en-1, Snomed-1, Snomed-2, and Snomed as test cases. Here, as shown in Figs. 5 and 7, in a parallel setting with 2 threads the CPU time is divided by 2 compared to the sequential case. The speedup, shown in Figs. 6 and 8, is linear and consistent for our benchmark ontologies even as the size of the ontologies increases.

 Fig. 5. Runtimes for cyc and eclass-51en-1 using 3 settings: S (sequential), P_{2,5}, P_{4,25} (runtimes in seconds, logarithmic scale).
 Fig. 6. Speedup for ontologies from Fig. 5.

Overall, the overhead is mostly determined by the quality of the told subsumers and disjoints information, the imposed order of traversal within a partitioning, the division of the ordered concept list into partitions, and the number of corrections that have taken place (the latter varied between 0.5% and 3% of the ontology size, depending on the structure of the ontology as well as the number of threads and the partition size).
In general, one should try to insert nodes as close as possible to their final order in the tree using a top-to-bottom strategy.

In the Concurrent TBox Classifier no optimization techniques for classification have been implemented. For instance, there are well-known optimizations that can avoid subsumption tests, eliminate the bottom search for some DL languages, or decrease the number of bottom searches in general. Of course, our system is not at all competitive with highly optimized DL reasoners or special-purpose reasoners designed to take advantage of the characteristics of the EL fragment (e.g., see [7]). In our case, however, we can easily classify ontologies that are outside of the EL fragment.

4 Conclusion

In this paper, we have shown an excellently scaling technique for concurrent OWL ontology classification. The described architecture, which relies on lock-free algorithms with limited synchronization, utilizes concurrency in a multi-core environment. The experimental results show the effectiveness of our algorithms. This work appears to be the first that documents significant performance improvements in a multi-core environment using real-world benchmarks for ontologies of various DL complexities.

References

1. Aslani, M., Haarslev, V.: Parallel TBox classification in description logics - first experimental results. In: Proceedings of the 19th European Conference on Artificial Intelligence - ECAI 2010, Lisbon, Portugal, Aug. 16-20, 2010, pp. 485–490 (2010)
2. Baader, F., Franconi, E., Hollunder, B., Nebel, B., Profitlich, H.: An empirical analysis of optimization techniques for terminological representation systems or: Making KRIS get a move on. Applied Artificial Intelligence 4(2), 109–132 (1994)
3. Bergmann, F., Quantz, J.: Parallelizing description logics. In: Proc. of 19th Annual German Conf. on Artificial Intelligence, pp. 137–148. LNCS, Springer-Verlag (1995)
4. Haarslev, V., Möller, R.: RACER system description. In: Proc. of the Int. Joint Conf. on Automated Reasoning, IJCAR 2001, June 18-23, 2001, Siena, Italy, pp. 701–705. LNCS (2001)
5. Haarslev, V., Möller, R., Turhan, A.Y.: Exploiting pseudo models for TBox and ABox reasoning in expressive description logics. In: Proc. of the Int. Joint Conf. on Automated Reasoning, IJCAR 2001, June 18-23, Siena, Italy, pp. 61–75 (2001)
6. Hogan, A., Pan, J., Polleres, A., Decker, S.: SAOR: template rule optimisations for distributed reasoning over 1 billion linked data triples. In: Proc. 9th Int. Semantic Web Conf., pp. 337–353 (2010)
7. Kazakov, Y., Krötzsch, M., Simancik, F.: Concurrent classification of EL ontologies. In: Proc. of the 10th Int. Semantic Web Conf., pp. 305–320 (2011)
8. Kotoulas, S., Oren, E., van Harmelen, F.: Mind the data skew: distributed inferencing by speeddating in elastic regions. In: Proc. 19th Int. Conf. on World Wide Web, pp. 531–540 (2010)
9. Liebig, T., Müller, F.: Parallelizing tableaux-based description logic reasoning. In: Proc. of 3rd Int. Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS '07), Vilamoura, Portugal, Nov 27. LNCS, vol. 4806, pp. 1135–1144. Springer-Verlag (2007)
10. Mutharaju, R., Maier, F., Hitzler, P.: A MapReduce algorithm for EL+. In: Proc. 23rd Int. Workshop on Description Logics, pp. 464–474 (2010)
11. Schlicht, A., Stuckenschmidt, H.: Distributed resolution for expressive ontology networks. In: Web Reasoning and Rule Systems, 3rd Int. Conf. (RR 2009), Chantilly, VA, USA, Oct. 25-26, 2009, pp. 87–101 (2009)
12. Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., Bal, H.: WebPIE: a web-scale parallel inference engine using MapReduce. Journal of Web Semantics (2011)
13. Urbani, J., Kotoulas, S., Oren, E., van Harmelen, F.: Scalable distributed reasoning using MapReduce. In: Proc. 8th Int. Semantic Web Conf., pp. 634–649 (2009)
14. Weaver, J., Hendler, J.: Parallel materialization of the finite RDFS closure for hundreds of millions of triples. In: Proc. 8th Int. Semantic Web Conf., pp. 87–101 (2009)