=Paper=
{{Paper
|id=None
|storemode=property
|title=Concurrent Classification of OWL Ontologies - An Empirical Evaluation
|pdfUrl=https://ceur-ws.org/Vol-846/paper_31.pdf
|volume=Vol-846
|dblpUrl=https://dblp.org/rec/conf/dlog/AslaniH12
}}
==Concurrent Classification of OWL Ontologies - An Empirical Evaluation==
Mina Aslani and Volker Haarslev
Concordia University, Montreal, Quebec, Canada

Abstract. This paper describes our progress in developing algorithms for concurrent classification of OWL ontologies. We refactored the architecture of our research prototype and its employed algorithms by integrating lock-free data structures and adopting various optimizations to reduce overhead. In comparison to our earlier work we increased the size of classified ontologies by one order of magnitude, i.e., the size of processed ontologies is now beyond a quarter million OWL classes. The main focus of this paper is an empirical evaluation with huge ontologies that demonstrates an excellent speedup corresponding almost directly to the number of processors or cores used.

1 Introduction

Parallel algorithms for description logic (DL) reasoning were first explored in the FLEX system [3], where various distributed message-passing schemes for rule execution were evaluated. The reported results seemed promising, but the research suffered from severe limitations due to the hardware available for experiments at that time. The only other work on parallelizing DL reasoning [9] reported promising results using multi-core and multi-processor hardware; it explores the parallel treatment of disjunctions and of individual merging (due to number restrictions). In [11] an approach to distributed reasoning for ALCHIQ is presented that is based on resolution techniques but does not address optimizations for TBox classification.

Other work has studied concurrency in lightweight ontology languages. A distributed MapReduce algorithm for EL+ has been proposed, but no experiments on the proposed algorithms have been reported [10]. Further work focuses on distributed reasoning; these approaches differ from ours in that they manage large-scale data exceeding the memory of a single machine [11, 14, 6, 8, 12]. There also exists work on parallel distributed RDF inferencing (e.g., [13]) and on parallel reasoning in first-order theorem proving, but due to completely different proof techniques (resolution versus tableaux) and reasoning architectures it is not considered relevant here. Another approach presents an optimized consequence-based procedure for the classification of ontologies, but it only addresses the DL EL [7].

The work in this paper extends our work on parallel TBox classification [1]. Compared to our previous work, this paper reports on an enhanced lock-free version of the algorithms utilizing concurrency in a multi-core environment, on optimizations that increase performance, and on a performance evaluation with huge real-world ontologies in the range of 300K OWL classes (DL concepts), such as SNOMED. Our prototype not only addresses huge real-world ontologies but also does not compromise on DL complexity. It can process DLs that are much more complex than EL (at least SHIQ) and provides an excellent speedup, considering that no particular DL-related optimization technique is used. The implemented prototype performs concurrent TBox classification controlled by various parameters such as the number of threads, the size of the partitions assigned to threads, and the number of processors. Our evaluation demonstrates impressive performance improvements: owing to a small overhead, the processing time decreases almost linearly with the number of available processors.
It is important to note that the focus of this research is on exploring algorithms for concurrent TBox classification and not on developing a highly optimized DL reasoner. We are currently only interested in the speedup factor obtained from comparing sequential and parallel runs of our prototype.

2 The Concurrent TBox Classifier

This section describes the architecture of the implemented system and its underlying sound and complete algorithm for concurrent classification of DL ontologies. To compute the hierarchy in parallel, we developed a Java application with a multi-threaded architecture providing control parameters such as the number of threads, the number of concepts (also called partition size) to be inserted per thread, and the number of processors. As thoroughly explained in [1], the program reads an input file containing a list of concept names to be classified together with information about them, which is generated by the OWL reasoner Racer [4]. Racer is only used for generating the input files for our prototype. The per-concept information available in the file includes the concept name, its parents (in the complete taxonomy), so-called told subsumers and disjoints, and pseudo model [5] information. This architecture was deliberately designed to facilitate our experiments by using existing OWL reasoners to generate auxiliary information and to make the Concurrent TBox Classifier independent of particular DLs.

The preprocessing algorithm uses a topological sorting similar to [1], and the order for processing concepts is based on the topologically sorted list. To manage concurrency and multi-threading in our system, as described in [1], a single shared global tree approach is used. To classify the TBox, two symmetric tasks are employed, i.e., the so-called enhanced tree traversal method [2] using top (bottom) search to compute the parents (children) of a concept to be inserted into the taxonomy.

In [1], we first introduced our algorithms for parallel classification and reported considerable performance improvements, but we could only process relatively small ontologies. In this paper, we introduce the enhanced concurrent versions of these algorithms, i.e., Algorithms 2, 6 and 7. In order to make the paper self-contained we repeat Algorithms 1, 3, 4 and 5 from [1].

The procedure parallel_tbox_classification is sketched in Algorithm 1. It is called with a list of named concepts and sorts them in topological order with respect to the initial taxonomy created from the already known told ancestors and descendants of each concept (using the told subsumer information). The classifier assigns, in a round-robin manner, partitions of a fixed size from the concept list to idle threads and activates these threads with their assigned partition using the procedure insert_partition outlined in Algorithm 2. All threads work in parallel with the goal of constructing a global subsumption tree (taxonomy). They also share a global array located_concepts indexed by thread identifications.

 Algorithm 1 parallel_tbox_classification(concept_list)
   topological_order_list ← topological_order(concept_list)
   repeat
     wait until an idle thread t_i becomes available
     select a partition p_i from topological_order_list
     run thread t_i with insert_partition(p_i, t_i)
   until all concepts in topological_order_list are inserted

Using the Concurrency package in Java, synchronization on the nodes of the global tree as well as on the entries of the global array has now been eliminated.
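To make the dispatch in Algorithm 1 concrete, here is a minimal Java sketch of the round-robin partition hand-off using a fixed thread pool. The class and method names (ParallelTBoxClassification, insertPartition) and the parameter values are illustrative assumptions, not the prototype's actual code.

 import java.util.List;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.TimeUnit;

 // Hypothetical driver mirroring Algorithm 1: fixed-size partitions are taken
 // from the topologically ordered concept list and handed to idle worker
 // threads; the pool size corresponds to the number-of-threads parameter.
 final class ParallelTBoxClassification {
     static final int NUM_THREADS = 4;      // e.g., one of the evaluated settings
     static final int PARTITION_SIZE = 5;   // e.g., one of the evaluated settings

     static void classify(List<String> topologicalOrderList) throws InterruptedException {
         ExecutorService pool = Executors.newFixedThreadPool(NUM_THREADS);
         for (int i = 0; i < topologicalOrderList.size(); i += PARTITION_SIZE) {
             List<String> partition = topologicalOrderList.subList(
                 i, Math.min(i + PARTITION_SIZE, topologicalOrderList.size()));
             pool.execute(() -> insertPartition(partition));  // Algorithm 2
         }
         pool.shutdown();                           // no further partitions
         pool.awaitTermination(1, TimeUnit.HOURS);  // until all concepts are inserted
     }

     static void insertPartition(List<String> partition) {
         // top/bottom search per concept; see Algorithm 2 below
     }
 }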
The procedure insert_partition inserts all concepts of a given partition into the global taxonomy. We use concurrent collections from the java.util.concurrent package. This package supplies Collection implementations that are thread-safe and designed for use in multi-threaded contexts. Therefore, for updating a concept or its parents or children, no locking mechanism for the affected nodes of the global tree is needed anymore.

Algorithm 2 first performs, for each concept new, the top-search phase (starting from the top concept ⊤) and possibly repeats the top-search phase for new if other threads updated the list of children of its parents. Then, it sets the parents of new. Afterwards the bottom-search phase (starting from the bottom concept ⊥) is performed. Analogously to the top-search phase, the bottom search is possibly repeated, and the children of new are set. After finishing the top and bottom search for new, the node new is added to the entries in located_concepts of all other busy threads; it is also checked whether other threads updated the entry in located_concepts for this thread. If this was the case, the top and/or bottom search needs to be repeated correspondingly. To reduce the overhead of re-running the top or bottom search, we re-run only twice. If the concept new is still not ready to be inserted, e.g., if there is an interaction between new and a concept in located_concepts, it is added to the partition list of concepts (to be located later) and eliminated from the other busy threads' located_concepts lists; otherwise, new can be inserted into the taxonomy using Algorithm 7.

In order to avoid unnecessary tree traversals and tableau subsumption tests when computing the subsumption hierarchy, the parallel classifier adapted the enhanced traversal method [2], an algorithm that was designed for sequential execution. Algorithms 3 and 4 outline the traversal procedures for the top-search phase (the auxiliary algorithm found_in_ancestors(current,new) checks whether current is an ancestor of new).

The possible incompleteness caused by parallel classification [1] can be characterized by the following two scenarios. Scenario I: In top search, as the new concept is pushed downward, right after the children of the current concept have been processed, at least one new child is added by another thread. In this scenario, the top search for the concept new is not aware of the recent change, and this might cause missing subsumptions if there is any interaction between the concept new and the added children. The same might happen in bottom search if the bottom search for the concept new is not informed of a recent change to the list of parents of the current node. Scenario II: Between the time that top search has been started to find the location of the concept new in the taxonomy and the time that its location has been decided, another thread has placed at least one concept into the hierarchy with which the concept new has an interaction. Again, this might cause missing subsumptions; it applies analogously to bottom search as well.
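Before turning to the pseudocode of Algorithm 2, the following hedged Java sketch illustrates the kind of lock-free node representation described above: parents and children are stored in concurrent sets from java.util.concurrent, so searches and insertions by different threads need no node-level locks. The class TaxonomyNode and its members are assumptions for illustration, not the prototype's API.

 import java.util.Set;
 import java.util.concurrent.ConcurrentHashMap;

 // Illustrative taxonomy node: predecessors (parents) and successors
 // (children) are concurrent sets, so one thread can traverse them while
 // another thread links a freshly classified concept.
 final class TaxonomyNode {
     final String name;
     final Set<TaxonomyNode> predecessors = ConcurrentHashMap.newKeySet();
     final Set<TaxonomyNode> successors = ConcurrentHashMap.newKeySet();

     TaxonomyNode(String name) { this.name = name; }

     // In the spirit of insert_concept_in_tbox (Algorithm 7): linking only
     // mutates concurrent sets, so no locking of the global tree is required.
     void linkChild(TaxonomyNode child) {
         successors.add(child);
         child.predecessors.add(this);
     }
 }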
 Algorithm 2 insert_partition(partition, id)
   for all new ∈ partition do
     rerun ← 0
     finish_rerun ← false
     parents ← top_search(new, ⊤)
     while ¬consistent_in_top_search(parents, new) do
       parents ← top_search(new, ⊤)
     predecessors(new) ← parents
     children ← bottom_search(new, ⊥)
     while ¬consistent_in_bottom_search(children, new) do
       children ← bottom_search(new, ⊥)
     successors(new) ← children
     for all busy threads t_i ≠ id do
       located_concepts(t_i) ← located_concepts(t_i) ∪ {new}
     check ← check_if_concept_has_interaction(new, located_concepts(id))
     while (check ≠ 0) and ¬finish_rerun do
       if rerun < 3 then
         if check = 1 then
           new_predecessors ← top_search(new, ⊤)
           rerun ← rerun + 1
           predecessors(new) ← new_predecessors
         if check = 2 then
           new_successors ← bottom_search(new, ⊥)
           rerun ← rerun + 1
           successors(new) ← new_successors
         check ← check_if_concept_has_interaction(new, located_concepts(id))
       else
         finish_rerun ← true
     for all busy threads t_i ≠ id do
       located_concepts(t_i) ← located_concepts(t_i) \ {new}
     if ¬finish_rerun then
       insert_concept_in_tbox(new, predecessors(new), successors(new))

Both scenarios are properly addressed in Algorithm 2 to ensure completeness. Every time a thread locates a concept in the taxonomy, it notifies the other threads by adding this concept name to their located_concepts lists. Therefore, as soon as a thread has found the parents and children of the concept new by running top search and bottom search, it checks whether there is any interaction between the concept new and the concepts in its located_concepts list. Depending on the kind of interaction, top search or bottom search is repeated accordingly. If no possible situations for incompleteness are discovered anymore, Algorithm 7 is called.

 Algorithm 3 top_search(new, current)
   mark(current, 'visited')
   pos-succ ← ∅
   captured_successors(new)(current) ← successors(current)
   for all y ∈ successors(current) do
     if enhanced_top_subs(y, new) then
       pos-succ ← pos-succ ∪ {y}
   if pos-succ = ∅ then
     return {current}
   else
     result ← ∅
     for all y ∈ pos-succ do
       if y not marked as 'visited' then
         result ← result ∪ top_search(new, y)
     return result

 Algorithm 4 enhanced_top_subs(current, new)
   if current marked as 'positive' then
     return true
   else if current marked as 'negative' then
     return false
   else if (for all z ∈ predecessors(current): enhanced_top_subs(z, new))
       and found_in_ancestors(current, new) then
     mark(current, 'positive')
     return true
   else
     mark(current, 'negative')
     return false
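As a rough Java rendering of the memoization in Algorithm 4 (reusing the illustrative TaxonomyNode sketch above; the subsumption test found_in_ancestors is stubbed out, and all names are assumptions): each candidate node is marked positive or negative at most once per inserted concept, so the expensive subsumption test is not repeated.

 import java.util.HashMap;
 import java.util.Map;

 // Sketch of enhanced_top_subs (Algorithm 4). The marks map plays the role
 // of the 'positive'/'negative' marks; it is confined to the inserting
 // thread, so a plain HashMap suffices.
 final class EnhancedTopSubs {
     enum Mark { POSITIVE, NEGATIVE }
     private final Map<TaxonomyNode, Mark> marks = new HashMap<>();

     boolean subsumes(TaxonomyNode current, TaxonomyNode newConcept) {
         Mark m = marks.get(current);
         if (m != null) return m == Mark.POSITIVE;
         // current subsumes newConcept only if all its predecessors do and
         // the (pseudo-model assisted) ancestor test succeeds.
         boolean result = current.predecessors.stream()
                 .allMatch(z -> subsumes(z, newConcept))
             && foundInAncestors(current, newConcept);
         marks.put(current, result ? Mark.POSITIVE : Mark.NEGATIVE);
         return result;
     }

     // Placeholder for found_in_ancestors; the real test consults the
     // reasoner-generated subsumption information (assumption).
     private boolean foundInAncestors(TaxonomyNode current, TaxonomyNode newConcept) {
         return false;
     }
 }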
To resolve the possible incompleteness we utilize Algorithms 5 and 6 (the algorithm interaction_possible(new,concept) used there applies pseudo model merging [5] to decide whether a subsumption between new and concept is possible). The procedure consistent_in_bottom_search is not shown here because it mirrors consistent_in_top_search.

 Algorithm 5 consistent_in_top_search(parents, new)
   for all pred ∈ parents do
     if successors(pred) ≠ captured_successors(new)(pred) then
       diff ← successors(pred) \ captured_successors(new)(pred)
       for all child ∈ diff do
         if found_in_ancestors(child, new) then
           return false
   return true

 Algorithm 6 check_if_concept_has_interaction(new, located_concepts)
   The return value indicates whether and what type of re-run needs to be done:
   0: no re-run is needed
   1: re-run top search because a possible parent could have been overlooked
   2: re-run bottom search because a possible child could have been overlooked
   if located_concepts = ∅ then
     return 0
   else
     for all concept ∈ located_concepts do
       if interaction_possible(new, concept) then
         if found_in_ancestors(new, concept) then
           return 2
         else
           return 1
       else if interaction_possible(concept, new) then
         if found_in_ancestors(new, concept) then
           return 2
         else
           return 1
     return 0

Finally, Algorithm 7 inserts a fully located concept into the taxonomy by linking it to its parents and children.

 Algorithm 7 insert_concept_in_tbox(new, predecessors, successors)
   for all pred ∈ predecessors do
     successors(pred) ← successors(pred) ∪ {new}
   for all succ ∈ successors do
     predecessors(succ) ← predecessors(succ) ∪ {new}

3 Evaluation

In the previous section, we explained the algorithms used in our Concurrent TBox Classifier. In this section, we study the scalability and performance of our prototype. We explain the behavior of our system when run (i) sequentially or (ii) in a parallel multi-processor environment, and we describe how the prototype performs on huge real-world ontologies with different DL complexities. Therefore, in the remainder of this section, we report on the conducted experiments. We first provide a description of the platform used and the implemented prototype, then we describe the test cases used to evaluate the Concurrent TBox Classifier and provide an overview of the parameters used in the experiments. Finally, we show the results and discuss the performance of the classifier. The measured runtimes in the figures are shown in seconds using a logarithmic scale.

Platform and implementation
All the experiments were conducted on a high-performance parallel computing cluster. The nodes in the cluster run an HP version of RedHat Enterprise Linux for 64-bit processors, with HP's own XC cluster software stack. To evaluate our approach, the Concurrent TBox Classifier has been implemented in Java using lock-free data structures from the java.util.concurrent package with minimal synchronization.

Test cases
Table 1 shows a collection of 9 mostly publicly available real-world ontologies. Note that the chosen test cases exhibit different sizes, structures, and DL complexities. The benchmark ontologies are characterized by their name, their size in number of named concepts or classes, and the DL used.

 Table 1. Characteristics of the used test ontologies (e.g., LH denotes the DL allowing only conjunction and role hierarchies, and unfoldable TBoxes)

 Ontology            DL language   No. of named concepts
 Embassi-2           ALCHN                   657
 Galen1              ALCH                  2,730
 LargeTestOntology   ELHR+                 5,584
 Tambis-2a           ELH                  10,116
 Cyc                 LHF                  25,566
 EClass-51En-1       LH                   76,977
 Snomed-2            ELH                 182,869
 Snomed-1            ELH                 223,260
 Snomed              ELH                 379,691

Parameters used in experiments
The parameters used in our empirical evaluation and their meaning are described below.
– Number of Threads: To measure the scalability of our system, we performed our experiments using different numbers of threads (1, 2, 4).
– Partition Size: The number of concepts (5, 25) assigned to every thread, which the corresponding thread is expected to insert. Like the number of threads, this parameter is used to measure the scalability of our approach.
– Number of Processors: For the presented benchmarks we always had 8 processors or cores available (for ease of presentation we use the terms core and processor as synonyms here).
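The following hedged Java sketch shows how such a parameter sweep might be timed and the speedup factors derived; classify(...) is a placeholder for a full classification run (an assumption, not the prototype's interface), so the printed numbers are meaningless until a real run is plugged in.

 import java.util.List;

 // Measurement skeleton: time a sequential baseline (one thread, one
 // partition holding all concepts) and parallel runs, then report T1/Tp.
 final class SpeedupExperiment {
     static void classify(List<String> concepts, int threads, int partitionSize) {
         // placeholder for a real classification run
     }

     static double runSeconds(List<String> concepts, int threads, int partitionSize) {
         long start = System.nanoTime();
         classify(concepts, threads, partitionSize);
         return (System.nanoTime() - start) / 1e9;
     }

     public static void main(String[] args) {
         List<String> concepts = List.of();  // would be read from the input file
         double t1 = runSeconds(concepts, 1, Math.max(1, concepts.size()));  // T1
         for (int p : new int[] { 2, 4 }) {
             for (int size : new int[] { 5, 25 }) {
                 double tp = runSeconds(concepts, p, size);
                 System.out.printf("P_{%d,%d}: speedup = %.2f%n", p, size, t1 / tp);
             }
         }
     }
 }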
Performance
In order to test the effect of these parameters on our system, the benchmarks are run with different parameter values. The performance improvement is measured using the speedup factor, defined as Speedup_p = T_1 / T_p, where
– p is the number of threads; in the cluster environment we always had 8 cores available and never used more than 8 threads in our experiments, so each thread can be considered as mapped to one core exclusively;
– T_1 is the CPU time for the sequential run using only one thread and one single partition containing all concept names to be inserted;
– T_p is the CPU time for the parallel run with p threads.

Effect of changing only the number of threads
To measure the performance of the classifier in this case we selected EClass-51En-1 as our test case and ran the tests with a fixed partition size (5 or 25) but a different number of threads (2 and 4), as shown in Fig. 1. In the following we use P_{threads,partition size} to indicate a parallel multi-core setting, where the subscripts give (from left to right) the number of threads created and the partition size used.

 Fig. 1. Runtimes for eclass-51en-1 using 5 settings: S (sequential), P_{2,5}, P_{4,5}, P_{2,25}, P_{4,25} (runtimes in seconds, logarithmic scale).
 Fig. 2. Speedup for eclass-51en-1 from Fig. 1 (P_{threads,partition size}).

In the test cases P_{2,5} and P_{4,5} we get an ideal speedup proportional to the number of threads, as shown in Fig. 2. As we can see, doubling the number of threads from S to P_{2,5} and on to P_{4,5} doubles the speedup each time; in other words, it decreases the CPU time by the number of threads. This is the ideal speedup that we were expecting. Comparing the test cases S, P_{2,5}, and P_{4,25}, we get an even better speedup, also shown in Figs. 1 and 2. In this case, the CPU time decreases to almost 1/10 compared to the sequential case (S). This speedup is due to a combination of the partition size and the cache effect, and it results from the different memory hierarchies of a cluster with modern computers. When we increase the number of threads to 4, the speedup is again proportional to the number of threads, which is what we expected: by doubling the number of threads, the speedup doubles.

Effect of changing only partition sizes
The performance of the classifier in this case for EClass-51En-1 is also shown in Fig. 1, with a fixed number of threads (2 or 4) but different partition sizes (5 or 25). When using 2 threads, compared to case S, we get the ideal speedup for P_{2,5}, as shown in Fig. 2.
As we can see, doubling the number of threads doubles the speedup; in other words, it decreases the CPU time by half. This is the ideal speedup compared to case S, which is what we were expecting to happen. Again, if the partition size is increased to 25, the same speedup shows, as depicted in Fig. 2. In this case, the CPU time decreases to almost 1/5 compared to the previous case. In the scenario with 4 threads, we get a corresponding speedup, as shown in Fig. 2; in this case, the CPU time decreases to almost 1/10 compared to the sequential case. This speedup again is due to a combination of the number of threads and the cache effect. When we increase the partition size to 25, the speedup is what we expected: by multiplying the partition size by 5, the speedup is multiplied by five too. Increasing the partition size means that more concepts are assigned to one thread; therefore, all the related concepts are inserted into the taxonomy by one thread. Hence, increasing the partition size reduces the number of corrections.

Effect of increasing both the number of threads and the partition size
In this scenario, we measured the CPU time when increasing both the number of threads and the partition size. In Figs. 3 and 4, our test suite includes the ontologies Embassi-2, Galen1, LargeTestOntology, Tambis-2a, Cyc, and EClass-51En-1. The CPU time for each test case is shown in Fig. 3 and the speedup factor for each experiment is depicted in Fig. 4. As the results show, in the scenario with 2 threads and partition size 5, the speedup doubles compared to the sequential case and is around 2, which is what we were expecting.

 Fig. 3. Runtimes for ontologies using 3 settings: S (sequential), P_{2,5}, P_{4,25} (runtimes in seconds, logarithmic scale).
 Fig. 4. Speedup for ontologies from Fig. 3.
When we increase the number of threads as well as the partition size, i.e., for the scenario with 4 threads and partition size 25, the CPU time decreases dramatically and the speedup factor is therefore above 20 for most test cases. This is more than a linear speedup, and it is the result of increasing the number of threads and the partition size together with the cache effect. The highest speedup factor is observed for test case Galen1.

Experiment on very large ontologies
We selected 3 Snomed variants as very large ontologies with more than 150,000 concepts: Snomed-2 with 182,869 concepts, Snomed-1 with 223,260 concepts, and Snomed with 379,691 concepts were included in our tests. Fig. 7 shows an excellent improvement of CPU time for the parallel over the sequential case. In Fig. 8, the speedup factor is almost 2, which is the expected behavior. The best speedup factor is observed for test case Snomed.

 Fig. 7. Runtimes for snomed using 2 settings: S (sequential), P_{2,5} (runtimes in seconds, logarithmic scale).
 Fig. 8. Speedup for ontologies from Fig. 7.

Observation on the increase of size of ontologies
We chose Cyc, EClass-51en-1, Snomed-1, Snomed-2, and Snomed as test cases. Here, as shown in Figs. 5 and 7, in a parallel setting with 2 threads the CPU time is divided by 2 compared to the sequential case. The speedup, shown in Figs. 6 and 8, is linear and consistent for our benchmark ontologies even as the size of the ontologies increases.

 Fig. 5. Runtimes for cyc and eclass-51en-1 using 3 settings: S (sequential), P_{2,5}, P_{4,25} (runtimes in seconds, logarithmic scale).
 Fig. 6. Speedup for ontologies from Fig. 5.

Overall, the overhead is mostly determined by the quality of the told subsumers and disjoints information, the imposed order of traversal within a partitioning, the division of the ordered concept list into partitions, and the number of corrections that have taken place (the latter varied between 0.5% and 3% of the ontology size, depending on the structure of the ontology as well as the number of threads and the partition size).
In general, one should try to insert nodes as close as possible to their final order in the tree using a top-to-bottom strategy.

In the Concurrent TBox Classifier no optimization techniques for classification have been implemented. For instance, there are well-known optimizations that can avoid subsumption tests, eliminate the bottom search for some DL languages, or decrease the number of bottom searches in general. Of course, our system is not at all competitive with highly optimized DL reasoners or special-purpose reasoners designed to take advantage of the characteristics of the EL fragment (e.g., see [7]). In our case, however, we can easily classify ontologies that are outside of the EL fragment.

4 Conclusion

In this paper, we have shown an excellently scaling technique for concurrent OWL ontology classification. The described architecture, which relies on lock-free algorithms with limited synchronization, utilizes concurrency in a multi-core environment. The experimental results show the effectiveness of our algorithms. This work appears to be the first that documents significant performance improvements in a multi-core environment using real-world benchmarks for ontologies of various DL complexities.

References

1. Aslani, M., Haarslev, V.: Parallel TBox classification in description logics - first experimental results. In: Proceedings of the 19th European Conference on Artificial Intelligence - ECAI 2010, Lisbon, Portugal, Aug. 16-20, 2010, pp. 485–490 (2010)
2. Baader, F., Franconi, E., Hollunder, B., Nebel, B., Profitlich, H.: An empirical analysis of optimization techniques for terminological representation systems or: Making KRIS get a move on. Applied Artificial Intelligence 4(2), 109–132 (1994)
3. Bergmann, F., Quantz, J.: Parallelizing description logics. In: Proc. of 19th Annual German Conf. on Artificial Intelligence, pp. 137–148. LNCS, Springer-Verlag (1995)
4. Haarslev, V., Möller, R.: RACER system description. In: Proc. of the Int. Joint Conf. on Automated Reasoning, IJCAR 2001, June 18-23, 2001, Siena, Italy, pp. 701–705. LNCS (2001)
5. Haarslev, V., Möller, R., Turhan, A.Y.: Exploiting pseudo models for TBox and ABox reasoning in expressive description logics. In: Proc. of the Int. Joint Conf. on Automated Reasoning, IJCAR 2001, June 18-23, Siena, Italy, pp. 61–75 (2001)
6. Hogan, A., Pan, J., Polleres, A., Decker, S.: SAOR: template rule optimisations for distributed reasoning over 1 billion linked data triples. In: Proc. 9th Int. Semantic Web Conf., pp. 337–353 (2010)
7. Kazakov, Y., Krötzsch, M., Simancik, F.: Concurrent classification of EL ontologies. In: Proc. of the 10th Int. Semantic Web Conf., pp. 305–320 (2011)
8. Kotoulas, S., Oren, E., van Harmelen, F.: Mind the data skew: distributed inferencing by speeddating in elastic regions. In: Proc. 19th Int. Conf. on World Wide Web, pp. 531–540 (2010)
9. Liebig, T., Müller, F.: Parallelizing tableaux-based description logic reasoning. In: Proc. of 3rd Int. Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS '07), Vilamoura, Portugal, Nov 27. LNCS, vol. 4806, pp. 1135–1144. Springer-Verlag (2007)
10. Mutharaju, R., Maier, F., Hitzler, P.: A MapReduce algorithm for EL+. In: Proc. 23rd Int. Workshop on Description Logics, pp. 464–474 (2010)
11. Schlicht, A., Stuckenschmidt, H.: Distributed resolution for expressive ontology networks. In: Web Reasoning and Rule Systems, 3rd Int. Conf. (RR 2009), Chantilly, VA, USA, Oct. 25-26, 2009, pp. 87–101 (2009)
12. Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F., Bal, H.: WebPIE: a web-scale parallel inference engine using MapReduce. Journal of Web Semantics (2011)
13. Urbani, J., Kotoulas, S., Oren, E., van Harmelen, F.: Scalable distributed reasoning using MapReduce. In: Proc. 8th Int. Semantic Web Conf., pp. 634–649 (2009)
14. Weaver, J., Hendler, J.: Parallel materialization of the finite RDFS closure for hundreds of millions of triples. In: Proc. 8th Int. Semantic Web Conf., pp. 87–101 (2009)