A Parallel Reasoner for the Description Logic ALC

Kejia Wu and Volker Haarslev
Concordia University, Montreal, Canada

Abstract. Multi-processor/core systems have become ubiquitous, but the vast majority of OWL reasoners can process ontologies only sequentially. This observation motivates our work on the design and evaluation of Deslog, a parallel tableau-based description logic reasoner for ALC. A first empirical evaluation for TBox classification demonstrates that Deslog's architecture supports a speedup factor that is linear in the number of utilized processors/cores.

1 Introduction

The popularity of multi-processor/core computing facilities makes Description Logic (DL) reasoning that scales w.r.t. the number of available cores feasible.1 Modern many-core architectures, together with operating systems implementing symmetric multitasking, efficiently support thread-level parallelism (TLP), where smaller subtasks are implemented as threads that are concurrently executed on available cores. In order to utilize such hardware for making DL reasoning scalable to the number of available cores, one has to develop new reasoning architectures that efficiently support concurrency. Moreover, many well-known tableau optimization techniques need to be revised or adapted in order to be applicable in a parallel context. In this paper we present a new DL reasoning framework that fully utilizes TLP on many-core systems. In principle, standard tableau algorithms are well suited for parallelization because tableau completion rules usually do not depend on a sequential execution order and only require shared access to common data structures.

We consider the inherent non-determinism of DL tableaux as a feature of DL reasoning that naturally leads to parallel algorithms suitable for a shared-memory setting. For instance, sources of such non-determinism are disjunctions and, for logics containing at least ALCQ, qualified cardinality restrictions.
Many standard DL reasoning services, e.g. concept satisfiability testing, TBox classification, instance checking, ABox realization, etc., might be amenable to being implemented in a non-deterministic way supporting parallelism. However, the use of parallelism in DL reasoning has yet to be explored systematically. Although it is advantageous to seek scalable solutions by means of parallelism, little progress has been made in parallel DL reasoning during the last decade [6]. Most DL reasoning algorithms and optimization techniques have only been investigated in a sequential context, and how these methods should be accommodated to parallelism remains mostly open. Besides these algorithmic and theoretical issues, the efficient implementation of a scalable parallel DL reasoner is also worth investigating. Many practical issues of shared-memory parallelism need to be researched, e.g., selecting suitable parallel architectures, designing efficient data structures and memory management, and exploring parallel algorithms.

1 For ease of presentation we consider the terms processor and core as synonyms.

In the remaining sections we introduce the design of Deslog, discuss related work, and present an evaluation demonstrating a mostly (super)linear scalability of Deslog.

2 Architecture of Deslog

2.1 Framework

Fig. 1: The framework of Deslog.

Fig. 2: Deslog data structure for a concept.

Fig. 3: Deslog data structure for the DL expression ∀R.(A ⊓ B).

The shared-memory parallel reasoner Deslog consists of three layers:
(i) the pre-processing layer, which converts the Web Ontology Language (OWL) representation of ontologies to internal data structures; (ii) the reasoning engine layer, which performs the standard DL reasoning services and is composed of two key components, the service provider and the tableau rule applier; (iii) the post-processing layer, which collects, caches, and saves reasoning results. In addition, (iv) the infrastructure layer provides core components and utilities, such as structures representing concepts and roles, and the object copy tool. Figure 1 gives an overview of the framework.

First, OWL ontology data is read into the pre-processing layer. Various typical pre-processing operations, such as transformation into negation normal form (NNF), axiom rewriting, and axiom absorption, are executed in this layer. The reasoner's run-time options, such as service selection, maximum number of threads, and rule application order, are also set up in this layer. We implemented this layer using the OWL API [14]. The pre-processed data is streamed to the reasoning engine.

The reasoning engine primarily performs inference computation. Its first key component is the service provider. As with popular DL reasoning systems, Deslog provides standard reasoning services, such as testing TBox consistency, concept satisfiability, etc. These services may depend on each other: in Deslog, the classification service depends on subsumption, and the latter depends on the satisfiability service. The service provider uses a suite of tableau-based algorithms to perform reasoning. The reasoner adopts tableaux as its primary reasoning method, which is complemented by the second key component, the tableau expansion rule applier. The main function of this component is to execute tableau rules in some order to build expansion forests.
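This rule-applier loop can be sketched as follows. The sketch is purely illustrative: all class and method names are our own invention (Deslog's actual interfaces are not published in this form), and concepts are encoded as plain strings with a toy ⊓-rule. It only shows the shape of the design, a rule interface plus an applier that fires the configured rules until none is applicable.

```java
// Illustrative sketch of a pluggable tableau rule applier (invented names,
// not the actual Deslog API). Each rule is an independent component; the
// applier fires the configured rules in order until a fixpoint is reached.
import java.util.List;

public class RuleApplierSketch {

    /** A tableau expansion rule as a configurable plug-in. */
    interface TableauRule {
        String name();
        /** Returns true if the rule changed the branch (i.e., was applicable). */
        boolean apply(List<String> branch);
    }

    /** Toy conjunction rule: for "X AND Y" in the branch, add X and Y. */
    static final TableauRule AND_RULE = new TableauRule() {
        public String name() { return "and"; }
        public boolean apply(List<String> branch) {
            boolean changed = false;
            // iterate over a snapshot so we may add to the branch while scanning
            for (String c : branch.toArray(new String[0])) {
                int i = c.indexOf(" AND ");
                if (i < 0) continue;
                String left = c.substring(0, i), right = c.substring(i + 5);
                if (!branch.contains(left))  { branch.add(left);  changed = true; }
                if (!branch.contains(right)) { branch.add(right); changed = true; }
            }
            return changed;
        }
    };

    /** Applies the configured rules in order until none is applicable. */
    static void saturate(List<TableauRule> rules, List<String> branch) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (TableauRule r : rules) changed |= r.apply(branch);
        }
    }
}
```

Because the applier only depends on the rule interface, the set of rules and their firing order can be swapped by configuration, which is the flexibility described above.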
In Deslog, tableau expansion rules are designed as configurable plug-ins, so which rules are applied, and in which order, can be specified flexibly. At present, Deslog implements the standard ALC tableau expansion rules [4].

All three layers mentioned above use the facilities provided by the infrastructure layer. All common-purpose utilities are part of this layer, for example, the thread manager, global counters, and the globally unique identifier (GUID) generator. In addition, the key data structures representing DL elements, and the basic operations on them, are provided by this layer.

2.2 Key Data Structures

In contrast to popular DL reasoning systems, Deslog aims to improve reasoning performance by employing parallel computing, but the data structures employed by sequential DL reasoners are not always suitable for parallelism.

Tree structures have been adopted by many tableau-based reasoners. However, a naive tree data structure introduces data races in a shared-memory parallel environment and is not well suited to a concurrent setting. Therefore, we need to devise more efficient data structures in order to reduce the amount of shared data as much as possible.

The new data structures facilitating concurrency must support DL tableaux. One important function of trees is to preserve the non-deterministic branches generated during tableau expansion. Non-deterministic branches are mainly produced by the disjunction rule and the at-most number restriction rules. To separate non-deterministic branches into independent data vessels, which are suited to parallel processing, we adopt a list-based structure, called a stage, to maintain a single non-deterministic branch, and a queue-based structure, called a stage pool, to buffer all branches of a tableau. Every stage is composed of the essential elements of a DL ontology: concepts and roles. As with any DL reasoner, the representation of concepts and roles is a fundamental design consideration.
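The stage and stage pool could be sketched as below. This is a minimal illustration under our own naming, not the actual Deslog classes; the only committed design points from the text are that a stage is list-based and holds one non-deterministic branch, while the stage pool is queue-based and buffers all branches. We use a lock-free queue from java.util.concurrent since the pool is meant to be shared among worker threads.

```java
// Illustrative sketch (invented names) of the two branch containers:
// a list-based "stage" for one non-deterministic branch, and a
// queue-based "stage pool" buffering all branches of a tableau.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class StageSketch {

    /** One non-deterministic branch: a list of concept expressions. */
    static class Stage {
        final List<String> concepts = new ArrayList<String>();
        Stage(List<String> initial) { concepts.addAll(initial); }
        /** Branching duplicates the parent stage and adds one alternative. */
        Stage fork(String extraConcept) {
            Stage child = new Stage(concepts);
            child.concepts.add(extraConcept);
            return child;
        }
    }

    /** Buffers all branches; a lock-free queue tolerates concurrent access. */
    static class StagePool {
        private final ConcurrentLinkedQueue<Stage> pool =
                new ConcurrentLinkedQueue<Stage>();
        void offer(Stage s) { pool.offer(s); }
        /** Returns null when no branch is currently buffered. */
        Stage poll() { return pool.poll(); }
        boolean isEmpty() { return pool.isEmpty(); }
    }
}
```

For instance, the disjunction rule applied to a stage containing A ⊔ B would fork two child stages, one per disjunct, and offer both to the pool for independent processing.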
The core data structure of Deslog is a four-slot list representing a concept. A literal uniquely identifies a distinct concept. An operator indicates the dominant DL constructor applied to a concept; available constructors cover intersection, disjunction, existential and value restriction, and so on, and this slot can also be empty. The remaining two slots, left and right, hold pointers to nested concept definitions. Figure 2 illustrates a DL concept encoded with the Deslog protocol.

Roles in Deslog are handled as a special type of concept and have a structure similar to that of concepts. For instance, the encoding of the DL expression ∀R.(A ⊓ B) is shown in Figure 3. Further properties needed for describing a role can be added to the generic structure, e.g., the number restriction quantity. A role data structure is also associated with a list recording instance pairs. With this design, DL concepts can be linked up seamlessly. Instances (i.e., labels in tableau expansions) are lists holding their typing data, i.e., concepts. There are also helper facilities, such as a role pool and an instance pool, which are useful to accelerate the indexing of objects.

A notable point of our encoding is how the complement of an atomic concept (i.e., a concept name) is expressed (see Figures 4a and 4b).
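A minimal sketch of this four-slot structure follows. The slot names mirror Figure 2, but the class, factory methods, and operator strings are our own illustrative choices; in particular, the exact encoding of complements and of role nodes is assumed, not taken from Deslog's sources. The example() method builds the two linked nodes for ∀R.(A ⊓ B) from Figure 3.

```java
// Illustrative four-slot concept node (slot names follow Figure 2;
// class and operator names are our own). An atomic concept fills only
// the literal slot; composite concepts chain nodes via left/right.
public class ConceptNode {
    final String literal;    // unique identifier of this (sub)concept
    final String operator;   // dominant constructor ("forall", "and", "not", ...) or null
    final ConceptNode left;  // nested definition, e.g. the role of a restriction
    final ConceptNode right;

    ConceptNode(String literal, String operator, ConceptNode left, ConceptNode right) {
        this.literal = literal; this.operator = operator;
        this.left = left; this.right = right;
    }

    /** Atomic concept: only the literal slot is filled. */
    static ConceptNode atomic(String name) {
        return new ConceptNode(name, null, null, null);
    }

    /** ¬A for an atomic concept A (cf. Figure 4b); this encoding is assumed. */
    static ConceptNode complement(ConceptNode a) {
        return new ConceptNode("~" + a.literal, "not", a, null);
    }

    /** ∀R.(A ⊓ B): an outer "forall" node chained to an inner "and" node. */
    static ConceptNode example() {
        ConceptNode conj = new ConceptNode("ltrl1", "and", atomic("A"), atomic("B"));
        return new ConceptNode("ltrl0", "forall", atomic("R"), conj);
    }
}
```

Because every constructor application is just another node, arbitrarily nested ALC concepts can be linked up without any special cases, which is the "seamless" property noted above.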
Fig. 4: Deslog data structures for atomic concepts: (a) the atomic concept A, (b) the atomic concept ¬A.

A principle of Deslog's design is to model the objects and behaviours involved in DL reasoning as independent abstractions as much as possible, in order to facilitate concurrent processing. For instance, branches created during tableau expansion are encapsulated into standalone objects. Thus, a whole tableau expansion forest is designed as a list of branch objects. Tableau expansion rules and even some key optimization techniques are also designed as independent components.

Fig. 5: Speedup factor for pharmacogenomics complex, economy, transportation, bfo, mao, yeast phenotype, and plant trait.

2.3 Implementation

As mentioned before, multi-processor computers are becoming mainstream; it is expected that otherwise idle processors are utilized, and shared-memory TLP is quite suitable for this purpose.

One significant aspect of this research is to investigate how well important DL reasoning optimization techniques are suited to implementation in a parallel reasoner, and how they should be adapted where necessary. Deslog has adopted the following optimization techniques.

1. Lazy unfolding: This technique enables a reasoner to unfold a concept only when necessary [5].

2. Axiom absorption: Disjunctive branches introduced by naively internalizing TBox axioms are one of the primary sources of reasoning inefficiency.
With the axiom absorption technique, a TBox is separated into two parts, the general TBox and the unfoldable TBox. Then, using internalization to process the general part and lazy unfolding to process the unfoldable part can reduce reasoning time dramatically [15, 25].

3. Semantic branching: This DPLL-style technique prunes disjunctive branches by avoiding computing the same problem repeatedly [15].

Other primary optimization techniques, such as dependency-directed backtracking [4, Chapter 9] [15] and model merging [12], are currently being implemented. Notably, not all significant optimization techniques are suitable for concurrency: some of them still depend on complex shared data and may significantly degrade the performance of a concurrent program. Based on these elemental techniques, we completed a suite of standard TBox reasoning services.

The current system implements a parallel ALC TBox classifier, which can concurrently classify an ALC terminology. The parallelized classification service of Deslog computes subsumptions in a brute-force way [5]. The algorithm is obviously sound and complete and has O(n²) time complexity in a sequential context. In order to compute a terminology hierarchy, the algorithm calculates the subsumptions of all pairs of atomic concepts. A subsumption relationship depends only on the pair of concepts involved and has no connection with the computation order. Therefore, the subsumptions can be computed in parallel, and soundness and completeness are retained in a concurrent context.

A rather difficult issue in implementing a parallel DL reasoner is managing overhead. This issue is relatively easy for high-level parallel reasoning, where multiple threads mainly execute read operations on some shared data; thus, we implemented the parallel classification service first.

Besides the high-level parallelized service, classification, low-level parallelized processing is being developed.
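The brute-force parallel classification described above can be sketched as follows. This is an illustrative sketch only: the tableau-based subsumption test is stubbed out with a placeholder, and all names are ours. What it shows is the actual parallelization argument from the text, namely that every ordered pair of atomic concepts is an independent task, so the pairs can be submitted to a thread pool and tested concurrently without affecting soundness or completeness.

```java
// Sketch of brute-force parallel TBox classification: test all n^2 ordered
// concept pairs concurrently. The subsumption test itself is a stub.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelClassifier {

    /** Placeholder for the tableau-based subsumption test sub ⊑ sup. */
    static boolean subsumed(String sub, String sup) {
        // illustrative stub: everything is subsumed by "Top" and by itself
        return "Top".equals(sup) || sub.equals(sup);
    }

    /** Tests all ordered pairs concurrently; returns "sub => sup" strings. */
    static List<String> classify(List<String> concepts, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<Future<String>>();
        for (final String sub : concepts)
            for (final String sup : concepts)
                futures.add(exec.submit(new Callable<String>() {
                    public String call() {
                        // each pair is independent of every other pair
                        return subsumed(sub, sup) ? sub + " => " + sup : null;
                    }
                }));
        List<String> hierarchy = new ArrayList<String>();
        for (Future<String> f : futures)
            if (f.get() != null) hierarchy.add(f.get());
        exec.shutdown();
        return hierarchy;
    }
}
```

The executor's internal task queue plays the role of a shared buffer of subsumption candidates, so idle threads always find the next pending pair, keeping the load balanced.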
In the architecture of Deslog, the classification service uses subsumption, and subsumption uses satisfiability. The low-level parallel reasoning focuses on dealing with non-deterministic branches, which are represented as stages in Deslog. It might seem easy to process stages in parallel, but quite some effort is required to achieve satisfying scalability via concurrency. The first noticeable fact is that, starting from a root stage, every stage may generate new ones. At present, our strategy is to use one thread to process one stage. That means the stage buffer, the stage pool, is frequently accessed by multiple threads, and that access includes frequent writing as well as reading of shared data. Therefore, designing a high-performance stage buffer and efficient access schemes is an essential condition for enabling a scalable performance improvement; otherwise, a parallel approach only causes overhead instead of a performance gain. We are currently working on efficient low-level parallel reasoning.

Although robust shared-memory concurrency libraries are available, such as the C++ Boost.Thread library and the Java concurrent package, in our experience using these libraries immoderately often degrades performance. Therefore, one needs to design sophisticated structures which avoid shared data, or at least do not access shared data frequently.

3 Evaluation

Deslog is implemented in Java 6, in conformity with the design described above; we chose Java for its relatively mature parallel ecosystem.2 Specifically, the java.util.concurrent package of Java 6 is utilized. The parallelism of Deslog is based on a multi-threading model and aims at exploiting the symmetric multiprocessing (SMP) supported by multi-processor computing facilities. In the following we report on experiments demonstrating that a shared-memory parallel tableau-based reasoner can achieve a scalable performance improvement.
3.1 Conducted Experiments

The classification service of Deslog can be executed concurrently by multiple threads. We conducted a group of tests which show that Deslog scales well. All tests were conducted on a 16-core computer running the Solaris OS and Sun Java 6. Many of the test cases were chosen from the OWL Reasoner Evaluation Workshop 2012 (ORE 2012) data sets. We manually changed the expressivity of some test cases to ALC so that Deslog could reason about them. Table 1 lists the metrics of the test cases. The results are shown in Figures 5-7.

2 All components and sources of the system are available at http://code.google.com/p/deslog/.

Table 1: Characteristics of the used test ontologies

Ontology                  DL expressivity  Concept count  Axiom count
bfo                       ALC                       36           45
pharmacogenomics complex  ALC                      145          259
economy                   ALCH(D)                  339          563
transportation            ALCH(D)                  445          489
mao                       ALE+                     167          167
yeast phenotype           AL                       281          276
loggerhead nesting        ALE                      311          347
spider anatomy            ALE                      454          607
pathway                   ALE                      646          767
amphibian anatomy         ALE+                     703          696
flybase vocab             ALE+                     718          726
tick anatomy              ALE+                     631          947
plant trait               ALE                      976         1140
evoc                      AL                      1001          990
protein                   ALE+                    1055         1053

Fig. 6: Speedup factor for spider anatomy, pathway, amphibian anatomy, tick anatomy, loggerhead nesting, protein, flybase vocab, and evoc.

3.2 Discussion

The experimental results collected from the test ontologies show a scalable performance improvement. The tests on more difficult ontologies demonstrate a better scalability due to a reduced share of overhead.

Because the computing time of the single-thread configuration, T1, is rather small for some ontologies, often smaller than ten seconds, the overhead introduced by maintaining multiple threads can limit the scalability. For some of these ontology tests, the reasoning times are reduced to several milliseconds, i.e., the whole work load assigned to a single thread is around several milliseconds in these settings. According to our empirical results, a small work load w.r.t. the overhead, which is produced by manipulating
threads as well as accessing shared data, counteracts the benefits gained from parallel processing, and the reasoning performance starts to degrade.

When testing ontologies that are big enough, the observed scalability is linear, sometimes even superlinear. These bigger ontologies need longer single-thread computing times (T1). The overhead introduced by maintaining a tolerable number of threads is very small and becomes insignificant. A tolerable number, Ni, should always be smaller than or equal to the total number of available processors. In our tests, due to the available hardware, we have Ni ∈ [1, 16], but in order to stress multi-threading we conducted all tests with up to 32 threads. In some cases, even when the number of threads exceeds 16, the reasoning performance remains stable over a rather long run. This supports our hypothesis that a further scalability improvement could be achieved by adding more processors, and we will verify this hypothesis when better experimental hardware becomes available.

Fig. 7: The standard deviations of thread runtimes.

Figure 7 shows the standard deviation of the thread runtimes measured in the series of tests (in milliseconds). Overall, the deviations are limited to an acceptable range, i.e., below 140 milliseconds, which is relatively insignificant w.r.t. system overhead. This implies that the work load is well balanced among the threads, that is to say, all threads are kept as busy as possible. For the most part, when the number of threads is smaller than the tolerable number, 16, the deviations are close to 0. When threads are added beyond 16, the deviations become greater.
This is because some processors execute more than one thread, and thereby the thread context switching produces a lot of overhead. In our original implementation, we had distributed all subsumption candidates into independent lists, each of which was mapped to a thread, but the deviations were sometimes too large. Therefore, the Deslog classification now uses a shared queue to buffer all subsumption candidates, in order to keep all threads busy.

We also conducted similar experiments on a high-performance computing cluster, and the results were rather disappointing. The speedup factor gained on the cluster was generally below 3, although we assigned at least 16 processors to each test. The most plausible explanation we can give is that the complex hardware and software environment of the cluster degrades the performance of Deslog. The cluster consists of three types of computing nodes with respect to the built-in processors: 4-core, 8-core, and 16-core, and nodes of the same type may have heterogeneous architectures. A job is scheduled on one or more computers randomly. It is normal that a job is assigned to more than one computer, and the communication between computers results in a bottleneck. Another possible reason is that the cluster does not guarantee exclusive usage, which means that more than one job may be running on the same computer at the same time.

Deslog can only deal with ALC ontologies at present, and we expect that more interesting results will be obtained by implementing more powerful tableau expansion rules (e.g., see [16]). We have already investigated the feasibility of several tableau optimization techniques for a concurrent setting, but most of them are not yet fully implemented or tested.

4 Related Work

Our research investigates the potential contribution of concurrent computing to tableau-based DL reasoning. Tableau-based DL reasoning has been extensively researched in sequential contexts.
A large amount of literature is available addressing sequential DL reasoning (see [4]). Only a few approaches have investigated parallel DL reasoning so far.

The work in [19] reports on a parallel SHN reasoner. This reasoner implemented parallel processing of disjunctions and at-most cardinality restrictions, as well as some DL tableau optimization techniques. The experimental results show a noticeable performance improvement in comparison with sequential reasoners. This work mainly showed the feasibility of parallelism for low-level tableau-based reasoning and did not address high-level reasoning tasks, such as classification. Besides utilizing non-determinism, this research also presented the potential of making use of and-parallelism, and we plan to follow up on this idea.

The canonical top-search algorithm, as well as its dual bottom-search, can be executed in parallel, but extra work is needed to preserve completeness. Such an approach is presented in [1, 2]; the experimental results are very promising and demonstrate the feasibility of parallelized DL reasoning.

In [17, 23] a consequence-based DL reasoning method was proposed, mainly dealing with Horn ontologies. Based on consequence-based reasoning and the results of [3], the work in [18] reports on a highly optimized reasoner that can classify EL ontologies concurrently.

Two hypotheses on parallelized ontology reasoning were proposed in [6]: independent ontology modules and a parallel reasoning algorithm. Independent ontology modules strive for structuring ontologies as modules which can naturally be computed in parallel. The idea of partitioning ontologies into modules is supported by [11, 9, 10]. According to the second hypothesis, extensive research on parallelized logic programming does not contribute much to DL reasoning.
Furthermore, some DL fragments, namely those without disjunction and at-most cardinality restriction constructors, do not profit much from parallelizing non-deterministic branches in tableau expansion.

The research mentioned above focuses on a shared-memory multi-threading environment. There is also quite some research proposing distributed solutions, and some of these ideas are also worth trying in a shared-memory environment. In [21] the idea of applying a constraint programming solver, Mozart, to parallel ALC tableau reasoning was proposed, and the implementation was reported in [20]. The experimental results show scalability to some extent.

Recently, some research has focused on how DL reasoning can be applied to the Resource Description Framework (RDF) and OWL. This trend leads to research on reasoning about massive web ontologies in a scalable way. MapReduce [22], ontology mapping [13], ontology partitioning [10], rule partitioning [24], distributed hash tables (DHT) [8], swarm intelligence [7], etc., are examples of these approaches. However, we do not consider these approaches relevant in our context.

5 Conclusion and Future Work

DL has been successfully applied in many domains, amongst which utilizing knowledge on the Internet has become a research focus in recent years. Scalable solutions are required for processing the enormous amount of structured information spread over the Internet. Meanwhile, multi-processor computing facilities are becoming mainstream. Thus, we consider research on parallelism in DL reasoning necessary to utilize the available processing power.

The objective of this research is to explore the role parallelism can play in tableau-based DL reasoning. A number of tableau-based DL reasoning optimization techniques have been extensively researched, but most of them have been investigated in sequential contexts, so adapting these methods to the parallel context is an important part of this research.
We have partially shown that shared-memory parallel tableau-based DL reasoning can contribute to scalable solutions. This paper introduced our reasoner, Deslog, whose architecture is devised specifically for a shared-memory parallel environment. We presented an aspect of the reasoner's concurrency performance, and good scalability could be demonstrated for TBox classification.

In the near future, we will enhance Deslog so that it can reason about more expressive DL languages. More powerful tableau expansion rules will be added. Concurrency-suitable tableau optimization techniques will be implemented. Moreover, we plan to investigate non-determinism more closely and to design corresponding parallel algorithms.

References

1. Aslani, M., Haarslev, V.: Towards parallel classification of TBoxes. In: Proceedings of the 21st International Workshop on Description Logics (DL 2008), Dresden, Germany. vol. 353 (2008)
2. Aslani, M., Haarslev, V.: Parallel TBox classification in description logics—first experimental results. In: Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010). pp. 485–490 (2010)
3. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: International Joint Conference on Artificial Intelligence. vol. 19, p. 364 (2005)
4. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.: The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press (2003)
5. Baader, F., Hollunder, B., Nebel, B., Profitlich, H.J., Franconi, E.: An empirical analysis of optimization techniques for terminological representation systems. Applied Intelligence 4(2), 109–132 (1994)
6. Bock, J.: Parallel computation techniques for ontology reasoning. In: Proceedings of the 7th International Conference on the Semantic Web. pp. 901–906 (2008)
7. Dentler, K., Guéret, C., Schlobach, S.: Semantic web reasoning by swarm intelligence.
In: The 5th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2009). p. 1 (2009)
8. Fang, Q., Zhao, Y., Yang, G., Zheng, W.: Scalable distributed ontology reasoning using DHT-based partitioning. In: The Semantic Web: 3rd Asian Semantic Web Conference, ASWC 2008. pp. 91–105 (2008)
9. Grau, B.C., Horrocks, I., Kazakov, Y., Sattler, U.: A logical framework for modularity of ontologies. In: Proc. IJCAI. pp. 298–304 (2007)
10. Grau, B.C., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory and practice. Journal of Artificial Intelligence Research 31(1), 273–318 (2008)
11. Grau, B.C., Parsia, B., Sirin, E., Kalyanpur, A.: Modularizing OWL ontologies. In: K-CAP 2005 Workshop on Ontology Management (2005)
12. Haarslev, V., Möller, R., Turhan, A.Y.: Exploiting pseudo models for TBox and ABox reasoning in expressive description logics. Automated Reasoning 2083/2001, 61–75 (2001)
13. Homola, M., Serafini, L.: Towards formal comparison of ontology linking, mapping and importing. In: 23rd International Workshop on Description Logics (DL 2010). p. 291 (2010)
14. Horridge, M., Bechhofer, S.: The OWL API: A Java API for OWL ontologies. Semantic Web 2(1), 11–21 (2011)
15. Horrocks, I.: Optimising tableaux decision procedures for description logics. Ph.D. thesis, The University of Manchester (1997)
16. Horrocks, I., Sattler, U., Tobies, S.: Reasoning with individuals for the description logic SHIQ. In: Automated Deduction, CADE-17: 17th International Conference on Automated Deduction, Pittsburgh, PA, USA, June 17-20, 2000, Proceedings. vol. 17, p. 482 (2000)
17. Kazakov, Y.: Consequence-driven reasoning for Horn SHIQ ontologies. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence. pp. 2040–2045 (2009)
18. Kazakov, Y., Krötzsch, M., Simančík, F.: Concurrent classification of EL ontologies. In: Proceedings of the 10th International Semantic Web Conference (2011)
19.
Liebig, T., Müller, F.: Parallelizing tableaux-based description logic reasoning. In: Proceedings of the 2007 OTM Confederated International Conferences on the Move to Meaningful Internet Systems, Part II. pp. 1135–1144 (2007)
20. Meissner, A.: A simple parallel reasoning system for the ALC description logic. In: Computational Collective Intelligence: Semantic Web, Social Networks and Multiagent Systems (First International Conference, ICCCI 2009, Wroclaw, Poland, 2009). pp. 413–424. Springer (2009)
21. Meissner, A., Brzykcy, G.: A parallel deduction for description logics with ALC language. Knowledge-Driven Computing 102, 149–164 (2008)
22. Mutharaju, R., Maier, F., Hitzler, P.: A MapReduce algorithm for EL+. In: 23rd International Workshop on Description Logics (DL 2010). p. 456 (2010)
23. Simančík, F., Kazakov, Y., Horrocks, I.: Consequence-based reasoning beyond Horn ontologies. In: Proc. of the 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI 2011) (2011)
24. Soma, R., Prasanna, V.K.: Parallel inferencing for OWL knowledge bases. In: Proceedings of the 2008 37th International Conference on Parallel Processing. pp. 75–82 (2008)
25. Wu, J., Haarslev, V.: Planning of axiom absorptions. In: Proceedings of the 2008 International Workshop on Description Logics (DL 2008) (2008)