gOWL: A Fast Ontology-Mediated Query Answering Chenhong Meng1,5 , Xiaowang Zhang1,5,∗ , Guohui Xiao3 , Zhiyong Feng2,5 , and Guilin Qi4 1 School of Computer Science and Technology, Tianjin University, Tianjin 300350, China, 2 School of Computer Software,Tianjin University, Tianjin 300350, China 3 Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano I-39100, Italy 4 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China 5 Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, China ∗ Corresponding author. Abstract. This poster shows an ontology-mediated query answering system (gOWL) based on pure materialization to avoid query rewriting online. gOWL shows that answering queries over a partial materialization of the chase is complete for a large fragment of practical queries with bounded depth. From a system engineer- ing perspective, the materialization approach allows us to design a modular ar- chitecture to integrate off-the-shelf efficient SPARQL query engines. The poster will be based on DBpedia and UOBM datasets. We will compare the performance of gOWL with PAGOdA, Ontop, and Pellet (with speedup up to three orders of magnitude). 1 Introduction OMQ (Ontology-mediated Query) is a core reasoning task in many applications [1]. OMQs have been studies intensively on lightweight ontology languages, that have canon- ical model properties. There are basically two approaches for query answering, namely, materialization-based and query rewriting-based. Materialization-based approaches nor- mally compute and materialize the chase first and then execute the queries over the ma- terialization, such as RDFox [7] and PAGOdA [8]. However, materialization is often infeasible when the chase contains infinitely existentially entailed elements. Instead, query writing-based approaches avoid materializing the chase but rewrite input queries by compiling the consequence of the reasoning into the query, such as QuOnto [2], Clipper [4] and Ontop [3]. Query rewriting comes at the cost of query rewriting and query evaluation at runtime, and the possibility of missing optimization opportunity at the data level. In this poster, we adopt a pure materialization-based approach which allows us to design a modular architecture to integrate off-the-shelf efficient SPARQL query en- gines. We implement the proposed approach in a prototype gOWL, and build an OMQ systems gOWL-3X by employing RDF-3X. The preliminary encouraging experiments on DBpedia and UOBM show that gOWL outperforms PAGOdA, Ontop, and Pellet. 2 Framework of gOWL The approach proposed in this Section has been implemented in the gOWL system 6 . The framework of gOWL contains three modules, namely, Query Processor, Normal- ized Model Constructor, and Query Execution shown in the following figure. gOWL Query Execution SPARQL Query SPARQL Query Processor Graph SPARQL Query Model SPARQL Engine Query Result Engine Normalized Selector API Selector (Centralized/ KB RDF Distributed) Model Dataset Constructor Fig. 1: The framework of gOWL Query Processor Query Processor is a module to compute the depth of a query by applying a depth-first-search method. We use dep(q) to denote the depth of q, which represents the length of maximal certain path in q. The nodes in the path must be starting of a quantified-free variable of q and all other nodes are quantified variables. Normalized Model Constructor In Normalized Model Constructor, we compute n- n 0 n step universal models UK (i.e., UK . . . UK ) for given a KB K = (T , A) a natural number 0 n. n represents the count of expanded steps from ABox. For example, UK means to 1 expand 0 step from ABox, which equivalent to itself; UK means to expand 1 step from 0 0 0 UK (i.e., ABox), the expansion condition is (T , UK ) |= ∃ R1 (a) but R1 (a, b) 6∈ UK , 0 n (n−1) for all b ∈ Ind(UK ); UK means to expand 1 step from UK (i.e., n-step from ABox), (n−1) (n−1) the expansion condition is (T , UK ) |= ∃ R1 (a) but R1 (a, b) 6∈ UK , for all (n−1) b ∈ Ind(UK ). Intuitively speaking, n-step universal models of a KB are hierarchical extensions of the ABox via the TBox. In fact, based on the statistical analysis of practical SPARQL queries, over 96% of queries contain at most 7 triple patterns (i.e.,7 triples in a SPARQL query) [5]. Therefore, we only consider n no more than 7 in this poster. Query Execution The module of Query Execution contains four parts, namely Model Selector, SPARQL API, Engine Selector, and SPARQL Query Engine. Model Selector i is used to select a suitable UK for a given query that i = dep can be satisfied. Through SPARQL API, the query and dataset are passed on Engine Selector which utilizes the information of them and the characteristics of each engine to recommend the suitable one. Finally we employ SPARQL Query Engine to return solutions. 6 https://github.com/liulovemeng/gOWL.git 3 Experiments and evaluations The experiments are carried out on a machine running Linux, which has 4 CPUs with 6 cores and 64GB memory. RDF-3X is used as the underlying SPARQL query engines. We utilized UOBM and DBpedia data as a standard of evaluation. We evaluate on a dataset (around 12 million triples) of DBpedia ontology in [8] and U7K is computed in 48 hours. The experimental results of 10 queries (Q1 ∼ Q10 ) over three engines (i.e., gOWL-3X, PAGOdA, Pellet) are shown in the Figure 2. In a same way, the evaluate on UOBM [8] over three engines (i.e., gOWL-3X, PAGOdA, Ontop) are shown in the Figure 3 and Figure 4 (Note that we only list experimental results in 24 hours and the ordinate represents the total online time of query answering). 108 gOWL-3X PAGOdA Pellet 107 106 105 Query time (in ms) 104 103 102 101 100 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Fig. 2: Evaluations on DBpedia 107 gOWL-3X PAGOdA Ontop 106 105 Query time (in ms) 104 103 102 101 100 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Fig. 3: Evaluations on UOBM100 We find that gOWL-3X significantly improve the performance of all 21 queries comparing to PAGOdA, Ontop and Pellet which represent materialization-based ap- proach, query rewriting-based approach and traditional reasoning machine, respectively. Since Ontop doesn’t have mapping information about DBpedia, it cannot handle DBpdiea dataset. From the figures we can see that gOWL-3X is several orders of magnitude higher than other methods, because it has the advantage of changing data processing to offline before the query coming. In addition, its performance remains good with data size increases (UOBM100 ∼ UOBM1000 ). Under the same conditions, PAGOdA couldn’t handle large-scale dataset effectively because its query execution level is de- pendent on RDFox, which finishes processing the data online, so the memory require- ments are much stronger. 107 gOWL-3X Ontop 106 105 Query time (in ms) 104 103 102 101 100 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Fig. 4: Evaluations on UOBM1000 4 Conclusions In this poster, we have presented the gOWL system for ontology-mediated query an- swering. The approach take advantage of high-performance of off-on-shelf SPARQL query engines for supporting large-scale ontology query answering in an efficient and simple way. Based on DBpedia and UOBM datasets, gOWL outperforms existing en- gines significantly. Acknowledgments This work is supported by the National Natural Science Foundation of China (61502336), the National Key R&D Program of China (2016YFB1000603), the Key Technology R&D Program of Tianjin (16YFZCGX00210), and the Seed Foundation of Tianjin Uni- versity (2018XZC-0016). References 1. Bienvenu, M. (2016). Ontology-mediated query answering: Harnessing knowledge to get more from data. In Proc. of IJCAI’16, pp.4058–4061. 2. Botoeva, E., Calvanese, D., Santarelli, V., Fabio Savo, D., Solimando, A. and Xiao, G. (2016). Beyond OWL 2 QL in OBDA: Rewritings and approximations. In Proc. of AAAI’16, pp. 921–928. 3. Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R., Lanti, D., Rezk, M., Rodriguez- Muro, M., and Xiao, G. (2016). Ontop: Answering SPARQL queries over relational databases. Semantic Web J., 8(3):471–487. 4. Eiter, T., Ortiz, M., Simkus, M., Tran, T.K., and Xiao, G. (2012). Query rewriting for Horn- SHIQ plus rules. In Proc. of AAAI’12, pp.726–733. 5. Han, X., Feng, Z., Zhang, X., Wang, X., Rao, G., and Jiang, S. (2016). On the statistical analysis of practical SPARQL queries. In Proc. of WebDB’11, article 2, ACM. 6. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., and Zakharyaschev, M. (2010). The com- bined approach to query answering in DL-Lite. In Proc. of KR’10. AAAI Press. 7. Motik, B., Nenov, Y., Piro, R., Horrocks, I., and Olteanu, D. (2014). Parallel materialisa- tion of Datalog programs in centralised, main-memory RDF systems. In Proc. of AAAI’14, pp.129–137. 8. Zhou, Y., Grau, B., Nenov, Y., Kaminski, M., and Horrocks, I. (2015). PAGOdA: Pay-as-you- go ontology query answering using a datalog reasoner. J. Artif. Intell. Res., 54(1):309–367.