Towards Practicable Algorithms for Rewriting Graph Queries beyond DL-Lite (Extended Abstract) Bianca Löhnert1 , Nikolaus Augsten1 , Cem Okulmus2 and Magdalena Ortiz3 1 2 3 Paris-Lodron University of Salzburg, Austria Umeå University, Sweden TU Wien, Austria Abstract Despite the many advantages that ontology-based data access (OBDA) has brought to a range of applica- tion domains, state-of-the-art OBDA systems still do not support popular graph database management systems such as Neo4j. Algorithms for query rewriting focus on languages like conjunctive queries and their unions, which are fragments of first-order logic and were developed for relational data. Such query languages are poorly suited for querying graph data. Moreover, they also limit the expressiveness of the ontology languages that admit rewritings, restricting them to those where the data complexity of reasoning is not higher than it is in first-order logic. In this paper, we propose a technique for rewriting a family of navigational queries for a suitably restricted fragment of ℰℒℋℐ that extends DL-Lite and that is NL-complete in data complexity. We implemented a proof-of-concept prototype that rewrites into Cypher queries, and tested it on a real-world cognitive neuroscience use case with promising results. 1. Introduction The ontology-based data access (OBDA) paradigm has seen successful adoption and brought significant advantages to a range of applications [1], but so far it remains limited to relational database management systems (RDBMS). Recent years have seen huge adoption of graph databases and bringing the OBDA paradigm could open up a plethora of novel opportunities. The central problem in OBDA is ontology-mediated query answering (OMQA), where a query is to be evaluated over the consequences of the given dataset together with the knowledge in the ontology. The predominant technique for OMQA is query rewriting, where an input query is transformed to incorporate the ontological knowledge so that the rewritten query can be directly evaluated over any input dataset using standard technologies. Until now, this has meant that the target of query rewriting have been variants of conjunctive queries (CQs), expressible in SQL and supported in standard RDBMs [2], and the ontology languages have been limited to those whose inference problem is not harder in data complexity than SQL evaluation. The DL-Lite family of description logics (DLs), designed to fulfil this requirement, remains the ontology language of choice for OMQA [2]. But with graph query languages, breaking through this barrier is possible. Their navigational features allow queries to traverse paths of arbitrary length that comply with some regular path expression, which results in a higher data complexity than that of SQL. Standard graph query languages like conjunctive two-way regular path queries (C2RPQs), the most popular navigational extension of CQs, and GQL [3], a new DL 2024: 37th International Workshop on Description Logics, June 18–21, 2024, Bergen, Norway $ bianca.loehnert@plus.ac.at (B. Löhnert); nikolaus.augsten@plus.ac.at (N. Augsten); okulmus@cs.umu.se (C. Okulmus); magdalena.ortiz@tuwien.ac.at (M. Ortiz)  0000-0002-3036-6201 (N. Augsten); 0000-0002-7742-0439 (C. Okulmus); 0000-0002-2344-9658 (M. Ortiz) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings standard for extending C2RPQs, are NL complete in data complexity (under the so-called walk semantics). This means that we are not bound to DL-Lite and can attempt query rewriting for richer ontology languages, provided that they have NL data complexity. Contributions This extended abstract summarizes the following results, described in [4]. • We prove the existence of C2RPQs that cannot be rewritten into a union of C2RPQs (UC2RPQs) which captures all consequences from the ontology, even it the ontology is restricted to one axiom ∃𝑟 ⊑ ∃𝑠 for role names 𝑟, 𝑠. Motivated by this negative result, we define Navigational Conjunctive Queries (NCQs), a subset of C2RPQs (similar to [5, 6]) which admits such rewritings. • We propose such a rewriting for OMQs that pair NCQs with ontologies in a language that we call ℰℒℋℐ 𝑙𝑖𝑛 ⊥ , tailored to extend DL-Lite with some features of ℰℒℋℐ while keeping the data complexity of standard reasoning in NL. We first show how to construct a graph structure whose paths witness all the relevant entailments of the input ontology, and then use this graph to rewrite atomic queries into C2RPQs. Then we combine this with the ideas behind the Clipper rewriting [7] to obtain a rewriting of NCQ mediated by ℰℒℋℐ 𝑙𝑖𝑛 ⊥ ontologies into UC2RPQs. • We present a proof-of-concept prototype of our technique and use it to evaluate queries over data from a real-world use case in the domain of cognitive neuroscience. 2. Query Rewriting Algorithm for Navigational Queries For our algorithm we consider ℰℒℋℐ 𝑙𝑖𝑛 ⊥ TBoxes, that have axioms of the form (NF1) 𝐴 ⊑ 𝐵, (NF2) ∃𝑟.𝐴 ⊑ 𝐵, (NF3) 𝐴 ⊑ ∃𝑟.𝐵, (NF4) 𝑟 ⊑ 𝑠, (NF5) ∃𝑟− .⊤ ⊑ 𝐵, or (NF6) (︀ 𝐴 ⊑ ∃𝑟− .⊤. )︀ In Navigational Conjunctive Queries (NCQs), atoms take the form 𝐴1 ∪ · · · ∪ 𝐴𝑛 (𝑥) or (︀ )︀ (︀ )︀* 𝜋1 ∪ · · · ∪ 𝜋𝑛 (𝑥, 𝑦) or 𝜋1 ∪ · · · ∪ 𝜋𝑛 (𝑥, 𝑦), with 𝐴𝑖 a concept name and each 𝜋𝑖 a restricted regular path expression 𝜋𝑖 := 𝑟 | 𝑟− | 𝑟* | (𝑟− )* . As a first step towards our query rewriting algorithm, we introduce the so-called concept dependency graph for ℰℒℋℐ 𝑙𝑖𝑛 ⊥ TBoxes. Concept Dependency Graph As the name suggests the concept dependency graph (CDG) depicts the dependencies of concepts of a given TBox. The nodes of the CDG represent concepts and for axioms in form of (NF1), e.g., 𝐴 ⊑ 𝐵 we add an edge pointing from 𝐵 to 𝐴 with the empty label 𝜀. In case of an existential restriction on the left-hand side as in (NF2) or (NF5) we additionally label this edge with the role name. In order to make the construction of the CDG complete for TBox reasoning we extend it with additional edges and witnessing nodes for axioms with existential quantifiers on the right-hand side, i.e., (NF3) and (NF6). Having all these edges in place it is possible to check for TBox entailments. More precisely, it holds that 𝒯 |= ∃𝑟1 . . . ∃𝑟𝑘 .𝐴 ⊑ 𝐵 if and only if there is a path from 𝐵 to 𝐴 with a sequence of edge labels 𝑠1 , . . . , 𝑠𝑘 (possibly interleaved with 𝜀-labeled edges) such that 𝑠𝑖 ⊑*𝒯 𝑟𝑖 for 1 ≤ 𝑖 ≤ 𝑘. Note that, in particular, this means that 𝒯 |= 𝐴 ⊑ 𝐵 iff there is a path from 𝐵 to 𝐴 with 𝜀 labels only. We illustrate these properties of the CDG in the example below. For a formal definition of concept dependency graphs see [4]. Example 1. For the TBox 𝒯 = {𝐴2 ⊑ 𝐴1 , ∃𝑟.𝐵1 ⊑ 𝐴1 , ∃𝑟3 .𝐵1 ⊑ 𝐵3 , 𝐴3 ⊑ 𝐴2 , ∃𝑟1 .𝐵2 ⊑ 𝐵1 , ∃𝑟2− .⊤ ⊑ 𝐴3 , 𝑠 ⊑ 𝑟2 , ∃𝑟2 .𝐵3 ⊑ 𝐵2 , 𝐵1 ⊑ ∃𝑟2 .𝐵3 } the CDG 𝐺𝒯 is as in Figure 1. The presence of paths in the CDG corresponds to concept inclusions that hold in 𝒯 . We illustrate this on an ABox 𝒜 = {𝑟1 (𝑎1 , 𝑎2 ), 𝑟2 (𝑎2 , 𝑎3 ), 𝑟3 (𝑎3 , 𝑎4 ), 𝑟1 (𝑎4 , 𝑎5 ), 𝐵2 (𝑎5 )}. Since there is the path ⊤ Ω ⊤ 𝐵3 𝐵2 𝜀 𝐵1 𝑟 𝐴1 𝜀 𝐴2 𝜀 𝐴3 𝐵2 𝐵1 𝐴1 𝐴2 𝐴3 𝑟2− 𝑟1 𝑟1 𝜀 𝑟2 r 𝜀 𝜀 𝑟2 ∃𝑟2 𝑟2− 𝐵3 𝐵2 𝐵1 𝐴1 𝐴2 𝐴3 𝑟3 𝜀 𝜀 ∃𝑟2 . 𝑟3 𝐵3 ⊤ 𝐵3 start Figure 1: CDG from Example 1. Figure 2: NFA from Example 2. 𝐵1 𝑟1 𝐵2 𝑟2 𝐵3 𝑟3 𝐵1 𝑟1 𝐵2 in the CDG of 𝒯 it follows for 𝑎1 that 𝒯 , 𝒜 |= 𝐵1 (𝑎1 ). However, as we mentioned before, we can use the CDG to check for TBox entailments, i.e, from the path 𝐵1 𝑟1 𝐵2 𝑟2 𝐵3 𝑟3 𝐵1 𝑟1 𝐵2 we can infer that 𝒯 |= ∃𝑟1 ∃𝑟2 ∃𝑟3 ∃𝑟1 .𝐵2 ⊑ 𝐵1 . The construction of the CDG is complete, in the sense that for every ℰℒℋℐ 𝑙𝑖𝑛 ⊥ entailment from 𝒯 with a concept name on the right-hand side there exists a corresponding path in the CDG of 𝒯 . Extracting regular path expressions. Given a CDG 𝐺𝒯 and a concept 𝐵 in 𝒯 , we construct an NFA 𝐺A 𝒯 (𝐵) that accepts the propagating paths in 𝐺𝒯 which start at node 𝐵 and end in another concept while passing over only roles or 𝜀, as illustrated in Example 2. We then use an off-the-shelf technique to transform this NFA into a regular expression, and finally we eliminate all occurrences of 𝜀 from this regular expression. Example 2. For the CDG from Example 1, we can see its 𝐵1 -path-generating NFA in Figure 2. paths ending in 𝐵 1 paths ending in 𝐵2 paths ending in 𝐵3 (︀ ⏞ ⏟ ⏞ ⏟ ⏞ ⏟ The extracted RPE is: ((𝑟1 ∪ (𝑟1 𝑟2 𝑟3 )) 𝐵1 ) ∪ (𝑟1+ (𝑟2 𝑟3 𝑟1+ )* 𝐵2 ) ∪ (𝑟1+ 𝑟2 (𝑟3 𝑟1+ 𝑟2 )* 𝐵3 ) . * )︀ Rewriting Atomic (︀ Queries Into)︀Regular Path Expressions We replace each 𝐴𝑖 in a given input of the form 𝐴1 ∪ · · · ∪ 𝐴𝑛 (𝑥) with the regular expression described above. Then we replace each role 𝑟 in the regular path expression with a union over all roles 𝑠 such that 𝑠 ⊑*𝒯 𝑟. Rewriting NCQs into UC2RPQs Given a NCQ 𝑞(⃗𝑥) and an ℰℒℋℐ 𝑙𝑖𝑛 ⊥ TBox 𝒯 , we exhaus- tively apply the so-called clipping function, inspired by [7]. It relies on a well-known property of ℰℒℋℐ: for each 𝒯 and 𝒜 there is a universal, forest-shaped model ℐ 𝑢 that can be used for answering all C2RPQs. Intuitively, we consider each possible set 𝑌 of variables that may be mapped to the same ‘anonymous’ object 𝑑 of maximal depth in a tree structure of ℐ 𝑢 . We know that each such 𝑑 is introduced to satisfy an existential axiom 𝐴 ⊑ ∃𝑟.𝐵, that is, it has a parent 𝑑𝑝 that satisfies 𝐴, and 𝑑 is created as an 𝑟-child of 𝑑𝑝 that is 𝐵. Exploiting this, we can ‘remove’ from the query the parts that are guaranteed to be locally satisfied using 𝑑𝑝 ’s 𝑟-child 𝑑. This idea has been applied to variants of (U)CQs [7], where each application must only remove the query atoms that are locally satisfied. But in navigational queries we must consider also how path expressions can be partially satisfied, and modify the expressions in the atoms accordingly. E.g., given 𝑞(𝑥) = (𝑟* ∪ 𝑠* )(𝑥, 𝑦), 𝐵(𝑦) and 𝒯 = {𝐴 ⊑ ∃𝑟.𝐵}, a possible application of clipping returns 𝑞 ′ (𝑥) = 𝑟* (𝑥, 𝑦), 𝐴(𝑦). Note that 𝑟* ∪ 𝑠* is replaced by 𝑟* , reflecting the fact that an 𝑟* path to an 𝐴 can always be extended to an 𝑟* path to a 𝐵 in the models of 𝒯 , but this does not apply to the 𝑠* paths. We show in [4] that for every query mapping there is an application of Table 1 Experimental results of the algorithm. #∨ is the number of C2RPQs, #𝛼 the number of atoms and #C the number of distinct concepts in the output, ∅∪ is the average and max∪ the maximal number of expressions in unions inside the C2RPQs, 𝑡rew and 𝑡eval the times for rewriting and evaluation in seconds. We indicate the two ontologies by 𝑂1 and 𝑂2 . Timeout was 1000 seconds. 𝑂1 #∨ #𝛼 #C𝑂1 | #C𝑂2 ∅∪ | ∅𝑂 ∪ 2 max𝑂 𝑂2 ∪ | max∪ 1 𝑡𝑂 𝑂2 rew | 𝑡rew 1 𝑡𝑂 𝑂2 eval | 𝑡eval 1 Q1 60 313 934|957 987|1007 995|1062 17.78|32.76 904.13| 800.25 Q2 23 123 951|972 995|1020 1036|1062 7.44|8.77 253.60|245.77 Q3 1 3 3|3 2|2 2|2 0.08|0.08 0.23|0.71 Q4 1 3 3|3 2|2 2|2 0.07|0.07 0.11|0.03 Q5 1 3 12|12 11|11 11|11 0.07|0.06 2.08|1.94 Q6 297 1891 1018|1042 1063|1085 1169|1178 131.55|132.31 >1000|>1000 Q7 77 509 922|932 988|1002 994|1009 32.24|54.20 >1000|>1000 Q8 1 4 2|2 0|0 0|0 0.2|0.2 0.08|0.08 clipping that results in a rewritten query whose mappings have a strictly lower depth (as at least one variable is now mapped to 𝑑𝑝 instead of its child 𝑑), but remain unchanged otherwise. By clipping exhaustively, iterating over all sets 𝑌 of variables in 𝑞 and all axioms in 𝒯 of the form 𝐴 ⊑ ∃𝑟.𝐵 and 𝐴 ⊑ ∃𝑟− .⊤, and taking each result as a disjunct in a union of NCQs, we obtain a query that has the same answers as the original NCQ, but whose variables are mapped to ABox individuals only. We can then apply the rewriting of atomic queries into regular path expressions described above, and obtain a sound and complete rewriting of NCQs mediated by ℰℒℋℐ 𝑙𝑖𝑛 ⊥ ontologies into UC2RPQs . 3. Implementation and Experiments We implemented a proof-of-concept prototype [8] that, given an ℰℒℋℐ 𝑙𝑖𝑛 ⊥ TBox (in OWL syntax), rewrites NCQs into UC2RPQs and translates them into Cypher. We execute the experiments on a machine running Ubuntu 22.04.4 with an Intel Core i7-6700HQ CPU clocked at 2.60 GHz and 8 GB RAM; with Neo4j 5.18.1 running on it. As TBox (OWL ontology) we use the Cognitive Task Ontology (COGITO) [9], which includes about 9000 axioms describing 4700 concepts used for annotating experimental data [10]. For example, the axiom ReadingTask ⊑ (∃has.Read ⊓ ∃has.Language-item) defines a reading task by referring to the Hierarchical Event Descriptors (HEDs) Read and Language-item [11]. COGITO includes axioms that are not in ℰℒℋℐ 𝑙𝑖𝑛⊥ since they use conjunction on the left side (e.g., the converse of the ReadingTask axiom above). Hence, to make our experimental evaluation more faithful to COGITO, we consider two ontologies: (1) in 𝑂1 we drop the axioms with conjunction on the left-hand side, and (2) in 𝑂2 we replace these conjunctions by disjunctions. We are currently working on inferring the exact answers over COGITO from these over- and under-approximations. We chose a real-world dataset from the domain of cognitive neuroscience [12], stored in a Neo4j database and consisting of 15 512 nodes and 78 113 relationships. For our experiments, we manually created 8 queries, structurally similar to those in [12], and applied our rewriting technique. The rewritten queries were evaluated using Cypher.1 We report our results in Table 1. 4. Conclusion We presented an algorithm for rewriting NCQs into UC2RPQs over a lightweight ontology that extends DL-Lite with some of the expressive features of ℰℒℋ while keeping the data complexity of reasoning in NL. One of our goals is to make progress towards a practicable algorithm, and our prototype implementation, which rewrites the queries into Cypher, suggests that we may be on track. It shows promising results on a real-world dataset from cognitive neuroscience. Acknowledgements This work was partially supported by the Federal State of Salzburg under grant number 20102- F2101143-FPR (Digital Neuroscience Initiative) and the Austrian Federal Ministry of Education, Science and Research (BMBWF) under grant number 2920 (Austrian NeuroCloud). This work was also partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. References [1] G. Xiao, L. Ding, B. Cogrel, D. Calvanese, Virtual knowledge graphs: An overview of systems and use cases, Data Intell. 1 (2019) 201–223. URL: https://doi.org/10.1162/dint_a_ 00011. doi:10.1162/DINT\_A\_00011. [2] D. Calvanese, G. D. Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and efficient query answering in description logics: The DL-lite family, Journal of Automated Reasoning 39 (2007) 385–429. doi:10.1007/s10817-007-9078-x. [3] N. Francis, A. Gheerbrant, P. Guagliardo, L. Libkin, V. Marsault, W. Martens, F. Murlak, L. Peterfreund, A. Rogova, D. Vrgoc, A researcher’s digest of GQL (invited talk), in: F. Geerts, B. Vandevoort (Eds.), 26th International Conference on Database Theory, ICDT 2023, March 28-31, 2023, Ioannina, Greece, volume 255 of LIPIcs, Schloss Dagstuhl - Leibniz- Zentrum für Informatik, 2023, pp. 1:1–1:22. doi:10.4230/LIPIcs.ICDT.2023.1. [4] B. Löhnert, N. Augsten, C. Okulmus, M. Ortiz, Towards practicable algorithms for rewriting graph queries beyond dl-lite – full version, 2024. URL: https://arxiv.org/abs/2405.18181, accessed: 2024-06-04. [5] N. Dragovic, C. Okulmus, M. Ortiz, Rewriting ontology-mediated navigational queries into cypher, in: O. Kutz, C. Lutz, A. Ozaki (Eds.), Proceedings of the 36th International Workshop on Description Logics (DL 2023) co-located with the 20th International Confer- ence on Principles of Knowledge Representation and Reasoning and the 21st International Workshop on Non-Monotonic Reasoning (KR 2023 and NMR 2023)., Rhodes, Greece, September 2-4, 2023, volume 3515 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3515/paper-9.pdf. 1 The selected queries happen to be insensitive to the trail semantics (implemented in Cypher) vs. the walk semantics. [6] N. Dragovic, Querying Property Graphs with Ontologies, Master’s thesis, TU Wien, 2022. [7] T. Eiter, M. Ortiz, M. Simkus, T. Tran, G. Xiao, Query rewriting for Horn-SHIQ plus rules, in: J. Hoffmann, B. Selman (Eds.), Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada, AAAI Press, 2012, pp. 726–733. URL: http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4931. [8] N. Dragovic, B. Löhnert, Ontology-mediated querying for property graphs, 2022. URL: https://gitlab.com/ccns/neurocog/neurodataops/anc/software/owl2cypher, accessed: 2024- 05-24. [9] B. Löhnert, B. Engler, F. Hutzler, Cognitive task ontology (COGITO), 2024. URL: https: //gitlab.com/ccns/neurocog/neurodataops/anc/classification/cogito/, accessed: 2024-05-24. [10] R. A. Poldrack, A. Kittur, D. J. Kalar, E. Miller, C. Seppa, Y. Gil, D. S. Parker, F. W. Sabb, R. M. Bilder, The cognitive atlas: Toward a knowledge foundation for cognitive neuroscience, Frontiers Neuroinformatics 5 (2011) 17. doi:10.3389/fninf.2011.00017. [11] K. Robbins, D. Truong, S. Appelhoff, A. Delorme, S. Makeig, Capturing the nature of events and event context using hierarchical event descriptors (hed), NeuroImage 245 (2021) 118766. [12] A. Ravenschlag, M. Denissen, B. Löhnert, M. Pawlik, N. A. Himmelstoß, F. Hutzler, Effective queries for mega-analysis in cognitive neuroscience, in: G. Fletcher, V. Kantere (Eds.), Proceedings of the Workshops of the EDBT/ICDT 2023 Joint Conference, Ioannina, Greece, March, 28, 2023, volume 3379 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3379/CoMoNoS_2023_id252_Mateusz_Pawlik.pdf.