Towards Practicable Algorithms for Rewriting Graph
                                Queries beyond DL-Lite (Extended Abstract)
                                Bianca Löhnert1 , Nikolaus Augsten1 , Cem Okulmus2 and Magdalena Ortiz3
                                1                                                                                          2                                       3
                                    Paris-Lodron University of Salzburg, Austria                                               Umeå University, Sweden                 TU Wien, Austria


                                                                         Abstract
                                                                         Despite the many advantages that ontology-based data access (OBDA) has brought to a range of applica-
                                                                         tion domains, state-of-the-art OBDA systems still do not support popular graph database management
                                                                         systems such as Neo4j. Algorithms for query rewriting focus on languages like conjunctive queries and
                                                                         their unions, which are fragments of first-order logic and were developed for relational data. Such query
                                                                         languages are poorly suited for querying graph data. Moreover, they also limit the expressiveness of
                                                                         the ontology languages that admit rewritings, restricting them to those where the data complexity of
                                                                         reasoning is not higher than it is in first-order logic. In this paper, we propose a technique for rewriting
                                                                         a family of navigational queries for a suitably restricted fragment of ℰℒℋℐ that extends DL-Lite and
                                                                         that is NL-complete in data complexity. We implemented a proof-of-concept prototype that rewrites into
                                                                         Cypher queries, and tested it on a real-world cognitive neuroscience use case with promising results.

                                1. Introduction
                                The ontology-based data access (OBDA) paradigm has seen successful adoption and brought
                                significant advantages to a range of applications [1], but so far it remains limited to relational
                                database management systems (RDBMS). Recent years have seen huge adoption of graph
                                databases and bringing the OBDA paradigm could open up a plethora of novel opportunities.
                                   The central problem in OBDA is ontology-mediated query answering (OMQA), where a query
                                is to be evaluated over the consequences of the given dataset together with the knowledge
                                in the ontology. The predominant technique for OMQA is query rewriting, where an input
                                query is transformed to incorporate the ontological knowledge so that the rewritten query
                                can be directly evaluated over any input dataset using standard technologies. Until now, this
                                has meant that the target of query rewriting have been variants of conjunctive queries (CQs),
                                expressible in SQL and supported in standard RDBMs [2], and the ontology languages have been
                                limited to those whose inference problem is not harder in data complexity than SQL evaluation.
                                The DL-Lite family of description logics (DLs), designed to fulfil this requirement, remains
                                the ontology language of choice for OMQA [2]. But with graph query languages, breaking
                                through this barrier is possible. Their navigational features allow queries to traverse paths of
                                arbitrary length that comply with some regular path expression, which results in a higher data
                                complexity than that of SQL. Standard graph query languages like conjunctive two-way regular
                                path queries (C2RPQs), the most popular navigational extension of CQs, and GQL [3], a new


                                    DL 2024: 37th International Workshop on Description Logics, June 18–21, 2024, Bergen, Norway
                                $ bianca.loehnert@plus.ac.at (B. Löhnert); nikolaus.augsten@plus.ac.at (N. Augsten); okulmus@cs.umu.se
                                (C. Okulmus); magdalena.ortiz@tuwien.ac.at (M. Ortiz)
                                 0000-0002-3036-6201 (N. Augsten); 0000-0002-7742-0439 (C. Okulmus); 0000-0002-2344-9658 (M. Ortiz)
                                                                       © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                    CEUR
                                    Workshop
                                    Proceedings
                                                  http://ceur-ws.org
                                                  ISSN 1613-0073
                                                                       CEUR Workshop Proceedings (CEUR-WS.org)


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
standard for extending C2RPQs, are NL complete in data complexity (under the so-called walk
semantics). This means that we are not bound to DL-Lite and can attempt query rewriting for
richer ontology languages, provided that they have NL data complexity.

Contributions This extended abstract summarizes the following results, described in [4].
• We prove the existence of C2RPQs that cannot be rewritten into a union of C2RPQs (UC2RPQs)
which captures all consequences from the ontology, even it the ontology is restricted to one
axiom ∃𝑟 ⊑ ∃𝑠 for role names 𝑟, 𝑠. Motivated by this negative result, we define Navigational
Conjunctive Queries (NCQs), a subset of C2RPQs (similar to [5, 6]) which admits such rewritings.
• We propose such a rewriting for OMQs that pair NCQs with ontologies in a language that
we call ℰℒℋℐ 𝑙𝑖𝑛
               ⊥ , tailored to extend DL-Lite with some features of ℰℒℋℐ while keeping the
data complexity of standard reasoning in NL. We first show how to construct a graph structure
whose paths witness all the relevant entailments of the input ontology, and then use this graph
to rewrite atomic queries into C2RPQs. Then we combine this with the ideas behind the Clipper
rewriting [7] to obtain a rewriting of NCQ mediated by ℰℒℋℐ 𝑙𝑖𝑛  ⊥ ontologies into UC2RPQs.
• We present a proof-of-concept prototype of our technique and use it to evaluate queries over
data from a real-world use case in the domain of cognitive neuroscience.


2. Query Rewriting Algorithm for Navigational Queries
For our algorithm we consider ℰℒℋℐ 𝑙𝑖𝑛       ⊥ TBoxes, that have axioms of the form (NF1) 𝐴 ⊑ 𝐵,
(NF2) ∃𝑟.𝐴 ⊑ 𝐵, (NF3) 𝐴 ⊑ ∃𝑟.𝐵, (NF4) 𝑟 ⊑ 𝑠, (NF5) ∃𝑟− .⊤ ⊑ 𝐵, or (NF6)      (︀    𝐴 ⊑ ∃𝑟− .⊤.
                                                                                             )︀
    In Navigational Conjunctive Queries (NCQs), atoms take the form 𝐴1 ∪ · · · ∪ 𝐴𝑛 (𝑥) or
(︀                  )︀        (︀            )︀*
   𝜋1 ∪ · · · ∪ 𝜋𝑛 (𝑥, 𝑦) or 𝜋1 ∪ · · · ∪ 𝜋𝑛 (𝑥, 𝑦), with 𝐴𝑖 a concept name and each 𝜋𝑖 a restricted
regular path expression 𝜋𝑖 := 𝑟 | 𝑟− | 𝑟* | (𝑟− )* . As a first step towards our query rewriting
algorithm, we introduce the so-called concept dependency graph for ℰℒℋℐ 𝑙𝑖𝑛      ⊥ TBoxes.
Concept Dependency Graph As the name suggests the concept dependency graph (CDG)
depicts the dependencies of concepts of a given TBox. The nodes of the CDG represent concepts
and for axioms in form of (NF1), e.g., 𝐴 ⊑ 𝐵 we add an edge pointing from 𝐵 to 𝐴 with the
empty label 𝜀. In case of an existential restriction on the left-hand side as in (NF2) or (NF5)
we additionally label this edge with the role name. In order to make the construction of the
CDG complete for TBox reasoning we extend it with additional edges and witnessing nodes
for axioms with existential quantifiers on the right-hand side, i.e., (NF3) and (NF6). Having
all these edges in place it is possible to check for TBox entailments. More precisely, it holds
that 𝒯 |= ∃𝑟1 . . . ∃𝑟𝑘 .𝐴 ⊑ 𝐵 if and only if there is a path from 𝐵 to 𝐴 with a sequence of edge
labels 𝑠1 , . . . , 𝑠𝑘 (possibly interleaved with 𝜀-labeled edges) such that 𝑠𝑖 ⊑*𝒯 𝑟𝑖 for 1 ≤ 𝑖 ≤ 𝑘.
Note that, in particular, this means that 𝒯 |= 𝐴 ⊑ 𝐵 iff there is a path from 𝐵 to 𝐴 with 𝜀 labels
only. We illustrate these properties of the CDG in the example below. For a formal definition of
concept dependency graphs see [4].

Example 1. For the TBox 𝒯 = {𝐴2 ⊑ 𝐴1 , ∃𝑟.𝐵1 ⊑ 𝐴1 , ∃𝑟3 .𝐵1 ⊑ 𝐵3 , 𝐴3 ⊑ 𝐴2 , ∃𝑟1 .𝐵2 ⊑ 𝐵1 ,
∃𝑟2− .⊤ ⊑ 𝐴3 , 𝑠 ⊑ 𝑟2 , ∃𝑟2 .𝐵3 ⊑ 𝐵2 , 𝐵1 ⊑ ∃𝑟2 .𝐵3 } the CDG 𝐺𝒯 is as in Figure 1. The presence
of paths in the CDG corresponds to concept inclusions that hold in 𝒯 . We illustrate this on
an ABox 𝒜 = {𝑟1 (𝑎1 , 𝑎2 ), 𝑟2 (𝑎2 , 𝑎3 ), 𝑟3 (𝑎3 , 𝑎4 ), 𝑟1 (𝑎4 , 𝑎5 ), 𝐵2 (𝑎5 )}. Since there is the path
                                                                                                       ⊤
                                                                                   Ω                                     ⊤

                                                                   𝐵3
  𝐵2
       𝜀
             𝐵1
                      𝑟
                          𝐴1
                               𝜀
                                    𝐴2
                                          𝜀
                                                𝐴3                         𝐵2      𝐵1       𝐴1    𝐴2       𝐴3             𝑟2−
       𝑟1                                                                𝑟1
                               𝜀                              𝑟2                        r              𝜀             𝜀
 𝑟2             ∃𝑟2                           𝑟2−        𝐵3        𝐵2           𝐵1               𝐴1             𝐴2       𝐴3
        𝑟3                                                                𝜀
        𝜀    ∃𝑟2 .                                                 𝑟3
  𝐵3                                                ⊤
             𝐵3                                                                 start


Figure 1: CDG from Example 1.                           Figure 2: NFA from Example 2.


𝐵1 𝑟1 𝐵2 𝑟2 𝐵3 𝑟3 𝐵1 𝑟1 𝐵2 in the CDG of 𝒯 it follows for 𝑎1 that 𝒯 , 𝒜 |= 𝐵1 (𝑎1 ). However, as
we mentioned before, we can use the CDG to check for TBox entailments, i.e, from the path
𝐵1 𝑟1 𝐵2 𝑟2 𝐵3 𝑟3 𝐵1 𝑟1 𝐵2 we can infer that 𝒯 |= ∃𝑟1 ∃𝑟2 ∃𝑟3 ∃𝑟1 .𝐵2 ⊑ 𝐵1 . The construction of the
CDG is complete, in the sense that for every ℰℒℋℐ 𝑙𝑖𝑛
                                                    ⊥ entailment from 𝒯 with a concept name on
the right-hand side there exists a corresponding path in the CDG of 𝒯 .

Extracting regular path expressions. Given a CDG 𝐺𝒯 and a concept 𝐵 in 𝒯 , we construct
an NFA 𝐺A  𝒯 (𝐵) that accepts the propagating paths in 𝐺𝒯 which start at node 𝐵 and end in
another concept while passing over only roles or 𝜀, as illustrated in Example 2. We then use an
off-the-shelf technique to transform this NFA into a regular expression, and finally we eliminate
all occurrences of 𝜀 from this regular expression.

Example 2. For the CDG from Example 1, we can see its 𝐵1 -path-generating NFA in Figure 2.
                                   paths ending in 𝐵
                                         1                    paths ending in 𝐵2             paths ending in 𝐵3
                     (︀ ⏞         ⏟              ⏞          ⏟              ⏞            ⏟
The extracted RPE is: ((𝑟1 ∪ (𝑟1 𝑟2 𝑟3 )) 𝐵1 ) ∪ (𝑟1+ (𝑟2 𝑟3 𝑟1+ )* 𝐵2 ) ∪ (𝑟1+ 𝑟2 (𝑟3 𝑟1+ 𝑟2 )* 𝐵3 ) .
                                         *
                                                                                                     )︀


Rewriting Atomic   (︀ Queries Into)︀Regular Path Expressions We replace each 𝐴𝑖 in a given
input of the form 𝐴1 ∪ · · · ∪ 𝐴𝑛 (𝑥) with the regular expression described above. Then we
replace each role 𝑟 in the regular path expression with a union over all roles 𝑠 such that 𝑠 ⊑*𝒯 𝑟.

Rewriting NCQs into UC2RPQs Given a NCQ 𝑞(⃗𝑥) and an ℰℒℋℐ 𝑙𝑖𝑛                 ⊥ TBox 𝒯 , we exhaus-
tively apply the so-called clipping function, inspired by [7]. It relies on a well-known property
of ℰℒℋℐ: for each 𝒯 and 𝒜 there is a universal, forest-shaped model ℐ 𝑢 that can be used for
answering all C2RPQs. Intuitively, we consider each possible set 𝑌 of variables that may be
mapped to the same ‘anonymous’ object 𝑑 of maximal depth in a tree structure of ℐ 𝑢 . We know
that each such 𝑑 is introduced to satisfy an existential axiom 𝐴 ⊑ ∃𝑟.𝐵, that is, it has a parent
𝑑𝑝 that satisfies 𝐴, and 𝑑 is created as an 𝑟-child of 𝑑𝑝 that is 𝐵. Exploiting this, we can ‘remove’
from the query the parts that are guaranteed to be locally satisfied using 𝑑𝑝 ’s 𝑟-child 𝑑. This idea
has been applied to variants of (U)CQs [7], where each application must only remove the query
atoms that are locally satisfied. But in navigational queries we must consider also how path
expressions can be partially satisfied, and modify the expressions in the atoms accordingly. E.g.,
given 𝑞(𝑥) = (𝑟* ∪ 𝑠* )(𝑥, 𝑦), 𝐵(𝑦) and 𝒯 = {𝐴 ⊑ ∃𝑟.𝐵}, a possible application of clipping
returns 𝑞 ′ (𝑥) = 𝑟* (𝑥, 𝑦), 𝐴(𝑦). Note that 𝑟* ∪ 𝑠* is replaced by 𝑟* , reflecting the fact that an 𝑟*
path to an 𝐴 can always be extended to an 𝑟* path to a 𝐵 in the models of 𝒯 , but this does not
apply to the 𝑠* paths. We show in [4] that for every query mapping there is an application of
Table 1
Experimental results of the algorithm. #∨ is the number of C2RPQs, #𝛼 the number of atoms and #C
the number of distinct concepts in the output, ∅∪ is the average and max∪ the maximal number of
expressions in unions inside the C2RPQs, 𝑡rew and 𝑡eval the times for rewriting and evaluation in seconds.
We indicate the two ontologies by 𝑂1 and 𝑂2 . Timeout was 1000 seconds.
                             𝑂1
          #∨ #𝛼 #C𝑂1 | #C𝑂2 ∅∪  | ∅𝑂
                                   ∪
                                     2
                                       max𝑂      𝑂2
                                          ∪ | max∪
                                           1
                                                                        𝑡𝑂      𝑂2
                                                                         rew | 𝑡rew
                                                                           1
                                                                                       𝑡𝑂       𝑂2
                                                                                        eval | 𝑡eval
                                                                                          1


     Q1 60 313           934|957        987|1007        995|1062       17.78|32.76 904.13| 800.25
     Q2 23 123           951|972        995|1020       1036|1062        7.44|8.77   253.60|245.77
     Q3 1     3            3|3            2|2              2|2          0.08|0.08     0.23|0.71
     Q4 1     3            3|3            2|2              2|2          0.07|0.07     0.11|0.03
     Q5 1     3           12|12          11|11           11|11          0.07|0.06     2.08|1.94
     Q6 297 1891        1018|1042      1063|1085       1169|1178      131.55|132.31 >1000|>1000
     Q7 77 509           922|932        988|1002        994|1009       32.24|54.20 >1000|>1000
     Q8 1     4            2|2            0|0              0|0            0.2|0.2     0.08|0.08


clipping that results in a rewritten query whose mappings have a strictly lower depth (as at
least one variable is now mapped to 𝑑𝑝 instead of its child 𝑑), but remain unchanged otherwise.
   By clipping exhaustively, iterating over all sets 𝑌 of variables in 𝑞 and all axioms in 𝒯 of the
form 𝐴 ⊑ ∃𝑟.𝐵 and 𝐴 ⊑ ∃𝑟− .⊤, and taking each result as a disjunct in a union of NCQs, we
obtain a query that has the same answers as the original NCQ, but whose variables are mapped
to ABox individuals only. We can then apply the rewriting of atomic queries into regular path
expressions described above, and obtain a sound and complete rewriting of NCQs mediated by
ℰℒℋℐ 𝑙𝑖𝑛
       ⊥ ontologies into UC2RPQs .


3. Implementation and Experiments
We implemented a proof-of-concept prototype [8] that, given an ℰℒℋℐ 𝑙𝑖𝑛          ⊥ TBox (in OWL
syntax), rewrites NCQs into UC2RPQs and translates them into Cypher. We execute the
experiments on a machine running Ubuntu 22.04.4 with an Intel Core i7-6700HQ CPU clocked
at 2.60 GHz and 8 GB RAM; with Neo4j 5.18.1 running on it. As TBox (OWL ontology) we use
the Cognitive Task Ontology (COGITO) [9], which includes about 9000 axioms describing 4700
concepts used for annotating experimental data [10]. For example, the axiom ReadingTask ⊑
(∃has.Read ⊓ ∃has.Language-item) defines a reading task by referring to the Hierarchical Event
Descriptors (HEDs) Read and Language-item [11]. COGITO includes axioms that are not in
ℰℒℋℐ 𝑙𝑖𝑛⊥ since they use conjunction on the left side (e.g., the converse of the ReadingTask axiom
above). Hence, to make our experimental evaluation more faithful to COGITO, we consider
two ontologies: (1) in 𝑂1 we drop the axioms with conjunction on the left-hand side, and (2)
in 𝑂2 we replace these conjunctions by disjunctions. We are currently working on inferring
the exact answers over COGITO from these over- and under-approximations. We chose a
real-world dataset from the domain of cognitive neuroscience [12], stored in a Neo4j database
and consisting of 15 512 nodes and 78 113 relationships. For our experiments, we manually
created 8 queries, structurally similar to those in [12], and applied our rewriting technique. The
rewritten queries were evaluated using Cypher.1 We report our results in Table 1.


4. Conclusion
We presented an algorithm for rewriting NCQs into UC2RPQs over a lightweight ontology that
extends DL-Lite with some of the expressive features of ℰℒℋ while keeping the data complexity
of reasoning in NL. One of our goals is to make progress towards a practicable algorithm, and
our prototype implementation, which rewrites the queries into Cypher, suggests that we may
be on track. It shows promising results on a real-world dataset from cognitive neuroscience.


Acknowledgements
This work was partially supported by the Federal State of Salzburg under grant number 20102-
F2101143-FPR (Digital Neuroscience Initiative) and the Austrian Federal Ministry of Education,
Science and Research (BMBWF) under grant number 2920 (Austrian NeuroCloud). This work
was also partially supported by the Wallenberg AI, Autonomous Systems and Software Program
(WASP) funded by the Knut and Alice Wallenberg Foundation.


References
    [1] G. Xiao, L. Ding, B. Cogrel, D. Calvanese, Virtual knowledge graphs: An overview of
        systems and use cases, Data Intell. 1 (2019) 201–223. URL: https://doi.org/10.1162/dint_a_
        00011. doi:10.1162/DINT\_A\_00011.
    [2] D. Calvanese, G. D. Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and
        efficient query answering in description logics: The DL-lite family, Journal of Automated
        Reasoning 39 (2007) 385–429. doi:10.1007/s10817-007-9078-x.
    [3] N. Francis, A. Gheerbrant, P. Guagliardo, L. Libkin, V. Marsault, W. Martens, F. Murlak,
        L. Peterfreund, A. Rogova, D. Vrgoc, A researcher’s digest of GQL (invited talk), in:
        F. Geerts, B. Vandevoort (Eds.), 26th International Conference on Database Theory, ICDT
        2023, March 28-31, 2023, Ioannina, Greece, volume 255 of LIPIcs, Schloss Dagstuhl - Leibniz-
        Zentrum für Informatik, 2023, pp. 1:1–1:22. doi:10.4230/LIPIcs.ICDT.2023.1.
    [4] B. Löhnert, N. Augsten, C. Okulmus, M. Ortiz, Towards practicable algorithms for rewriting
        graph queries beyond dl-lite – full version, 2024. URL: https://arxiv.org/abs/2405.18181,
        accessed: 2024-06-04.
    [5] N. Dragovic, C. Okulmus, M. Ortiz, Rewriting ontology-mediated navigational queries
        into cypher, in: O. Kutz, C. Lutz, A. Ozaki (Eds.), Proceedings of the 36th International
        Workshop on Description Logics (DL 2023) co-located with the 20th International Confer-
        ence on Principles of Knowledge Representation and Reasoning and the 21st International
        Workshop on Non-Monotonic Reasoning (KR 2023 and NMR 2023)., Rhodes, Greece,
        September 2-4, 2023, volume 3515 of CEUR Workshop Proceedings, CEUR-WS.org, 2023.
        URL: https://ceur-ws.org/Vol-3515/paper-9.pdf.

1
    The selected queries happen to be insensitive to the trail semantics (implemented in Cypher) vs. the walk semantics.
 [6] N. Dragovic, Querying Property Graphs with Ontologies, Master’s thesis, TU Wien, 2022.
 [7] T. Eiter, M. Ortiz, M. Simkus, T. Tran, G. Xiao, Query rewriting for Horn-SHIQ plus rules,
     in: J. Hoffmann, B. Selman (Eds.), Proceedings of the Twenty-Sixth AAAI Conference on
     Artificial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada, AAAI Press, 2012, pp.
     726–733. URL: http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4931.
 [8] N. Dragovic, B. Löhnert, Ontology-mediated querying for property graphs, 2022. URL:
     https://gitlab.com/ccns/neurocog/neurodataops/anc/software/owl2cypher, accessed: 2024-
     05-24.
 [9] B. Löhnert, B. Engler, F. Hutzler, Cognitive task ontology (COGITO), 2024. URL: https:
     //gitlab.com/ccns/neurocog/neurodataops/anc/classification/cogito/, accessed: 2024-05-24.
[10] R. A. Poldrack, A. Kittur, D. J. Kalar, E. Miller, C. Seppa, Y. Gil, D. S. Parker, F. W. Sabb, R. M.
     Bilder, The cognitive atlas: Toward a knowledge foundation for cognitive neuroscience,
     Frontiers Neuroinformatics 5 (2011) 17. doi:10.3389/fninf.2011.00017.
[11] K. Robbins, D. Truong, S. Appelhoff, A. Delorme, S. Makeig, Capturing the nature of
     events and event context using hierarchical event descriptors (hed), NeuroImage 245 (2021)
     118766.
[12] A. Ravenschlag, M. Denissen, B. Löhnert, M. Pawlik, N. A. Himmelstoß, F. Hutzler, Effective
     queries for mega-analysis in cognitive neuroscience, in: G. Fletcher, V. Kantere (Eds.),
     Proceedings of the Workshops of the EDBT/ICDT 2023 Joint Conference, Ioannina, Greece,
     March, 28, 2023, volume 3379 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL:
     https://ceur-ws.org/Vol-3379/CoMoNoS_2023_id252_Mateusz_Pawlik.pdf.