Data Complexity in Expressive Description Logics With Path Expressions (Extended Abstract) Bartosz Bednarczyk1,2 1 Computational Logic Group, Technische Universitรคt Dresden, Germany 2 Institute of Computer Science, University of Wrocล‚aw, Poland 1. Introduction Among the plethora of various features available in extensions of the basic DL called ๐’œโ„’๐’ž, an especially prominent one is ยทreg , supported by the popular ๐’ต-family of DLs [1]. Using ยทreg , one can specify regular path constraints and hence, allow the user to navigate the underlying graph data-structure. In recent years, many extensions of ๐’œโ„’๐’ž reg for ontology-engineering were proposed [2, 3, 4], and the complexity of their reasoning problems is mostly well-understood [1, 5, 6]. Many years ago, Vardi [7] proposed the vital notion of data complexity. When measuring the complexity of the knowledge-base (KB) satisfiability problem (KBSat), we treat the ontology (TBox) as fixed upfront and only the userโ€™s data (ABox) varies. The satisfiability problem for DLs is usually NP-complete w.r.t. the data complexity, including the two-variable counting logic [8] (encoding DLs up to ๐’œโ„’๐’žโ„ฌโ„๐’ช๐’ฌSelf) as well as ๐’ฎโ„›๐’ชโ„๐’ฌ [9], the logical core of OWL2. Regarding DLs with path expressions, NP-completeness of ๐’œโ„’๐’žโ„reg Self [10] was established recently. This also applies to the (non)entailment of positive two-way regular path queries. In our recent IJCAI paper [11] we study the data complexity of ๐’ตโ„๐’ฌ, ๐’ต๐’ช๐’ฌ, and ๐’ต๐’ชโ„, namely the maximal decidable fragments of ๐’ต๐’ชโ„๐’ฌ that possess the so-called quasi-forest model property, a suitable generalisation of the well-known forest model property for ๐’œโ„’๐’ž. For the uniformity of our approach, we actually focus on the satisfiability problem for full ๐’ต๐’ชโ„๐’ฌ but over quasi-forests. We obtain: Theorem 1. KBSat for ๐’ต๐’ชโ„๐’ฌ over quasi-forests is NP-complete w.r.t. the data complexity. In particular, knowledge-base satisfiability of ๐’ตโ„๐’ฌ, ๐’ต๐’ช๐’ฌ, and ๐’ต๐’ชโ„ is NP-complete w.r.t. the data complexity. โ—€ This completes the data complexity landscape for decidable fragments of ๐’ต๐’ชโ„๐’ฌ that remained open for more than a decade, and reproves known results on the ๐’ฎโ„› family by Kazakov [9]. Recall that regular path queries are queries of the form โ„›(x , y) for a regular expression โ„› (with tests). The built-in support of regular path expression in the ๐’ต family of DLs allow us then to reduce the non-entailment problem (๐’œ, ๐’ฏ ) ฬธ|= โ„›(x , y) to the satisfiability of (๐’œ, ๐’ฏ โˆช {โŠค โŠ‘ ยฌโˆƒโ„›.โŠค}). Hence: Theorem 2. The entailment problem of regular path queries for ๐’ต๐’ชโ„๐’ฌ-KBs over quasi-forests is coNP- complete w.r.t. the data complexity. In particular, this applies to ๐’ตโ„๐’ฌ, ๐’ต๐’ช๐’ฌ, ๐’ต๐’ชโ„, and to the corre- sponding fragments of OWL2, namely ๐’ฎโ„›โ„๐’ฌ, ๐’ฎโ„›๐’ช๐’ฌ, and ๐’ฎโ„›๐’ชโ„. โ—€ We are currently working on lifting the results of Theorem 2 to the class of positive two-way regular path queries. We expect to publish them as the journal version of our IJCAI paper, somewhere next year. As a bonus result, we show how our algorithm used to establish Theorem 1 can be adapted to the entail- ment of unions rooted (connected and containing at least one answer variable) conjunctive queries over ๐’ตโ„๐’ฌ-KB (in terms of combined complexity). The coNExpTime upper bound for the rooted entailment over ๐’ตโ„๐’ฌ-KBs is supplemented with a novel matching lower bound for ๐’œโ„’๐’ž extended with Self. Theorem 3. The query entailment problem over ๐’ตโ„๐’ฌ-KB is coNExpTime-complete for the class of (unions of) rooted conjunctive queries. The lower bound holds already for ๐’œโ„’๐’žSelf. โ—€ DL 2024: 37th International Workshop on Description Logics, June 18โ€“21, 2024, Bergen, Norway $ bartek@cs.uni.wroc.pl (B. Bednarczyk) ย€ https://bartoszjanbednarczyk.github.io/ (B. Bednarczyk)  0000-0002-8267-7554 (B. Bednarczyk) ยฉ 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 2. Preliminaries We fix countably-infinite pairwise-disjoint sets NI , NC , NR of individual, concept, and role names. We assume that the reader is familiar with the standard definitions concerning description logics sp (DLs) and we do not recall them here. The set of simple roles NR is defined with the grammar s ::= r โˆˆ NR | s โˆ’ | sโˆฉs | sโˆชs | sโˆ–s. Simple roles are interpreted as expected by invoking the underlying set-theoretic operations. ๐’ต๐’ชโ„๐’ฌ-KBs ๐’ฆ := (๐’œ, ๐’ฏ ) are, as usual, composed of ABoxes ๐’œ and TBoxes ๐’ฏ . We assume that ๐’œ only contains axioms ยฑr (a, b) and ยฑA(a) for non-complex concepts A. ๐’ต๐’ชโ„๐’ฌ-TBoxes are polynomial-time reducible into the following Scottโ€™s normal form. A โ‰ก B, A โ‰ก ยฌB, A โ‰ก {o}, A โ‰ก B โŠ“ Bโ€ฒ , r = s, A โ‰ก โˆƒA.โŠค, A โ‰ก โˆƒr .Self, A โ‰ก (โ‰ฅ๐‘› r ).โŠค sp for A, B, Bโ€ฒ โˆˆ NC โˆช {โŠค, โŠฅ}, o โˆˆ NI , r โˆˆ NR , s โˆˆ NR , ๐‘› โˆˆ N, and an automaton A over the finite alphabet composed of simple roles and tests C? for atomic concepts C. As usual, the equivalence โ‰ก replaces the two concept inclusions โŠ‘. We interpret โˆƒA.โŠค as the set of all elements d for which there is a path ๐œŒ starting from d that realises A (written: ๐œŒ |= A), namely A accepts some word of the form sp ๐‘ค1 r1 . . . ๐‘ค|๐œŒ|โˆ’1 r|๐œŒ|โˆ’1 ๐‘ค|๐œŒ| , where r๐‘– โˆˆ NR and ๐‘ค๐‘– are (possibly empty) sequences of tests, satisfying (๐œŒ๐‘– , ๐œŒ๐‘–+1 ) โˆˆ r๐‘–โ„ and ๐œŒ๐‘– โˆˆ Cโ„ for all ๐‘– โ‰ค |๐œŒ| and tests C? in ๐‘ค๐‘– . The other DL features are defined as usual. We define the DLs ๐’ต๐’ชโ„, ๐’ตโ„๐’ฌ, and ๐’ต๐’ช๐’ฌ, respectively, by dropping (a) number restrictions, (b) nominals, and (c) role inverses from the syntax of ๐’ต๐’ชโ„๐’ฌ. The key ingredients of our paper are quasi-forests [1]. We introduce them under the standard set-theoretic reconstruction of the notion of an N-forest as a prefix-closed subset of N+ without ๐œ€. Definition 1. Let N๐’œ ๐’ฏ I and NI be finite subsets of NI , Root โˆˆ NC , and child , edge โˆˆ NR . An interpretation โ„ is an (NI , NI )-quasi-forest if its domain ฮ”โ„ is an N-forest, ๐’œ ๐’ฏ Rootโ„ = ฮ”โ„ โˆฉN = {aโ„ | a โˆˆ NI } = {aโ„ | a โˆˆ (N๐’œ ๐’ฏ I โˆช NI )}, child โ„ = {(d, dยท๐‘›) | d, dยท๐‘› โˆˆ ฮ”โ„ , ๐‘› โˆˆ N}, edge โ„ = r โˆˆNsp r โ„ , โ‹ƒ๏ธ€ R and for all roles r โ„ and all pairs (d, e) โˆˆ r โ„ at least one of the following conditions hold: (i) d and e belong to Rootโ„ , (ii) one of d or e equals oโ„ for some name o โˆˆ N๐’ฏI , (iii) d = e, (iv) (d, e) โˆˆ child โ„ , or (v) (d, e) โˆˆ (child โˆ’ )โ„ . A quasi-forest model of ๐’ฆ := (๐’œ, ๐’ฏ ) is an (ind(๐’œ), ind(๐’ฏ ))-quasi-forest satisfying ๐’ฆ, where ind(๐’œ) and ind(๐’ฏ ) are sets of all individual names from ๐’œ and ๐’ฏ . โ—€ The names from N๐’ฏI are dubbed nominals, and are denoted with decorated letters o. We employ the notions from graph theory such as node, root, child, parent, or descendant, defined as expected in accordance with ฮ”โ„ , child โ„ and Rootโ„ . For instance, d is a descendant of c whenever (c, d) โˆˆ (child โ„ )+ . A quasi-forest is N-bounded if all nodes have at most N children. A slightly wider notion of quasi-forests was employed by Calvanese et al. [1, Prop. 3.3] and Ortiz [4, L. 3.4.1, Thm. 3.4.2] to reason about the ๐’ต family. Based on their results we prove: Lemma 1. Let ๐’ฆ:=(๐’œ, ๐’ฏ ) be a KB of ๐’ต๐’ช๐’ฌ, ๐’ต๐’ชโ„, or ๐’ตโ„๐’ฌ. ๐’ฆ is satisfiable iff ๐’ฆ has a quasi-forest model. Moreover, for all (unions of) CQs q, we have ๐’ฆ ฬธ|= q iff there exists an N-bounded quasi-forest model of ๐’ฆ violating q (for some N exponential in |๐’ฏ |). Deciding if a ๐’ต๐’ชโ„๐’ฌ-KB has a quasi-forest model is ExpTime-complete w.r.t. the combined complexity. โ—€ 3. Data Complexity: An Overview and Some Tricks To establish NP-completeness of the satisfiability for ๐’ตโ„๐’ฌ, ๐’ต๐’ช๐’ฌ, and ๐’ต๐’ชโ„ in a uniform and elegant way, we focus on the satisfiability of ๐’ต๐’ชโ„๐’ฌ over quasi-forests. As the proof is highly technical, we only mention a few of its important ingredients and provide a very rough sketch below. One of the key ingredients is an exponential time algorithm for deciding if a ๐’ต๐’ชโ„๐’ฌ-KB has a quasi-forest model [4, L. 3.4.1, Thm. 3.4.2]. While this algorithm cannot be used directly to solve the satisfiability problem in optimal time, we stress that we can still employ it as a black-box to decide quasi- forest satisfiability of KBs that have sizes independent from the ABox. In our approach, we intend to construct a quasi-forest model of an input ๐’ต๐’ชโ„๐’ฌ-KB in two steps, i.e. we construct its root part (dubbed the clearing) separately from its subtrees. Our algorithm first pre-computes (an exponential w.r.t. the size of the TBox but of constant size if the TBox is fixed) set of quasi-forest-satisfiable ๐’ต๐’ชโ„๐’ฌ-concepts that indicate possible subtrees that can be โ€œplugged inโ€ to the clearing of the intended model. Then it guesses (in NP) the intended clearing and verifies its consistency in PTime based on the pre-computed concepts and roles. For the feasibility of our โ€œmodular constructionโ€ a lot of bookkeeping needs to be done. Most importantly, certain decorations are employed to โ€œrelativiseโ€ and decide the satisfaction of automata concepts and number restrictions by elements in an incomplete, fragmented forest. I. The first type of decorations, given an automaton A, aggregate information about existing paths realising A and starting at one of the roots of the intended model. As a single such path may visit several subtrees, we cut such paths into relevant pieces and summarise them by means of โ€œshortcutโ€ roles and ๐’ต๐’ชโ„๐’ฌ-concepts describing paths fully contained inside a single subtree. One of the core results is a complete characterisation of how paths in quasi-forest look like, and how they can be decomposed into interesting pieces, axiomatisable in ๐’ต๐’ชโ„๐’ฌ. Such โ€œbasicโ€ paths are depicted below (the decorated o denote nominals, while the other letters denote arbitrary roots). a รถ a a a o a b o o o a a II. The second type of decorations โ€œlocaliseโ€ counting in the presence of nominals, as the nominals may have successors outside their own subtree and the clearing. For instance, suppose that we deal with the number restriction (โ‰ฅ2 r ).โŠค in an ({a, b}, {o})-quasi-forest โ„ sketched below (with r depicted as a green arrow). For different thresholds t โˆˆ {=0, =1, =2, >2} we label the roots of โ„ with concepts Clrngrt , Cldrt , and Desrt ,o, informing how many r -successor a given root has among (i) the clearing, (b) its children, and how many (c) for each nominal o, and how many of its descendants are r -successors of o. The decorated quasi-forest is: Clrngr=2 Chldr=2 Clrngr=0 Desr=2 ,o Clrngr=1 Chldr=1 a o b Chldr=0 Desr=1 ,o Desrโ‰ฅ2+1 ,o These two โ€œsmall tricksโ€, obfuscated by various technical difficulties, are the core ideas behind our quasi-forest-satisfiability algorithm. We encourage the reader to check the full paper for details. 4. Hardness of Rooted Entailment in ๐’œโ„’๐’žSelf: An Overview and Tricks We reduce from the NExpTime-complete 2๐‘› ร— 2๐‘› torus tiling problem [12, Cor. 4.15]. For its instance (D, cฬ„), we construct a rooted CQ q and an ๐’œโ„’๐’žSelf-KB ๐’ฆ such that ๐’ฆ ฬธ|= q iff (D, cฬ„) has a solution. Models of ๐’ฆ represent 2๐‘› ร— 2๐‘› tori (possibly incorrectly) labelled with tiles, while matches of q detect violations of tilings. We represent tori by binary trees of height 2๐‘›, where every leaf determines a ๐‘๐‘›+1 position (๐‘ฅ, ๐‘ฆ) in 2๐‘› ร— 2๐‘› and carries a selection of concepts Ad๐‘11 , . . . , Ad๐‘๐‘›๐‘› and Ad๐‘›+1 , . . . , Ad๐‘2๐‘› 2๐‘› , for bit-strings ๐‘1 . . . ๐‘๐‘› and ๐‘๐‘›+1 . . . ๐‘2๐‘› encoding ๐‘ฅ and ๐‘ฆ. Unusually, we equip every leaf representing (๐‘ฅ, ๐‘ฆ) with three extra children, encoding (๐‘ฅ, ๐‘ฆ), (๐‘ฅ+1, ๐‘ฆ) and (๐‘ฅ, ๐‘ฆ+1) (counting modulo 2๐‘› ). This โ€œlocalisesโ€ vertical and horizontal successors of each element. Call the resulting structures treepods. Lvl0 ๐œ€ e e Ad01 Ad11 Lvl1 0 1 e e e e Ad01 , Ad02 Ad01 , Ad12 Ad11 , Ad02 Ad12 , Ad12 Lvl2 00 01 10 11 e e e e e e e e e e e e Lvl3 000 001 002 010 011 012 100 101 102 110 111 112 L M R L M R L M R L M R Ad01 , Ad02 Ad11 , Ad02 Ad01 , Ad12 Ad01 , Ad12 Ad11 , Ad12 Ad01 , Ad02 Ad11 , Ad02 Ad01 , Ad02 Ad11 , Ad12 Ad11 , Ad12 Ad01 , Ad12 Ad11 , Ad02 Figure 1: An example treepod for ๐‘› = 1. The verification of tiling relations in treepods can be implemented locally in ๐’œโ„’๐’ž, but only under a naรฏve assumption that all the nodes representing the same position of a torus carry precisely the same tile. Such a โ€œtile equality testโ€ will be implemented with the query. To make this feasible, we design an alternative way of encoding positions and tiles in treepods by means of tentacles, namely the outgoing paths endowed with various self-loops replacing the leaves of treepods. An example tentacle replacing the node 100 in the above treepod is depicted on Figure 2. โ‹† โ‹† โ‹† โ‹†, e 1, Lvl6 , Ad11 , Ad02 , r ad 11 ad 02 100 r โ‹†, e 0, Lvl6 , Ad11 , Ad02 , Lvl3 Lvl4 Lvl5 r Ad11 , Ad02 Ad11 , Ad02 Ad11 , Ad02 ......... โ‹†, e 0, Lvl6 , Ad11 , Ad02 , Figure 2: An example tentacle replacing the node 100 in the treepod from Figure 1. By replacing each leaf of a treepod with a suitable tentacle, we obtain jellyfishes. Note that the node 100 that โ€œstores its addressโ€ with concepts Ad11 , Ad02 , and , has now an outgoing ad 11 ad 02 r 1?-path. Similarly, 012 has an outgoing ad 01 ad 02 r 1? path. Our ๐’œโ„’๐’žSelf-KB ๐’ฆ axiomatises jellyfishes, while our query q compares such path-based addresses of nodes. Self-loops are crucial here as they are used to simulate disjunction over paths. Acknowledgements I thank my PhD supervisor Sebastian Rudolph for his constant support and extreme patience. This work was supported by the ERC Consolidator Grant No. 771779 (DeciGUT). The full version of this paper is available on ArXiV, accessible via: https://arxiv.org/abs/2406.07095 References [1] D. Calvanese, T. Eiter, M. Ortiz, Regular Path Queries in Expressive Description Logics with Nominals, in: IJCAI, 2009. [2] M. Bienvenu, D. Calvanese, M. Ortiz, M. Simkus, Nested Regular Path Queries in Description Logics, in: KR, 2014. [3] D. Calvanese, M. Ortiz, M. Simkus, Verification of Evolving Graph-structured Data under Expres- sive Path Constraints, in: ICDT, 2016. [4] M. Ortiz, Query Answering in Expressive Description Logics: Techniques and Complexity Results, Ph.D. thesis, TU Wien, AT, 2010. [5] B. Bednarczyk, S. Rudolph, Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting, in: IJCAI, 2019. [6] B. Bednarczyk, E. Kieroล„ski, Finite Entailment of Local Queries in the Z Family of Description Logics, in: AAAI, 2022. [7] M. Y. Vardi, The Complexity of Relational Query Languages (Extended Abstract), in: STOC, 1982. [8] I. Pratt-Hartmann, Data-Complexity of the Two-Variable Fragment with Counting Quantifiers, Information and Computation (2009). [9] Y. Kazakov, RIQ and SROIQ are harder than SHOIQ, in: KR, 2008, pp. 274โ€“284. [10] J. C. Jung, C. Lutz, M. Martel, T. Schneider, Querying the Unary Negation Fragment with Regular Path Expressions, in: ICDT, 2018. [11] B. Bednarczyk, Data Complexity in Expressive Description Logics with Path Expressions, in: IJCAI, 2024. [12] C. Lutz, The Complexity of Description Logics with Concrete Domains, Ph.D. thesis, RWTH Aachen University, Germany, 2002.