1. Introduction

Data Complexity in Expressive Description Logics With Path Expressions (Extended Abstract)

Bartosz Bednarczyk

0 1 0 Computational Logic Group, Technische Universität Dresden , Germany 1 Institute of Computer Science, University of Wrocław , Poland

Among the plethora of various features available in extensions of the basic DL called ℒ, an especially prominent one is · reg, supported by the popular -family of DLs [1]. Using · reg, one can specify regular path constraints and hence, allow the user to navigate the underlying graph data-structure. In recent years, many extensions of ℒreg for ontology-engineering were proposed [2, 3, 4], and the complexity of their reasoning problems is mostly well-understood [1, 5, 6]. Many years ago, Vardi [7] proposed the vital notion of data complexity. When measuring the complexity of the knowledge-base (KB) satisfiability problem (KBSat), we treat the ontology (TBox) as fixed upfront and only the user's data (ABox) varies. The satisfiability problem for DLs is usually NP-complete w.r.t. the data complexity, including the two-variable counting logic [8] (encoding DLs up to ℒℬℐSelf) as well as ℛℐ [9], the Self [10] was logical core of OWL2. Regarding DLs with path expressions, NP-completeness of ℒℐreg established recently. This also applies to the (non)entailment of positive two-way regular path queries. In our recent IJCAI paper [11] we study the data complexity of ℐ, , and ℐ, namely the maximal decidable fragments of ℐ that possess the so-called quasi-forest model property, a suitable generalisation of the well-known forest model property for ℒ. For the uniformity of our approach, we actually focus on the satisfiability problem for full ℐ but over quasi-forests. We obtain: Theorem 1. KBSat for ℐ over quasi-forests is NP-complete w.r.t. the data complexity. In particular, knowledge-base satisfiability of ℐ, , and ℐ is NP-complete w.r.t. the data complexity. ◀ This completes the data complexity landscape for decidable fragments of ℐ that remained open for more than a decade, and reproves known results on the ℛ family by Kazakov [9]. Recall that regular path queries are queries of the form ℛ(x , y ) for a regular expression ℛ (with tests). The built-in support of regular path expression in the family of DLs allow us then to reduce the non-entailment problem (, ) ̸|= ℛ(x , y ) to the satisfiability of (, ∪ {⊤ ⊑ ¬∃ℛ.⊤}). Hence: Theorem 2. The entailment problem of regular path queries for ℐ-KBs over quasi-forests is coNPcomplete w.r.t. the data complexity. In particular, this applies to ℐ, , ℐ, and to the corresponding fragments of OWL2, namely ℛℐ, ℛ, and ℛℐ. ◀ We are currently working on lifting the results of Theorem 2 to the class of positive two-way regular path queries. We expect to publish them as the journal version of our IJCAI paper, somewhere next year. As a bonus result, we show how our algorithm used to establish Theorem 1 can be adapted to the entailment of unions rooted (connected and containing at least one answer variable) conjunctive queries over ℐ-KB (in terms of combined complexity). The coNExpTime upper bound for the rooted entailment over ℐ-KBs is supplemented with a novel matching lower bound for ℒ extended with Self. Theorem 3. The query entailment problem over ℐ-KB is coNExpTime-complete for the class of (unions of) rooted conjunctive queries. The lower bound holds already for ℒSelf. ◀

1. Introduction 2. Preliminaries

We fix countably-infinite pairwise-disjoint sets NI, NC, NR of individual, concept, and role names. We assume that the reader is familiar with the standard definitions concerning description logics (DLs) and we do not recall them here. The set of simple roles NsRp is defined with the grammar s ::= r ∈ NR | s− | s∩s | s∪s | s∖s. Simple roles are interpreted as expected by invoking the underlying set-theoretic operations. ℐ-KBs := (, ) are, as usual, composed of ABoxes and TBoxes . We assume that only contains axioms ± r (a, b) and ± A(a) for non-complex concepts A. ℐ-TBoxes are polynomial-time reducible into the following Scott’s normal form.

A ≡ B, A ≡ ¬

B, A ≡ { o}, A ≡ B ⊓ B′, r = s, A ≡ ∃

A.⊤, A ≡ ∃ r .Self, A ≡ (≥ r ).⊤ for A, B, B′ ∈ NC ∪ {⊤, ⊥}, o ∈ NI, r ∈ NR, s ∈ NsRp, ∈ N, and an automaton A over the finite alphabet composed of simple roles and tests C? for atomic concepts C. As usual, the equivalence ≡ replaces the two concept inclusions ⊑. We interpret ∃A.⊤ as the set of all elements d for which there is a path starting from d that realises A (written: |= A), namely A accepts some word of the form 1r1 . . . | |− 1r| |− 1| |, where r ∈ NsRp and are (possibly empty) sequences of tests, satisfying ( , +1) ∈ rℐ and ∈ Cℐ for all ≤ | | and tests C? in . The other DL features are defined as usual. We define the DLs ℐ, ℐ, and , respectively, by dropping (a) number restrictions, (b) nominals, and (c) role inverses from the syntax of ℐ.

The key ingredients of our paper are quasi-forests [ 1 ]. We introduce them under the standard set-theoretic reconstruction of the notion of an N-forest as a prefix-closed subset of N+ without . Definition 1. Let NI and NI be finite subsets of NI, Root ∈ NC, and child , edge ∈ NR. An interpretation ℐ is an (NI, NI )-quasi-forest if its domain Δℐ is an N-forest,

Rootℐ = Δℐ ∩N = {aℐ | a ∈ NI} = {aℐ | a ∈ (NI ∪ NI )}, child ℐ = {(d, d· ) | d, d· ∈ Δℐ , ∈ N}, edgeℐ = ⋃︀r∈NsRp r ℐ , and for all roles r ℐ and all pairs (d, e) ∈ r ℐ at least one of the following conditions hold: (i) d and e belong to Rootℐ , (ii) one of d or e equals oℐ for some name o ∈ N , (iii) d = e, (iv) (d, e) ∈ child ℐ , I or (v) (d, e) ∈ (child − )ℐ . A quasi-forest model of := (, ) is an (ind(), ind( ))-quasi-forest satisfying , where ind() and ind( ) are sets of all individual names from and . ◀

The names from NI are dubbed nominals, and are denoted with decorated letters o. We employ the notions from graph theory such as node, root, child, parent, or descendant, defined as expected in accordance with Δℐ , child ℐ and Rootℐ . For instance, d is a descendant of c whenever (c, d) ∈ (child ℐ )+. A quasi-forest is N-bounded if all nodes have at most N children.

A slightly wider notion of quasi-forests was employed by Calvanese et al. [1, Prop. 3.3] and Ortiz [4, L. 3.4.1, Thm. 3.4.2] to reason about the family. Based on their results we prove: Lemma 1. Let :=(, ) be a KB of , ℐ, or ℐ. is satisfiable if has a quasi-forest model. Moreover, for all (unions of) CQs q , we have ̸|= q if there exists an N-bounded quasi-forest model of violating q (for some N exponential in | |). Deciding if a ℐ-KB has a quasi-forest model is ExpTime-complete w.r.t. the combined complexity. ◀ 3. Data Complexity: An Overview and Some Tricks To establish NP-completeness of the satisfiability for ℐ, , and ℐ in a uniform and elegant way, we focus on the satisfiability of ℐ over quasi-forests. As the proof is highly technical, we only mention a few of its important ingredients and provide a very rough sketch below.

One of the key ingredients is an exponential time algorithm for deciding if a ℐ-KB has a quasi-forest model [4, L. 3.4.1, Thm. 3.4.2]. While this algorithm cannot be used directly to solve the satisfiability problem in optimal time, we stress that we can still employ it as a black-box to decide quasiforest satisfiability of KBs that have sizes independent from the ABox. In our approach, we intend to construct a quasi-forest model of an input ℐ-KB in two steps, i.e. we construct its root part (dubbed the clearing) separately from its subtrees. Our algorithm first pre-computes (an exponential w.r.t. the size of the TBox but of constant size if the TBox is fixed) set of quasi-forest-satisfiable ℐ-concepts that indicate possible subtrees that can be “plugged in” to the clearing of the intended model. Then it guesses (in NP) the intended clearing and verifies its consistency in PTime based on the pre-computed concepts and roles. For the feasibility of our “modular construction” a lot of bookkeeping needs to be done. Most importantly, certain decorations are employed to “relativise” and decide the satisfaction of automata concepts and number restrictions by elements in an incomplete, fragmented forest. I. The first type of decorations, given an automaton A, aggregate information about existing paths realising A and starting at one of the roots of the intended model. As a single such path may visit several subtrees, we cut such paths into relevant pieces and summarise them by means of “shortcut” roles and ℐ-concepts describing paths fully contained inside a single subtree. One of the core results is a complete characterisation of how paths in quasi-forest look like, and how they can be decomposed into interesting pieces, axiomatisable in ℐ. Such “basic” paths are depicted below (the decorated o denote nominals, while the other letters denote arbitrary roots).

a b a a a o a o o a o a ö II. The second type of decorations “localise” counting in the presence of nominals, as the nominals may have successors outside their own subtree and the clearing. For instance, suppose that we deal with the number restriction (≥ 2 r ).⊤ in an ({a, b}, {o})-quasi-forest ℐ sketched below (with r depicted as a green arrow). For diferent thresholds t ∈ {=0, =1, =2, >2} we label the roots of ℐ with concepts Clrngtr , Cldtr , and Destr,o, informing how many r -successor a given root has among (i) the clearing, (b) its children, and how many (c) for each nominal o, and how many of its descendants are r -successors of o. The decorated quasi-forest is:

CCDlherslndr=g,r=1or=10a

CCDlheorslndr=g,r=2or=22 b CCDlherslnd≥rg,r=2or=+011

These two “small tricks”, obfuscated by various technical dificulties, are the core ideas behind our quasi-forest-satisfiability algorithm. We encourage the reader to check the full paper for details. 4. Hardness of Rooted Entailment in ℒSelf : An Overview and Tricks We reduce from the NExpTime-complete 2 × 2 torus tiling problem [12, Cor. 4.15]. For its instance (D, ¯c), we construct a rooted CQ q and an ℒSelf-KB such that ̸|= q if (D, ¯c) has a solution. Models of represent 2 × 2 tori (possibly incorrectly) labelled with tiles, while matches of q detect violations of tilings. We represent tori by binary trees of height 2, where every leaf determines a position (, ) in 2 × 2 and carries a selection of concepts Ad11 , . . . , Ad and Ad++11 , . . . , Ad22 , for bit-strings 1 . . . and +1 . . . 2 encoding and . Unusually, we equip every leaf representing (, ) with three extra children, encoding (, ), (+1, ) and (, +1) (counting modulo 2). This “localises” vertical and horizontal successors of each element. Call the resulting structures treepods. 00

e 001

Ad10 0

e 01

e 011 Ad10, Ad20 e e

Ad10, Ad21

Ad11, Ad20 e e

Ad21, Ad21 e e e e e e

e 10

e 101

Ad11 Lvl1 Lvl2

The verification of tiling relations in treepods can be implemented locally in ℒ, but only under a naïve assumption that all the nodes representing the same position of a torus carry precisely the same tile. Such a “tile equality test” will be implemented with the query. To make this feasible, we design an alternative way of encoding positions and tiles in treepods by means of tentacles, namely the outgoing paths endowed with various self-loops replacing the leaves of treepods. An example tentacle replacing the node 100 in the above treepod is depicted on Figure 2.

⋆ 100

Lvl3 Ad11, Ad20 ad 11

By replacing each leaf of a treepod with a suitable tentacle, we obtain jellyfishes . Note that the node 100 that “stores its address” with concepts Ad11, Ad20, and , has now an outgoing ad 11ad 2r 1?-path. 0 Similarly, 012 has an outgoing ad 10ad 2r 1? path. Our ℒSelf-KB axiomatises jellyfishes, while 0 our query q compares such path-based addresses of nodes. Self-loops are crucial here as they are used to simulate disjunction over paths.

Acknowledgements I thank my PhD supervisor Sebastian Rudolph for his constant support and extreme patience. This work was supported by the ERC Consolidator Grant No. 771779 (DeciGUT). The full version of this paper is available on ArXiV, accessible via: https://arxiv.org/abs/2406.07095

[1]

Calvanese ,

Eiter ,

Ortiz , Regular Path Queries in Expressive Description Logics with Nominals , in: IJCAI , 2009 .

[2]

Bienvenu ,

Calvanese ,

Ortiz ,

Simkus , Nested Regular Path Queries in Description Logics, in: KR, 2014 .

[3]

Calvanese ,

Ortiz ,

Simkus , Verification of Evolving Graph-structured Data under Expressive Path Constraints , in: ICDT, 2016 .

[4]

Ortiz , Query Answering in Expressive Description Logics: Techniques and

Complexity

Results , Ph.D. thesis , TU Wien, AT, 2010 .

[5]

Bednarczyk ,

Rudolph , Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting , in: IJCAI, 2019 .

[6]

Bednarczyk , E. Kieroński, Finite Entailment of Local Queries in the Z Family of Description Logics , in: AAAI, 2022 .

[7]

M. Y.

Vardi , The Complexity of Relational Query Languages (Extended Abstract) , in: STOC , 1982 .

[8]

Pratt-Hartmann , Data-Complexity of the Two-Variable Fragment with Counting Quantifiers, Information and Computation ( 2009 ).

[9]

Kazakov , RIQ and SROIQ are harder than SHOIQ , in: KR, 2008 , pp. 274 - 284 .

[10]

J. C.

Jung ,

Lutz ,

Martel ,

Schneider , Querying the Unary Negation Fragment with Regular Path Expressions , in: ICDT, 2018 .

[11]

Bednarczyk , Data Complexity in Expressive Description Logics with Path Expressions , in: IJCAI, 2024 .

[12]

Lutz , The Complexity of Description Logics with Concrete Domains , Ph.D. thesis , RWTH Aachen University, Germany, 2002 .