-

Data complexity of nite query entailment in description logics with transitive roles

Jakub Kuklis jk.kuklis@gmail.com

0 0 University of Warsaw , 02-097 Warszawa , Poland

We study the problem of nite ontology-mediated query answering (FOMQA), the variant of OMQA where the represented world is assumed to be nite, and thus only nite models of the ontology are considered. We adapt the setting of unions of conjunctive queries and ontologies expressed in description logics with transitive role declarations. FOMQA is already undecidable for SHOIF , but was proved to be 2Exp-complete for SOI, SOF and SIF . The proof relies on the fact that any nite counter-model can be transformed to a counter-model of certain regularity. By adjusting this transformation, we are able to provide an upper bound on the size of the smallest nite counter-model for SOI and SOF ; the bound is linear in the size of the ABox. This lets us show that for these logics FOMQA is in coNP with respect to data complexity; as coNP-hardness follows from previous results on weaker logics, we deduce coNP-completeness. 1

FOMQA Data complexity Counter-models SOI SOF

Evaluating queries in the presence of background knowledge has been extensively studied in several communities. A particularly prominent take on this problem is ontology mediated query answering (OMQA) where background knowledge represented by an ontology is leveraged to infer more complete answers to queries [ 3 ]. A widely accepted family of ontology languages with varying expressive power is o ered by Description Logics (DLs) [ 2 ], while the most commonly studied query language is that of (unions of) conjunctive queries.

Identifying ontology languages, for which query evaluation scales to large amounts of instance data, is one of the key research goals, as data tends to dominate in size both the ontology used and the evaluated queries. An early reference on data complexity in DLs, that is, complexity in the settings with xed TBox and query, is showing coNP-hardness of instance queries in ALE [ 13 ]. In the much more expressive DL SHIQ, a coNP upper bound was obtained for instance queries [ 8 ] and conjunctive queries without transitive roles [ 10 ], and later 1 Copyright c 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). generalized to any CQs [ 5 ]. The classical approach to avoiding intractability is to replace ALC and SHIQ with less expressive Horn DLs. These often admit query evaluation in PTime, examples include a variety of logics from the EL [ 1 ] and DL-Lite families [ 4 ], as well as Horn-SHIQ [ 8 ]. An alternative approach is to study complexity with respect to xed TBoxes only [ 9 ].

Often, the intended models of the ontology are nite and this additional assumption allows to infer more properties: nite ontology mediated query answering (FOMQA) is the variant of OMQA restricted to nite models. For some logics, like ALC, the nite variant and the unrestricted variant of the problem coincide; we then say that OMQA is nitely controllable. Studying FOMQA is interesting in settings lacking nite controllability. This is the case not only for DLs lacking the nite model property (e.g., DLs allowing both inverse roles and number restrictions), but also for logics allowing transitive role declarations, like logics from the S family. FOMQA is undecidable for SHOIF ontologies [ 11 ], but recently its decidability has been shown for three proper SOIF fragments: SOI, SOF , and SIF [ 6 ]. For all three fragments, the problem turned out to be 2Exp-complete in terms of combined complexity, but its data complexity was not studied. The present paper aims to ll this gap, thus initiating the study of the data complexity of FOMQA.

In [ 6 ], FOMQA is reduced to query entailment over a certain class of tree-like models recognisable by tree automata. Here, we adjust and adapt this approach to obtain the linear-size counter-model property for SOI and SOF (with respect to xed TBox and query). This lets us derive an algorithm for FOMQA, which is coNP with respect to data complexity. The matching lower bound can be obtained using nite controllability from the results on answering instance queries, which is coNP-hard even for the ALE fragment of ALC [ 13 ] (see also [ 2 ]). 2

Preliminaries

The DL SOIF extends the classical DL ALC with transitivity declarations on roles (S), nominals (O), inverses (I), and role functionality declarations (F ) [ 2 ]. We assume a signature of countably in nite disjoint sets of concept names NC = fA1; A2; : : : g, role names NR = fr1; r2; : : : g and individual names NI = fa1; a2; : : : g. SOIF -concepts C; D are de ned by the grammar:

C; D ::= > j A j :C j C u D j fag j 9r:C ; where r 2 NR [ fs j s 2 NRg is a role. Roles of the form r are called inverse roles. A SOIF TBox T is a nite set of concept inclusions (CIs) C v D, transitivity declarations Tr(r), functionality declarations Fn(r), where C; D are SOIF -concepts and r is a role. We assume that if the TBox contains Tr(r), then it contains neither Fn(r) nor Fn(r ). With an appropriate extension of the signature, each SOIF TBox can be transformed into an equivalent TBox with CIs in the following normal forms: u Ai v t Bj ; where A; Ai; B; Bj are concept names, empty conjunction is equivalent to > and empty disjunction to ?. We assume that all knowledge bases we work with are normalized; normalization yields at most linearly larger knowledge base, so the size estimations still hold. We also assume that for each concept name A used in T there is a complementary concept name A axiomatised using CIs > v A t A and A u A v ?.

SOI and SOF TBoxes are restrictions of SOIF TBoxes. SOI TBoxes do not contain functionality declarations, whereas concept inclusions in SOF do not contain inverse roles. As the inverse of a transitive role is transitive anyway, for SOI we shall assume that if Tr(r) is present in the TBox, then so is Tr(r ).

An ABox is a nite set of concept and role assertions of the form A(a) and r(a; b), where A 2 NC, r 2 NR and fa; bg NI. A knowledge base (KB) is a pair K = (T ; A). We write jKj for jAj + jT j. We use CN(K), Rol(K), TRol(K), Nom(K), and Ind(K) to denote, respectively, the set of all concept names, roles, transitive roles, nominals, and individuals occurring in K. Each nominal is an individual, so Nom(K) Ind(K).

A unary type is a subset of CN(K) that contains exactly one of the concept names A, A for each A 2 CN(K). We let Tp(K) denote the set of all unary types.

The semantics is de ned via interpretations I = ( I ; I ) with a non-empty domain I and an interpretation function I assigning to each A 2 CN(K) a set AI I and to each role name r with r 2 Rol(K), a binary relation rI I I . The interpretation of complex concepts and roles is de ned as usual [ 2 ]. We only consider interpretations complying with the standard name assumption and set aI = a for each individual name a used in ABox or TBox. We write d 2 I as a shorthand for d 2 I .

Interpretation I is a subinterpretation of interpretation J , written as I J , if I J , AI AJ , and rI rJ for all A 2 CN(K), r 2 Rol(K). An interpretation I is a subinterpretation of J induced by 0 J , written as I = J 0, if I = 0, AI = AJ \ 0, and rI = rJ \ 0 0 for all A 2 CN(K), r 2 Rol(K). We write J n X for the subinterpretation of J induced by J n X.

An interpretation I satis es 2 T [ A, written as I j= , if the following holds: if is a CI C v D then CI DI , if is a transitivity declaration Tr(r) then rI is transitive, if is a functionality declaration Fn(r) then rI is a partial function, if is an assertion A(a) then a 2 AI , and if is an assertion r(a; b) then (a; b) 2 rI . Finally, I is a model of: a TBox T , denoted I j= T , if I j= for all 2 T ; an ABox A, denoted I j= A, if I j= for all 2 A; and a KB K if I j= T and I j= A.

Let I and J be interpretations of K. A homomorphism from I to J , written as h : I ! J is a function h : I ! J that preserves roles, concepts, and individual names; that is, (h(d); h(d0)) 2 rJ whenever (d; d0) 2 rI , r 2 Rol(K), h(d) 2 AJ whenever d 2 AI , A 2 CN(K), and h(a) = a for all a 2 Ind(K). Note that I J i the identity mapping id is a homomorphism id : I ! J .

Let NV be a countably in nite set of variables. An atom is an expression of the form A(x) or r(x; y) with A 2 NC, r 2 NR, and x; y 2 NV, referred to as concept atoms and role atoms, respectively. A conjunctive query (CQ) Q is an existentially quanti ed conjunction q of atoms, 9x1 9xn q : For simplicity we restrict q to Boolean; that is, q uses only variables x1; ; xn. This is without loss of generality since the case of non-Boolean CQs can be reduced to the case of Boolean queries; see e.g. [ 12 ].

Let var denote the function from queries to sets of variables, such that var(Q) is the set of all the variables used in Q. A match for Q in I is a total function : var(Q) ! I such that I; j= q under the standard semantics of rst-order logic. An interpretation I satis es Q, written as I j= Q, if there exists a match for Q in I. A partial match for Q in I is a partial function : var(Q) ! I such that I; j= q0, where q0 is obtained from q by dropping atoms which bind some variable not in the domain of .

Note that we do not consider queries with constants (i.e., individual names): such queries can be viewed as non-boolean queries with a xed valuation of free variables, and thus are covered by the reduction to the Boolean case. We do consider unions of conjunctive queries (UCQs), which are disjunctions of CQs. An interpretation I satis es a UCQ Q if it satis es one of its disjuncts. It follows immediately that UCQs are preserved under homomorphisms ; that is, if I j= Q and there is a homomorphism from I to J , then also J j= Q.

A query Q is entailed by a KB K, denoted as K j= Q, if every model of K satis es Q. A model of K that does not satisfy Q is called a counter-model. The query entailment problem asks whether a KB K entails a (U)CQ Q. Moreover, this problem is equivalent to that of nding a counter-model. It is well known that the query answering problem can be reduced to query entailment.

In this paper, we address the problem of nite query entailment, which is a variant of query entailment where only nite interpretations are considered: an interpretation I is nite if I is nite, and a query Q is nitely entailed by K, denoted as K j= n Q, if every nite model of K satis es Q. 3

Counter-model surgery

Later in this paper, we transform a tree-like counter-model into a nite countermodel of bounded size, using a variant of the blocking principle: a systematic policy of reusing elements. Before we dive into this construction, we show that we can add certain edges to a counter-model without satisfying the query. These edges must connect a pair of points, for which a di erent pair of similar points is already connected via an edge over the same role. Throughout this section, we assume no transitivity declarations are present; we will handle these later.

Sharing the same unary type is a natural similarity criterion. Without transitivity and functional declarations, concept inclusions rely only on the unary types of directly linked elements, so adding an r-edge between a pair of elements yields a correct model of the knowledge base if a pair of elements of the same unary types is already connected with an r-edge. However, that criterion is not su cient, since this way we might accidentally satisfy the query, for example by creating a cycle of given length when the query asks for one. This happens to be a key di culty to overcome: we cannot introduce cycles shorter than the size of the query.

The rst step to enhance the criterion is to look at su ciently large neighbourhoods, rather than just unary types. We adapt the de nition of neighbourhoods from [ 6 ] and parameterize it with a set of constants Const I : elements which are considered neighbours of all the elements in the interpretation, regardless of the actual connections between them. Throughout the paper, we assume Const = Nom(K) unless speci ed otherwise. By the distance between two elements we understand the length of the shortest undirected path over the union of all roles between these two elements.

De nition 1. For d in I n Const and an integer n, the n-neighbourhood NnI (d)

of d with respect to Const is the subinterpretation of I induced by Const and all elements e in I n Const within distance n from d in I n Const, enriched with a concept D interpreted as fdg, and a concept interpreted as fag for each a 2 Const. For a 2 Const, NnI (a) is the subinterpretation induced by Const, enriched similarly.

Replacing unary types with large neighbourhoods is still not enough, because nearby elements can have arbitrary large isomorphic neighbourhoods: in the integers with the successor relation, all n-neighbourhoods are isomorphic. The next step is to enrich the initial counter-model in such a way that overlapping neighbourhoods are not isomorphic, following an idea from [ 7 ].

De nition 2. A colouring with k colours of an interpretation I is an exten

sion J of I with J = I , such that J coincides with I in every element in the signature of I, and interprets k fresh concept names B1; : : : ; Bk such that

B1J ; : : : ; BkJ is a partition of J . We say that d 2 BiJ has colour Bi. A colour

ing J of I is n-proper if for each d 2 J all elements of NnI (d) have di erent colours.

Lemma 1. If I n Const has bounded degree, then for all n 0 there exists an

n-proper colouring of I with at most M + jConstj colours, where M is the size of the largest (2n)-neighbourhood.

We write In for an arbitrarily chosen n-proper colouring of I. De nition 3. Let i n and let d, e be elements of In. We say that (d; e) is an

i-link along role r if either d has an r-successor e0 in In such that N In (e0) ' i

NiIn (e), or e has an r-predecessor d0 in In such that NiIn (d0) ' NiIn (d).

Theorem 1 (Lifting Theorem). For a conjunctive query with at most k binary atoms and n k and an interpretation In in which the query is not satised, adding n-links to In does not make the query satis ed.

This theorem was established in [ 6 ] for n k2. We prove the stronger variant in the appendix. The strengthening is not necessary for our approach, but it tightens the resulting size estimations.

Clique-forest counter-models

In this section, we revisit prior work showing that for SOI and SOF , the existence of a nite counter-model for Q is equivalent to the existence of a so called counter-example, a possibly in nite counter-model of a special form, which generalises tree-shaped models [ 6 ]. We impose tighter size constraints on counterexamples and construct automata recognizing them.

The special form is based on the notion of clique-forests.

De nition 4 ([ 6 ]). Let us x a KB K. A clique-forest for an interpretation I is a forest whose each node v is labelled with a subinterpretation Iv of I n Nom(K) such that: { the sets Iv are a partition of InNom(K); { each Iv is either a single element with all roles empty (element node) or a clique over some transitive role with all other roles empty and no repetitions of unary types (clique node); { apart from edges within cliques and between individuals, in I n Nom(K) there is exactly one edge between Iu and Iv for every two adjacent nodes u and v: assuming u is the parent of v, it is an r-edge from an element of Iu to an element of Iv for some r 2 Rol(K). We call v the origin of Iv.

De nition 5. Let K denote the KB obtained from K by dropping transitivity

declarations and let I denote the interpretation obtained from I by closing transitively the interpretation of each transitive role.

Note that each existential restriction satis ed in I is also satis ed in I . The same holds for quanti er-free CI, and for universal restrictions involving nontransitive roles. For universal restrictions involving transitive roles, we ensure this property by preprocessing the interpretation: for each pair of B 2 CN(K) and r 2 TRol(K), we add a fresh concept name Pr;B, axiomatised as Pr;B v 8r:(B u Pr;B), and replace each CI of the form A v 8r:B with A v 8r:Pr;B.

An interpretation is safe if it does not contain an in nite simple r-path for any transitive role r.

De nition 6. A counter-example for Q is a safe interpretation I such that:

I j= K , I j6= Q, and I admits a clique-forest with at most jInd(K)j + jNom(K)j jCI(K)j trees, nodes containing at most jCN(K)j elements, at most jCI(K)j children nodes connected to a single element, and each element of Ind(K) n Nom(K) occurring in some root.

If I is a counter-example for Q, then I is a counter-model for Q, thanks to the initial preprocessing.

Theorem 2. Q has a nite counter-model i Q has a counter-example.

The theorem was originally stated in [ 6 ], where counter-examples by de nition admitted clique-forests with at most jKj2 trees and branching at most jKj2, but the proof there is still valid for the present stronger statement. The bound on the branching of clique-forests is replaced with bounds on the size of cliquetree nodes and number of children nodes connected to a single element within any node; their precise values follow from the construction of a single clique-tree in the proof. The bound on the number of trees can be tightened, as the constructed clique-forests are built of trees rooted in non-nominal individuals and children nodes of nominals.

To obtain the nal counter-model of bounded size, we are going to apply the blocking principle to a counter-example. Thanks to Theorem 2, we already know that a counter-example exists whenever Q has a nite counter-model. In the next section, we require that the clique-forests of counter-examples we work with are of certain regularity. We will show that in regular interpretations with a so-called completeness property, if two elements are connected with a path over a transitive role, then there is at least one short path over that role which connects them. To be able to focus on counter-examples with regular clique-forests, we shall construct tree automata recognizing the latter.

The following de nitions refer to edges and paths in a xed interpretation I, either known from the context or explicitly speci ed.

De nition 7. Fix an ordered pair of elements a; b. We say that (a; b) is an redge if there is an edge over r from a to b. We say that (a; b) is an r -edge, if either r is transitive and there is a path over r from a to b, or (a; b) is an r-edge.

De nition 8. An interpretation J is individual-complete, if in J each r -edge

over two individuals used in K is an r-edge. An interpretation J is nominalcomplete, if in J each r -edge with a nominal as one of the endpoints is an r-edge. An interpretation is complete, if it is individual- and nominal-complete.

Note the di erence between the individual- and nominal-completeness definitions: in the former, both endpoints of the edges considered have to be individuals, whereas in the latter, only one endpoint has to be a nominal. Each counter-model can be extended to a complete one in a natural way so that the transitive closure stays the same. We can thus restate Theorem 2: Q has a nite counter-model i Q has a complete counter-example.

Theorem 3. For a union Q of CQs, with each CQ of size at most m, and a complete interpretation M with M = Ind(K), there exists an automaton with at most 22 34mjQjjT j3m states recognizing clique-forests of complete counter-examples for Q, for which the subinterpretation induced by Ind(K) is equal to M.

See Appendix for the automaton construction. Theorem 3 is inspired by a similar theorem in [ 6 ]. We add the completeness requirement and restrict the size of the automaton state-space, rather than its overall size. This way, we are able to get a bound which is independent of the ABox; it depends only on the TBox and query sizes. The automaton size still depends on the ABox, since its description has to contain all the initial lists of states, each state corresponding to one tree, and we consider clique-forest with at least one tree per individual occurring in K.

These tweaks result in signi cant di erences between our construction of the automaton and the construction presented in [ 6 ]. Most importantly, we cannot transfer the information about the edges between individuals to the TBox, as that would introduce a dependency of the automaton state-space on the ABox. This in uences especially the automaton component responsible for verifying that the query cannot be satis ed. 5

Bounded counter-models

We say that DL has the linear-size counter-model property if whenever there exists a counter-model for a given query Q and a knowledge base K = (A; T ) expressed in that DL, there exists a counter-model for Q of size bounded by jAj B(Q; T ) for some function B. The goal of the larger part of this section is to prove the following theorem.

Theorem 4. SOI and SOF have the linear-size counter-model property.

Clique-forests of complete counter-examples can be recognized by a family of automata by Theorem 3. Suppose K and Q admit a nite counter-model. As discussed in the last section, they admit then a complete counter-example, which in turn can be recognized by an automaton. Any automaton accepting some clique-forest must also accept some regular forest, which has only nitely many non-isomorphic subtrees. The number of these subtrees can be bounded by the number of states of the automaton. Following the bound in Theorem 3, we pick a complete counter-example I admitting a regular clique-forest with at most p = 22 34mjQjjT j3m non-isomorphic subtrees.

We shall turn I into a nite counter-model for Q, using the blocking principle. The main obstacle is that Q uses transitive roles, which are not fully represented. To solve that, we replace Q with a di erent query. In [ 6 ], this was done by exploiting a bound on the length of simple r-paths for transitive roles r, guaranteed by the regularity of clique-forests of the considered counter-examples. Here, we require that the counter-example we process is complete and consider the distance between elements that are connected by r-paths, which will let us derive better bounds on the size of the obtained nite counter-models. De nition 9. A pair of elements a; b is at r-distance k if the shortest r-path from a to b is of length k. An interpretation is `-ranged if for each r -edge, its endpoints are at r-distance at most `.

For transitive role atoms in queries, the path that leads from one variable to another does not play any role in the query evaluation, only the unary types of the connected variables matter (and the fact that they are connected by some path). We use this intuition to deduce that in `-ranged interpretations, any query with transitive binary atoms can be rewritten so that closing the roles by transitivity does not in uence the query evaluation.

Let Q be obtained from Q by replacing each transitive atom s(x; y) by the disjunction Wi ` si(x; y), where si(x; y) is the conjunctive query expressing the i-fold composition of s. If each disjunct of Q contains at most k binary atoms, Q can be rewritten as a union of conjunctive queries, each using at most k ` binary atoms.

Lemma 2. For all `-ranged J , J j= Q i J j= Q .

Lemma 3. I n Ind(K) is (4p)-ranged.

Proof. Let r be a transitive role in K. Each r-path going down the clique-forest of I contains at most p nodes. Indeed, if there were a longer r-path, then a subtree would occur twice on that path, which immediately leads to an in nite simple r-path in I n Ind(K), contradicting the safety of I. For each clique node, at most one additional step is needed to traverse it with an r-edge. Additionally, if K is a SOI KB, an r-path can be composed of two parts, one going down the tree and the other consisting of r 1 edges going upwards. If follows that, for SOI and SOF , any r-reachable pair in I n Ind(K) is at r-distance at most 4p. tu

The following lemma, regarding a bound on the range of an interpretation, is a stronger version of the analogous lemma presented in [ 6 ], regarding a bound on the length of simple transitive paths. The bound on the range has to be multiplied by a constant to get the bound on the range in the full interpretation, whereas in [ 6 ], the size of the set of nominals also comes into play. We draw upon the individual-completeness of the interpretations we consider to achieve this improvement.

Lemma 4. For each individual-complete interpretation J , if J n Ind(K) is `ranged, then J is (2` + 3)-ranged.

Proof. Take two elements a; b, such that (a; b) is an r -edge in J , but not an r -edge in J n Ind(K); it follows that any r-path from a to b visits an individual. If r is non-transitive, then (a; b) is an r-edge and a; b are at distance 1, otherwise consider such r-path . It starts in one of the trees in J n Ind(K) (or in an individual, in which case the tree degenerates to an empty one), and reaches a nominal connected to this tree or the tree root. Eventually, enters the tree containing its other endpoint, again via a nominal connected to that tree or its root. Let c and d be the rst and last individuals on . As J is individualcomplete, (c; d) is an r-edge, if c 6= d. We can reduce to a path segment starting in a in the rst tree, a single edge to c, possibly a single r-edge (c; d), a single edge from d, and a path segment ending in b in the last visited tree. As the rst and last path segments are in J n Ind(K), each pair of elements in them is at r-distance at most `. This gives a (2` + 3) bound on r-distance in J . tu

We know that I is ` -ranged for ` = (8p + 3) by Lemmas 3 and 4. We conclude that I j6= Q by Lemma 2.

The degree of elements in I nInd(K) is bounded by jCN(K)j+jCI(K)j (elements are connected within clique-nodes of size at most jCN(K)j and with at most jCI(K)j + 1 elements from other nodes). Non-nominal individuals used in K can be connected with any number of other individuals, so even though their degree is bounded, the bound is relatively large. Nominals can be connected with an arbitrary number of elements.

We aim to use the blocking principle. We consider an n-proper colouring In0 of I0 = I n Ind(K), for some xed n, instead of a proper colouring of I; we do so to avoid the in uence of the individuals on the size of the neighbourhoods (and therefore the required number of colours). I0 is a clique-forest too: the children nodes of non-nominal individuals become new roots. On each branch in In0, let D be the rst node for which some earlier node E satis es NnIn (d ) ' NnIn (e ), where d 2 D and e 2 E are the endpoints of the edges connecting D and E to their parent nodes. The new interpretation Fn is obtained by including the branch up to the predecessor of node D , with the edge originally leading to d redirected to e . Individuals are copied from I, together with their adjacent edges. Note that for SOF , all functionality declarations are satis ed, as each redirected edge is a forward edge.

As each node in In0 has nitely many children and we cut all the in nite paths o , Fn is nite by K}onig's lemma. As we start with I j= K and the unary types of edge endpoints are preserved in Fn, we get Fn j= K . By the initial preprocessing, (Fn) j= K. The conclusions drawn so far are valid for any n.

Lemma 5. Fn n Ind(K) is (4p)-ranged for n

4p.

Proof. During the construction of Fn from I, we add only one link for each in nite downwards path in I starting in a tree root. Each of these links goes up the tree. By link origins we shall understand the elements that had an edge added during the construction of Fn. We can visualize Fn [ I as the core Fn with additional hanging subtrees, each attached to Fn with a single edge to a link origin. There is no way of accessing these subtrees from Fn n Ind(K) other than via the corresponding link origin.

Let ` = 4p and x some n `. We prove the following claim by induction on m: if there is a path of length ` + 1 from a in Fn n Ind(K) to b in (Fn [ I) n Ind(K) over a transitive role r using at most m links, then there is an r-path in (Fn [ I) n Ind(K) from a to b over r of length at most `.

If we prove the claim for all m, we can deduce that Fn n Ind(K) is `-ranged. Indeed, take a transitive role s and an s-path in Fn n Ind(K) of length ` + 1, with endpoints (a; b) and m links (possibly none). Applying the claim, we get an s-path in (Fn [ I) n Ind(K) of length at most `, connecting a and b. To show that has to be in Fn n Ind(K), we take advantage of the tree structure: if a path goes down a given branch, it has to use a link to come back to an element higher up in the tree. There are no links from elements in I n Fn, so must only visit nodes in Fn n Ind(K); otherwise would get stuck in I n Fn and have its endpoint there.

Let us approach the induction proof. The induction base for m = 0 is almost immediate. A path without links in (Fn [ I) n Ind(K) is also a path in I n Ind(K), which is `-ranged, as shown in Lemma 3.

Assume the inductive hypothesis holds for m. Suppose that for a transitive role s there is an s-path in (Fn [ I) n Ind(K) of length l + 1 from a to b, such that it uses at most m + 1 links. If uses fewer than m + 1 links, we simply use the inductive hypothesis; thus we can assume uses exactly m + 1 links.

As m 0, there is at least one link in . Let (d; e) be the last link in , with the witnessing element e0 in I n Ind(K). Note that the node containing e0 is not an ancestor of the node containing a in the clique-tree, as otherwise a would lie in a subtree rooted in e0 and would be removed while constructing Fn. Let a;d and e;b denote the parts of from a to d and from e to b. The path e;b is of length at most `. We have chosen n `, so the n-neighbourhood of e0 in I n Ind(K) contains a path e00;b0 isomorphic to e;b. The path consisting of a;d, the edge from d to e0 and e0;b0 is of length ` + 1, but uses only m links, so there is an s-path in (Fn [ I) n Ind(K) from a to b0 of length at most `.

Due to the construction of Fn, each path to e0 starting in Fn n Ind(K) must visit d. Let a;d and e0;b0 denote the parts of from a to d and from e0 to b0. Note that e0;b0 lies in I n Ind(K), so it does not use any links. At least one of the following inequalities holds: j a;dj < j a;dj or j e0;b0 j < j e;bj. If the rst one holds, we can take the path consisting of a;d, the edge from d to e and of e;b as an s-path from a to b of length at most `. If the second inequality holds, we use the fact that e0;b0 lies in the n-neighbourhood of e0 in I n Ind(K) and nd an isomorphic path 0e;b from e to b. The path consisting of a;d, an edge from d to e and of 0e;b is an s-path from a to b of length at most `. This proves the inductive hypothesis for m + 1. tu

Recall that p = 22 34mjQjjT j3m and l = 8p + 3. For n 4p, Fn is ` -ranged by Lemmas 4 and 5. The size of the expanded query Q is bounded by k ` . Furthermore, Fn j6= Q for n k ` by Theorem 1. We conclude that Fn j6= Q for n k ` by Lemma 2.

We can now approach the estimation of the size of Fn. The interpretation I is a counter-example, so in its clique-forest, each node is of size at most jCN(K)j and each element is connected to at most jCI(K)j children nodes. Children nodes of individuals in I form the roots of clique-trees in I0, so the number of trees in In0 is bounded by jInd(K)j jCI(K)j. As described before, the degree of elements in I0 is bounded by D = jCN(K)j+jCI(K)j, so the neighbourhoods in I0 of radius m consist of at most O(Dm) elements. It is enough to use C = O(D2n) + jNom(K)j colours to obtain a proper n-colouring of I0 by Lemma 1. There are 2jCN(K)j unary types, so there are L = 2jCN(K)j C ways of labelling a single element with its unary type and colour. The neighbourhoods of radius n are of size O(Dn), which means there are N = LO(Dn) di erent neighbourhoods in In0. On a downward path in Fn, we cannot come across the same neighbourhood centred in an element node or a clique origin, as otherwise a link upwards would be created higher up on this path; thus N bounds the depth of clique-trees in Fn. The degree of the clique-forest of In0 is bounded by D0 = jCN(K)j jCI(K)j, so the number of nodes in each tree is bounded by (D0)N .

The overall size of Fn is bounded by M S(Q; T ), where M = jInd(K)j jCI(K)j bounds the number of clique-trees in I0 and S(Q; T ) is a bound on the size of a single tree. The size estimation of a single tree is independent of the ABox size; it is only governed by the TBox and query. This nalizes the proof of Theorem 4.

Corollary 1. Finite entailment of UCQs by SOI and SOF KBs is coNP-complete

with respect to data complexity.

Proof. If we x the sizes of the TBox and query, then S(Q; T ) is xed and the size of Fn is linear with respect to the ABox size. We consider the following non-deterministic approach to disproving query entailment: we simply guess a transitively closed counter-model of size at most M S(Q; T ) and verify all the required properties. First, we need to check that it is indeed transitively closed. This can be done in polynomial time with respect to the interpretation size. Next, we need to verify ABox and TBox constraints; as the interpretation is transitively closed, for each element each constraint can be veri ed by looking at that element's immediate neighbours. Finally, we need to check that the query is not satis ed. This comes down to verifying a rst-order logic formula, which can be done in time polynomial of jInd(K)j, where the degree of the polynomial depends only on the sizes of the TBox and query. Altogether, this constitutes a coNP algorithm for deciding query entailment for xed-size TBox and query.

We have shown that nite query entailment is in coNP with respect to data complexity. It is known that (unrestricted) query entailment is coNP-hard (with respect to data complexity) in ALE [ 13 ]. By nite controllability, so is nite query entailment. Since ALE is a fragment of both SOI and SOF , NP-hardness carries over. tu 6

Conclusions

By enhancing the approach to nite query entailment presented in [ 6 ], we were able to show the linear-size counter-model property for SOI and SOF (for xed TBox and query). This let us show that nite query entailment is in coNP with respect to data complexity for these DLs. Basing on previous work, we deduced that it is a coNP-complete problem. To improve the nal size estimations, we also established a stronger version of the Lifting Theorem.

The techniques used to construct the nal counter-models were not e cient enough to provide asymptotically tight size estimations for arbitrary (not xed) TBoxes and queries. One limiting factor is the doubly-exponential bound on the range of the considered counter-models, stemming from the automata statespace size. It would be interesting to see if this could be improved. Another natural question is whether our techniques are applicable to SIF . Acknowledgements. The author was supported by Poland's National Science Centre grant 2018/30/E/ST6/00042. The whole work has been prepared with huge help from Filip Murlak, the supervisor of the author's Master thesis. We thank the anonymous reviewers of DL 2020 for their helpful comments.

Proof of Lemma 1

The following proof is taken from [ 6 ], where the lemma was formulated with no mention of the number of required colours. Let n 0. Because I n Nom(K) has bounded degree, 2n-neighbourhoods in I have size bounded by some m. We colour the elements of I one by one, with m colours. Pick an uncoloured element d. At most m 1 colours are already used in N2In(d). Assign to d any colour that is not yet used in N2In(d). This procedure gives an n-proper colouring. Indeed, consider di erent e, e0 from NnI(d) for some d 2 I. Without loss of generality we can assume that e was coloured before e0. But e belongs to N2In(e0), so the colours of e and e0 are di erent by construction.

Lifting Theorem

B.1

Formulation De nition 10. For i n, let Ini be the interpretation obtained from In by including into the interpretation of each role r all i-links along r; that is, for i every role r and every i-link (d; e) along r, (d; e) 2 rIn .

Notice that for i < j, each j-link is also an i-link. Note also that (d; e) is an i-link along role r if and only if (e; d) is an i-link along r . We have In

n In

n 1 In

1 In

0 In ; but the domains of all these interpretations coincide. We keep referring to the edges present in Ini but not in In as i-links, even though they are ordinary edges now.

De nition 11. We say that functions f; f 0 : X ! if for all x 2 X:

NdI(f (x)) ' NdI(f 0(x)):

I are d-indistinguishable

Throughout the paper, we consider neighbourhoods with respect to Const = Nom(K), however, all the de nitions reliant on the notion of neighbourhoods are correct for every choice of the set of constants Const. Let us note that for every pair of d-indistinguishable functions f; f 0, if for some x 2 X and c 2 Const either f (x) = c or f 0(x) = c, then f (x) = f 0(x). This is because whenever we consider any neighbourhoods, we add a nominal-like concept for each constant.

Theorem 5. Let P be a CQ with at most k binary atoms and let n k. For each homomorphism h : P ! Inn there exists a homomorphism h0 : P ! In which is (n k)-indistinguishable from h.

Theorem 5 holds for any interpretation I of any SOIF KB. It was originally stated in [ 6 ] for n k2 and regarded a pair of (n k2)-indistinguishable homomorphisms. Furthermore, it was proved only for Const = Nom(K), and here we consider an arbitrary choice of the set of constants Const. We prove the enhanced version of the theorem in the next subsection; let us now see that it implies the Lifting Theorem (Theorem 1).

Proof (Lifting Theorem). Interpretation Jn we obtain in the process satis es Jn Inn. Consequently, if there was a homomorphism h : P ! Jn Inn for some CQ P constituting Q, Theorem 5 would yield a homomorphism h0 : P ! In, contradicting I j6= Q. tu B.2

Proof of Theorem 5 We start with the following observation.

Lemma 6. All links involving elements of Const are ordinary edges. Proof. Assume (d; e) is an i-link for d in Const (the symmetrical case is analogous). Assume that d has a successor f such that NiIn (f ) ' N In (e). Then, the i edge (d; e) must also exist in N In (e), which shows that (d; e) is an ordinary edge.

i Assume now that e has a predecessor f such that N In (f ) ' NiIn (d). Then, f i must be equal to d, due to the concepts used to mark constants in the de nition of a neighbourhood, so (d; e) is an ordinary edge. tu

Let h(P ) denote the subinterpretation of Inn obtained by restricting the domain to h(var(P )), and only keeping in each role r edges (h(x); h(y)) such that r(x; y) is an atom from P . We say that h uses an r-edge of Inn if this r-edge is present in h(P ). We say that h uses a link over r of Inn if this link is an edge of h(P ), but not an edge of In.

De nition 12. By J J =

J 0 \ J 00 , AJ = AJ 00 \\AJJ0000 wfoer malelacnontcheeptinntaemrpersetAat,ioanndJrJsu=chrJth0a\trJ 00 for all role names r.

Let us analyze the structure of h(P ). Let In0 = In nConst and let C denote the set of connected components in h(P )\In0, understood as graphs. Each component in C contains at most k edges, as P contains at most k binary atoms. We shall now prove that h is far-linked ; that is, h does not use links that join two elements within the same connected component in C. This is a general property of such homomorphisms, which stems from the fact that each link connects elements that are far apart, as expressed in the following lemma.

Lemma 7. Elements forming a link are at distance at least 2n in In0.

Proof. Let (a; b) be a link along r. Suppose that the distance between a and b in In0 is smaller than 2n. Take the shortest path in In0 between a and b; is of length at most 2n 1. As (a; b) is a link, either a has an r-successor f in In0 with the same colour as b, or b has an r-predecessor f in In0 with the same colour as a. In the rst case, there is an element on , which is at distance at most n from both b and f , in the second case, there is an element on , which is at distance at most n from both a and f . Since a; b; f share the same colour, are pairwise unequal and are in an n-neighbourhood of another element, we obtain a contradiction. Thus, elements forming a link must be at distance at least 2n in In0. tu Lemma 8. Each homomorphism h00 : P ! Ini is far-linked.

Proof. Let r(x; y) be an atom of the query P such that h00(x) and h00(y) are in the same connected component C 2 h00(P ) \ In0. As P has at most k binary atoms, elements in C are connected with at most k edges. Since h00(x) and h00(y) are both in C, they at distance at most k < 2n in In0 (unless k = 0; we verify this case separately). We use Lemma 7 to deduce that (h00(x); h00(y)) is an r-edge in tu In0.

We see that all the links used by h join elements from di erent connected components in C. We will construct a series of homomorphisms (h0; h1; :::; hl), with l k, hj : P ! Innj , nj < nj 1 for each j > 0, starting with h0 = h and n0 = n, satisfying the following constraints. First, we require that hj+1 uses strictly fewer links than hj . Second, we require that hj+1 and hj are (nj+1)indistinguishable. Third, we require that hl uses no links and that nl n k. If we manage to satisfy these constraints, hl treated as h0 : P ! In is the homomorphism postulated in the theorem statement.

Let Cj denote the connected components of hj (P )\In0. We keep the invariant that (Cj \ C) is non-empty for j < l. We construct the components graph of Cj : the graph with elements of Cj as vertices and edges between vertices representing connected components from Cj that are joined by a link between their elements. The second invariant is that for j > 0, there is exactly one connected component in Cj that is not present in Cj 1; let Dj0 denote that component. If j > 0 and there is a component Dj 2 (Cj \ C) joined by a link to Dj0 , set Cj = Dj and Cj0 = Dj0 . Otherwise, either there are Cj ; Cj0 2 (Cj \ C) joined by a link, or hj uses no links and we nish the construction. In case Cj ; Cj0 exist, homomorphism hj+1 is obtained from hj by pulling Cj closer to Cj0 by the link that joins these components; as a result, Cj ; Cj0 are combined into one component, namely Dj0+1. The details of that construction are presented in the next subsection.

We will later show that hj+1 uses links for a subset of these binary atoms of P , for which hj uses links and which do not join Cj with Cj0 ; the links joining Cj and Cj0 in hj are replaced with regular edges of In in hj+1. We compensate for eliminating these links by paying a certain cost: homomorphism hj+1 is less strict than hj when it comes to preserving the neighbourhoods of the elements joined with a link, which is why we have to set nj+1 = nj jCj j.

We will now prove that nj n k for all j. This will entail that h0 is indeed (n k)-indistinguishable from h. We use the following invariant: altogether there are at least n nj edges in connected components from (Cj n C). This can be proved by induction on j. The base step for j = 0 reduces to the statement that an empty graph has no edges. In the inductive step, we pull the component Cj towards C0 , merging the two within a larger connected component. We have that j Cj 2 Cj is in C and the new connected component Dj0+1 2 Cj+1 is not in C. This means that the number of edges in connected components from Cj+1 n C must be larger than the number of edges in connected components from Cj n C. The total increase is equal to at least jCjj; this stems from incorporating jCjj 1 edges within Cj, and from reducing at least one link. Using the inductive hypothesis, we get that there are at least n nj edges in connected components from Cj n C, and at least n nj + jCjj edges in connected components from Cj+1 n C. As nj+1 = nj jCjj, we get n nj + jCjj = n nj+1, which proves the inductive step. If there was j such that nj < n k, the number of edges in connected components from (Cj n C) would be at least n nj; as n nj > n (n k) = k, this would contradict the fact that hj(P ) has at most k edges.

B.3

De nition of hj+1 In this subsection, we mark the names of interpretation elements with bold colour fonts. Whenever we mention distance between elements, we refer to distance in

We are going to de ne hj+1 based on hj and Cj; Cj0 2 Cj.

Let (d; e) be a link along s used by hj and joining Cj with C0 . We only j consider the case when d 2 Cj0 and e 2 Cj; the proof for the other case is symmetrical. Either d has an s-successor e0 such that NnIjn (e) ' NnIjn (e0), or e has an s-predecessor d0 such that NnIjn (d) ' NnIjn (d0); once again we take advantage of almost full symmetry and only present the proof for the rst case.

Let g : NnIjn (e) ! NnIjn (e0) be the witnessing isomorphism for the (d; e) link. We de ne hj+1 : P ! Innj+1 for nj+1 = nj jCjj as follows: for each z 2 var(P ), let hj+1(z) = g(hj(z)) if hj(z) 2 Cj, and hj+1(z) = hj(z) otherwise. To ensure that the de nition of hj+1 is correct, we need to show that nj+1 0. Altogether, components in C have at most n edges (more precisely, at most k n edges), and we have proven an invariant that there are at least n nj edges in connected components from (Cj n C). As Cj is in (Cj \ C), and (d; e) is a link, Cj can have at most n 1 (n nj) = nj 1 edges. Thus, nj+1 = nj jCjj 0.

By de nition, hj and hj+1 are (nj+1)-indistinguishable: NnIjn (e) ' NnIjn (e0) and nj+1 = nj jCjj, so g preserves (nj+1)-neighbourhoods of elements within distance jCjj from e. We need to verify that hj+1 is indeed a homomorphism.

Let r(x; y) be an atom of the query P and let p = hj(x); q = hj(y), p0 = hj+1(x), q0 = hj+1(y). We have to consider the following cases:

Case 1. By de nition, (p0; q0) = (p; q). As hj is a homomorphism into Innj Innj+1 , we have that (p0; q0) is an r-edge in Innj+1 . Clearly, hj+1 uses no new links for such atoms.

Case 2. By de nition, (p0; q0) = (g(p); g(q)). We assumed that p and q are both in Cj, and connected components in Cj contain at most k 1 edges (we subtract one, because (d; e) is a link). This means p; q are at distance at most k 1 < 2n, and thus (p; q) is an r-edge in In by Lemma 7. As g(p) and g(q) are both in Dj0+1 2 Cj+1, they are at distance at most k. This means that (p0; q0) is an r-edge in In.

Case 3. By de nition, (p0; q0) = (p; g(q)). As g preserves (nj+1)-neighbourhoods of elements within distance jCjj from e and the distance between q and e is at most jCjj, we see that NnIjn+1 (q) ' NnIjn+1 (g(q)). Let us analyze the three subcases.

Subcase 3 (a). As p 2 Cj0 and q 2 Cj, and Cj0 and Cj are di erent connected components in Cj, we have that (p; q) is a link along r. Since Cj0 and Cj are merged together into Dj0+1, both p and g(q) are in Dj0+1. This means they are at distance at most k from each other, which allows us to use Lemma 7 to deduce that (p0; q0) is an r-edge in In.

Subcase 3 (b). Like above, p and q are in di erent connected components from Cj, so it must be that (p; q) is a link along r. We need a di erent line of reasoning than in the previous case, since we cannot assume Cj and Cj00 are merged together. Since (p; q) is an nj-link, it is also an (nj+1)-link. We have two subcases to consider.

Suppose that p has an r-successor f in In0 such that NnIjn+1 (f) ' NnIjn+1 (q) ' NnIjn+1 (g(q)). We see that then (p; g(q)) is an (nj+1)-link. If f = g(q)), then (p0; q0) is an r-edge in In. Furthermore, the connected component Cj00 of p merges with Cj0 ; all three components Cj, Cj0 and Cj00 form subgraphs of Dj0+1. As elements in Dj0+1 are at distance at most k from each other, there can be no links between them by Lemma 7. This means that all the links between Cj00 and Cj used by hj become regular edges in hj+1. We also see that hj cannot use any links between Cj00 and Cj0 , as elements of these components are at distance at most k from each other in In0. The other possibility is that f 6= g(q)), in which case (p0; q0) is a link along r. By analogous reasoning, this also entails that hj+1 uses links for all the links used by hj which join Cj and the connected component of f.

Suppose now that q has an r-predecessor f in In0 such that NnIjn+1 (f) ' NnIjn+1 (p). We prove in the same way as above that either (p0; q0) is an r-edge in In0 and the connected component of p and Cj0 merge, or (p0; q0) is a link along r.

Subcase 3 (c). (p; q) is an r-edge in In by Lemma 6. As NnIjn+1 (q) ' NnIjn+1 (g(q)), (p; g(q)) must be an r-edge in NnIjn+1 (g(q)). Thus, (p0; q0) is an r-edge in In.

This proves that hj+1 is a homomorphism, which uses links for a subset of these binary atoms of P , for which hj uses links and which do not join Cj with C0 .

j C C.1

Proof of Theorem 3

Construction C.1.1 Outline The authors of [ 6 ] present a product of tree automata that independently verify the following acceptance conditions: { consistency of the forest encoding, { satisfying the constraints from the knowledge base K { not satisfying the query Q, { path safety. = (T ; A),

We enhance their construction by adapting these components and adding veri cation of individual- and nominal-completeness. Our automaton is parametrized by a xed subinterpretation M induced by individuals (note that it is not the case in [ 6 ]).

We describe the construction with SOI in mind, but it adapts easily to SOF . The only di erences are that for SOF , we do not validate parts referring to inverse roles, and we add another component verifying functionality declarations, which does not a ect our size estimations. This component is decribed in the last subsection.

C.1.2 Preprocessing The interpretations we work with are preprocessed to cater for transitivity: for an interpretation I, if I j= K, then also I j= K. This is achieved by supplementing the interpretation with auxiliary concepts expressing universal restrictions on transitive roles, as described under De nition 5: for each pair of B 2 CN(K) and a transitive role r, we add a fresh concept name Pr;B, axiomatised as Pr;B v 8r:(B u Pr;B), and replace each CI of the form A v 8r:B with A v Pr;B.

The information about the edges to and from nominals is also stored in auxiliary concepts: for each a 2 Nom(K) and r 2 Rol(K) fresh concept names Nr;a and Nr ;a are added to the knowledge base, axiomatised as Nr;a 9r:fag and fag 8r:Nr ;a. These concepts do not substantially increase the size of the automaton.

We aim to accept only these counter-examples which are complete. First, we tackle nominal-completeness. For each a 2 Nom(K) and r 2 TRol(K) we add the following concept inclusions to TBox: Nr;a v 8r:Nr;a and Nr ;a v 8r:Nr ;a. This ensures that whenever for a transitive role r there is no edge r from some element to a nominal (or vice versa), then there is also no r-path from that element to the nominal.

Recall that we assume the completeness of M (the subinterpretation induced by individuals). Consequently, any path along a transitive role between two individuals not connected with a single edge must enter some clique-tree and lead through a nominal. Additional CIs described in the last paragraph ensure that whenever there is a path from an individual to a nominal in the clique-tree associated with that individual, these elements are also connected with a single edge. If there was a path breaking individual-completeness of the whole forest, it would entail that also nominal-completeness does not hold. Thus, individualcompleteness follows from nominal-completeness.

C.1.3 Automaton To make clique-forests accessible to automata, we encode them as nitely labelled forests. Let TRol(K) be the set of transitive roles from Rol(K), and let [X] k be the family of subsets of X of size at most k. In the encoding, nodes are labelled with elements of the alphabet = Tp(K) [

TRol(K) [Tp(K)] jCN(K)j and edges are labelled with elements of the alphabet = Tp(K)

Rol(K)

Tp(K) :

We build an automaton parametrized by the subinterpretation M. Let I denote the interpretation whose clique-forest is encoded in the input forest. The automaton runs on the encoding of the clique-forest of I n Ind(K) (not of I, like in [ 6 ]; this change is motivated by the way we modify the query component). We say that an individual a is the super-root of a clique-tree t in I n Ind(K), if the root of t is a successor of a in I; we also say that t is super-rooted in a.

For a complete counter-example I, we obtain the encoding of the clique-forest for I n Ind(K) as follows. We x some order on individuals and order the trees in I n Ind(K) so that they are sorted with respect to the order on their superroots. We add a separating tree stub (represented with a fresh symbol added to ) between contiguous blocks of trees super-rooted in the same individual; it is only used as a marker for the automaton so that it can recognize which trees are associated with a given individual. All the components of the automaton always accept these tree stubs.

Next, we label each element node with the single unary type it realises and each clique node with its single nonempty role and the set of unary types it realises. We do not represent nominals explicitly in the encoding, but thanks to the initial preprocessing, all relevant information about them is contained in the unary types of the remaining elements. Finally, if in I n Ind(K) there is an r-edge from an element of type in some parent node to an element of type in some child node, then in the encoding the edge from the parent node to the child node is labelled with ( ; r; ). Because unary types do not repeat within cliques, this uniquely determines the endpoints.

There are at most N = jInd(K)j jCI(K)j trees in the input forests, with branching bounded by D = jCN(K)j jCI(K)j. Transition relation has the form Q (

Q) D ; where Q is the set of states. The automaton processes the forests top down. Whenever a node is reached after an edge labelled ( ; r; ), we refer to the unique element of type in that node as the current element, and to the unique element of type in the parent node as the parent element. The initial states are speci ed for each tree separately: the automaton has a set I Q N of sequences of initial states. A run is a labelling of the input forest with states in such a way that the sequence of states in the roots belongs to I, and if a node has state q, label , and its children are connected via edges with labels 1; 2; : : : ; n and have states q1; q2; : : : ; qn, then q; ; ( 1; q1); : : : ; ( n; qn) 2 :

We use Buchi acceptance condition: we specify a set F Q of marked states that need to be revisited, and consider a run accepting if on each branch marked states occur in nitely often. A forest is accepted by the automaton if there exists an accepting run over it.

An automaton has trivial acceptance condition if F = Q. Then, each run is accepting but the automaton may still reject some forests, because there may be no run for them: a branch of the computation can get stuck if no transition is consistent with the current state, label and edge labels. An automaton is weak if on each branch of each run, once a marked state is visited, all subsequent states are marked. Notice that all automata with trivial acceptance condition are weak. Given a weak automaton and an arbitrary Buchi automaton it is particularly easy to construct an automaton recognising trees accepted by both input automata: it su ces to take the standard (synchronous) product automaton and mark all states that contain a marked states on both coordinates. C.1.4 Automata components The nal automaton is obtained as a product of automata verifying independently various parts of the condition.

The rst thing to check is the consistency of the encoding: if an edge has label ( ; r; ), then must occur in the label of the parent node, and must occur in the label of the child node. To check this, it su ces to examine for each node the labels of all edges incident to it plus the label of the node itself. When a transition is made, all these are available except the label on the edge to the parent: it must be stored in the state. The automaton has j j states and trivial acceptance condition.

The second thing to check is that the clique-forest is a model of K . If M does not satisfy the restrictions from the ABox A of K , the nal automaton rejects everything. form

To verify that the TBox is satis ed we need to check each CI. For CIs of the u Ai v tj Bj i

A v 8r:B we have a two-state automaton with trivial acceptance condition that simply tests that each type used in the encoding satis es this CI. If the type of some individual violates this CI, the nal automaton rejects everything. To verify CIs of the form for elements in I nInd(K), it su ces to check that in the input interpretation there is no r-edge from an element whose unary type contains A to an element whose unary type contains B. This amounts to verifying that none of the following are used in the encoding: { node labels (r; T ) such that A 2 2 T and B 2 2 T for some , ; { edge labels ( ; r; ) with A 2 and B 2 ; { edge labels ( ; r ; ) with A 2 and B 2 ; { unary types containing both A and Nr;b for some b such that b 2 BM; { unary type containing both B and Nr ;b for some b such that b 2 AM. These conditions simply disallow certain labels; they can be checked by a twostate automaton with a trivial acceptance condition.

Let us take a CI of the form

A v 9r:B : For elements in I n Ind(K), this condition can be tested in a similar way as above, except that one needs access to the label of the current node and all edges incident to it. Like for the initial consistency check, it su ces to store in the state the label of the edge to the parent. To ensure the existential restrictions are met for nominals, for each nominal a such that a 2 AM, if that CI is not met in M, we construct a two-state weak automaton looking for a label that uses a type such that B 2 and Nr ;a 2 . Note that this automaton has a nontrivial acceptance condition, but it is weak: as soon as it nds an appropriate label, it loops in a marked state. The initial lists of states for that weak consist of a single unmarked state and all the remaining initial states already marked.

We also have to verify that all existential and universal restrictions are met for non-nominal individuals. The former are ensured by constraining initial lists of states so that all the existential restrictions for an individual not satis ed in M are satis ed by roots of clique-trees super-rooted in that individual. If the universal restrictions are not met in M, the nal automaton rejects everything; otherwise, we ensure that the restrictions hold for non-individual elements connected to individuals by constraining initial lists of states so that the roots of clique-trees super-rooted in a given individual do not break universal restrictions imposed on that individual.

Summing up, the total size of the state-space of the TBox component is not larger than 2jCI(K)j 2jCI(K)j j jjCI(K)j 2jNom(K)j jCI(K)j. The rst three factors stem from the sizes of automata used to verify on I n Ind(K) all the CIs of the forms u Ai v tj Bj , A v 8r:B, and A v 9r:B, respectively. The last factor stems i from the veri cation of existential restrictions for nominals.

Next, we verify that the query cannot be satis ed. We need an automaton recognizing counter-examples such that the number of states in the automaton does not depend on the number of individuals in K. This is achieved by making sure each tree in the forest is processed without explicitly referring to the knowledge about individuals.

In [ 6 ], all the information about edges between individuals is transferred to the TBox. In particular, this means that the query parts are self-contained in single trees, as individuals are not connected (and the edges to nominals are hidden from the automata perspective). It lets the query component build partial query matchings independently for each tree, but at the cost of enlarging the TBox.

Here, we cannot use the same approach: the number of states depends on the number of unary types in particular, which would be tied to the ABox size if we were to store the information about edges between individuals in concepts. Matchings of connected parts of the query are no longer contained in single trees: they might span over multiple trees. We have to accommodate for that by enhancing the query component.

We start by replacing query Q with a query Q0 such that for each model I of K: I j= Q i (I n Nom(K)) j= Q0. The query Q0 is obtained in two steps.

In the rst step, we account for the possibility that binary atoms of the form r(x; y) for transitive r can be satis ed with paths that cross di erent trees. Take such path , and let a and b denote respectively the rst and the last individual on (possibly a = b). Thanks to the individual-completeness property, can be reduced to a path consisting of: a path segment ending in a and contained in a single tree, a single edge from a to b (if a 6= b), and another path segment contained in a single tree. To detect such paths, for each CQ P constituting Q, we add to Q each CQ that can be obtained from P by subdividing some transitive atoms; that is, by replacing some atoms of the form r(x; y) either with atoms r(x; z) and r(z; y) for a fresh variable z, or with atoms r(x; z1), r(z1; z2), r(z2; y) for fresh variables z1; z2.

In the second step, we account for the in uence of nominals on the query satisfaction. For each CQ P of the modi ed Q, we add to Q each CQ that can be obtained from P by performing the following operation any number of times. Let tp(x) be the set of all A such that P contains A(x). Choose x 2 var(P ) and a 2 Nom(K) such that a 2 AM whenever A 2 tp(x). Drop all atoms of the form A(x) from P . Replace in P each atom of the form r(y; x) by Nr;a(y) and each atom of the form r(x; y) by Nr ;a(y).

The resulting query Q0 satis es the condition I j= Q i (I nNom(K)) j= Q0. After the rst step, the number of CQs grows by the factor 3m and their size is at most 3m. After the second step, the number of CQs grows by the factor jNom(K)j3m and their size is still at most 3m. Thus, the size of the resulting query is at most jQj 3m jNom(K)j3m, and its CQs have size at most 3m.

For each CQ P of the preprocessed union of queries Q0, we construct an automaton that ensures that I j6= P . Its states consist of the parent edge label and a set F representing all partial matchings of P to elements represented in the subtree rooted in the currently processed node.

The set F consists of partial functions f : var(P ) ! L, with L = fsucc; otherg, representing di erent ways variables can be assigned in relation to the parent element. Each such function represents a partial matching, that is a partial assignment of variables such that the restricted query induced by these variables is satis ed (not all the collected partial matchings can potentially be extended to a total match; discussion on that part is presented in the next subsection). The identi er succ is used for each variable mapped to an element in the current subtree that is an r-successor of the parent element in I n Ind(K). If r is non-transitive, this simply means the current element (the clique origin, if we are processing a clique node). If r is transitive, it means any element r-reachable from the current element. The identi er other is used for variables assigned to any other elements in the current subtree.

The rationale behind the automaton is that it constructs all the possible partial matchings across the forest. The trick here is that the states reached before traversing a given node already contain the information about all the partial matchings that will be constructed in the subtree rooted at that node. The automaton will have at most one accepting run on each forest|the states in the initial list of states in that run will contain all the possible partial matchings across the forest.

We de ne the transition relation in the next subsection. The de nition will ensure the intended semantics of the states; when processing a given node, sets F in the automaton state will represent all partial matchings of P to elements represented in the subtree rooted in that node.

All states are accepting. Apart from the partial matchings collected in trees rooted in children nodes of individuals, we also compute all partial matchings in M n Nom(K) (the possibility that some variables are assigned to nominals has already been accounted for in the second step of the query preprocessing). As lists of initial states, we take all lists of states such that a total match over the whole forest cannot be constructed by fusing together a partial matching in M n Nom(K) and one partial matching per initial state. We know how to stitch these partial matchings together, because we know the edges from super-roots to new clique-tree roots and the way we collect partial matchings in a cliquetree accounts for the label on that edge. This sums up the query automaton construction. This extra information about the edges from super-roots motivates the approach of traversing clique forests built of trees rooted in children nodes of individuals, rather than trees rooted in individuals.

The last component of the automaton checks that the input interpretation is safe. Observe that it is unsafe if in the input clique-forest there is a branch with consecutive node and edge labels 1 1 2 2 : : : such that for some transitive r and all i large enough, i = ( i; r; i) and either i = i+1 (consecutive edges are incident in the clique-forest) or i+i = (r; Ti) (consecutive edges are incident with an r-clique). An automaton can easily check that there is no such branch. Each time it sees a transitive role it moves to an unmarked state, storing the role. It moves to a marked state as soon as the condition above is broken. The automaton has jTRol(K)j states.

The automaton recognising safe counter-examples can be obtained from these components by the simple product construction described above, because only the last component is not weak.

C.2

Transition relation of the query component Below we describe the set of transitions of the query satisfaction component of the product automaton.

A transition

(( ; F ); ; (( i; ( i; Fi))in=1) over an element node of unary type , with = ( ; r; ) and i = ( ; ri; i), is a transition of the automaton if F is the set of all functions f : var(P ) ! L which are both upward- and downward-compatibile with some witnessing functions f1 2 F1; f2 2 F2; :::; fn 2 Fn; we de ne these conditions below.

Fix a function f : var(P ) ! L and a sequence of domain-disjoint functions (f1; f2; :::; fn), with fi : var(P ) ! L.

f is upward-compatibile with functions fi if for each variable x 2 var(P ): { if 9i : fi(x) = other, then f (x) = other, { if 9i : fi(x) = succ, then f (x) = succ if r is transitive and ri = r, otherwise f (x) = other, { otherwise either x 2= dom(f ), or f (x) = succ and tp(x) , which we interpret as matching x to the current element. In the latter case x is in (dom(f ) n S dom(fi)).

f is downward-compatibile with functions fi if for each atom s(x; y) of P : { if x 2 (dom(f ) n S dom(fi)) and y 2 dom(fi), then ri = s and fi(y) = succ, { if y 2 (dom(f ) n S dom(fi)) and x 2 dom(fi), then ri = s and fi(x) = succ, { if x 2 dom(fi) and y 2 dom(fj) for i 6= j, then s is transitive, ri = rj = s, and fi(x) = fj(y) = succ.

Note that we only seem to notice binary atoms for which both variables are already present in the subtree. When a variable gets the identi er other and there are unsatis ed binary atoms containing that variable, we know that such a matching cannot be extended to a total matching. We still gather such matchings in the automaton, which might be unintuitive. For clarity, we can forget such unextendable matchings in the automaton runs by adding the following condition to the downward-compatibility de nition:

{ if fi(x) = other or fi(y) = other, then x; y 2 dom(fi).

The transitions over clique-nodes are described using analogous compatibility concepts, adapted to the clique-node setting. A transition

(( ; F ); (r0; T ); (( i; ( i; Fi))in=1) over a clique node encoded by a role r0 and a set of unary types T , with = ( ; r; ) and i = ( i; ri; i), is a transition of the automaton if F is the set of all functions f : var(P ) ! L which are both upward- and downward-compatibile with some witnessing functions f1 2 F1; f2 2 F2; :::; fn 2 Fn; we de ne these conditions below.

Fix a function f : var(P ) ! L and a sequence of domain-disjoint functions (f1; f2; :::; fn), with fi : var(P ) ! L.

f is upward-compatibile with functions fi if for each variable x 2 var(P ): { if 9i : fi(x) = other, then f (x) = other, { if 9i : fi(x) = succ, then f (x) = succ if r is transitive, ri = r, and either i = or r0 = r, otherwise f (x) = other, { otherwise one of the following holds: x 2= dom(f ), f (x) = succ and tp(x) , which we interpret as matching x to the current element, f (x) = other and tp(x) 0 2 T n f g, which we interpret as matching x to an element in the current clique-node.

f is downward-compatibile with functions fi if for each atom s(x; y) of P : { if x 2 (dom(f ) n S dom(fi)) and y 2 dom(fi), then ri = s, fi(y) = succ, and either tp(x) i, or s is transitive and r0 = s, { if y 2 (dom(f ) n S dom(fi)) and x 2 dom(fi), then ri = s , fi(x) = succ, and either tp(y) i, or s is transitive and r0 = s, { if x 2 dom(fi) and y 2 dom(fj) for i 6= j, then s is transitive, ri = rj = s, fi(x) = fj(y) = succ, and either i = j or r0 = s.

Once again, we can enhance the downward-compatibility de nition to get rid of unextendable partial matchings; it is enough to add the same condition as before.

To see that this transition relation ensures the intended semantics of the states one needs to argue that each partial matching is accurately represented. This can be done by induction on the size of the image. For size one, the matching will be accounted for based solely on the labels. For larger images, use the inductive hypothesis for restrictions of the match to variables mapped to the trees rooted at the children of the current node.

C.3

Size estimation Let us approach the size estimation of the automaton state space. We start by analyzing the factors that contribute to the size of automata components. Note that k and m denote, respectively, the maximum number of binary atoms and the maximum size of a single conjunctive query in the union of CQs Q.

Measured entity number of unary types size of node alphabet size of edge alphabet number of trees in a forest

clique-trees branching size of preprocessed query

Notated as jTp(K)j j j j j N D jQ0j

The bound of the size of the unary types set is not just 2jCN(K)j due to the encoding of neighbouring nominals, which requires O(jNom(K)j jRol(K)j) new concepts and concept inclusions.

The size estimation of the automata components is presented below.

Component consistency of encoding

TBox constraints: - CIs of form: uAi v tBj - CIs of form: A v 8r:B - CIs of form: A v 9r:B

path safeness query unsatisfaction

States count upper bound

j j 2jCI(K)j 2jCI(K)j j jjCI(K)j 2jNom(K)j jCI(K)j jT3R3mol(K)j 2 jQ0j

By taking a product of the sizes of automata components, we bound the number of states of the product automaton by 22 34m jQj jT j3m , where jT j = jCI(K)j + jCN(K)j + jNom(K)j jTRol(K)j. The factor 2 in the exponent is introduced to eliminate summands of lower order. The query component makes the size of the state-space of the product automaton double exponential. This nalizes the proof of Theorem 3.

As a side note, an automaton with k states has total size O(k j j (k j j)N + kN ), which in our case can be expanded to 2jQj jKjO(m) .

C.4

Functionality component for SOF We include an additional component for each functionality declaration Fn(r). To check functionality of r for ordinary nodes it su ces to examine the label of the node and the labels on all incident edges, which only requires storing in the state the label of the edge to the parent. Additionally, for all a 2 Nom(K): if a 2 (Nr;b)M and a 2 (Nr;b0 )M for some b 6= b0, the nal automaton rejects everything; if a 2 (Nr;b)M, the automaton checks that no type used in the input forest contains Nr ;a; if a 2= (Nr;b)M for each b 2 Nom(K), the automaton checks that a type with Nr ;a occurs at most once in the input forest. The total number of states in the described component is j jjCI(K)j 2jNom(K)j jCI(K)j, so including it does not a ect the overall upper bound.

Proof of Lemma 2

For each transitive role s, all the s-atoms in Q are replaced with disjunctions of s-chains of lengths from 1 to ` in Q . If such an atom can be satis ed in J for elements a and b, then there is an s-path from a to b of length at most `. This shows that if J j= Q , then J j= Q. On the other hand, if J j= Q, then each transitive role atom must be satis ed by two endpoints of some s-path from J , of length at most `, which satis es the subquery replacing this atom in Q . This proves the converse implication.

1. Baader , F. , Brandt , S. , Lutz , C. : Pushing the EL envelope . In: Proc. Int. J. C. on Artif. Int . pp. 364 { 369 ( 2005 )

2. Baader , F. , Calvanese , D. , McGuinness , D.L. , Nardi , D. , Patel-Schneider , P.F. : The Description Logic Handbook: Theory, Implementation and Applications . Cambridge University Press, New York, NY, USA, 2nd edn. ( 2010 )

3. Bienvenu , M. , Ortiz , M. : Ontology-mediated query answering with data-tractable description logics . In: Reasoning Web. LNCS , vol. 9203 , pp. 218 { 307 . Springer ( 2015 )

4. Calvanese , D. , Giacomo , G.D. , Lembo , D. , Lenzerini , M. , , Riccardo: Tractable reasoning and e cient query answering in description logics: The DL-Lite family . J. of Autom. Reasoning 39 ( 3 ), 385 { 429 ( 2007 )

5. Glimm , B. , Lutz , C. , Horrocks , I. , Sattler , U. : Conjunctive query answering for the description logic SHIQ . J. Artif. Intell. Res . 31 , 157 { 204 ( 2008 )

6. Gogacz , T. , Iban~ez-Garc a , Y.A. , Murlak , F. : Finite query answering in expressive description logics with transitive roles . In: Proc. KR 2018 . pp. 369 { 378 ( 2018 )

7. Gogacz , T. , Marcinkowski , J.: On the BDD/FC conjecture . In: PODS . pp. 127 { 138 ( 2013 ). https://doi.org/10.1145/2463664.2463668

8. Hustadt , U. , Motik , B. , Sattler , U. : Reasoning in description logics by a reduction to disjunctive datalog . J. Autom. Reasoning 39 ( 3 ), 351 { 384 ( 2007 )

9. Lutz , C. , Wolter , F. : The data complexity of description logic ontologies . Log. Met. in Comp. Sci . 13 ( 4 :7), 1 { 46 ( 2017 )

10. Ortiz , M. , Calvanese , D. , Eiter , T. : Characterizing data complexity for conjunctive query answering in expressive description logics . In: Proc. Nat. C. on Artif. Int . pp. 275 { 280 . AAAI Press ( 2006 )

11. Rudolph , S. : Undecidability results for database-inspired reasoning problems in very expressive description logics . In: KR . pp. 247 { 257 . AAAI Press ( 2016 )

12. Rudolph , S. , Glimm , B. : Nominals, inverses, counting, and conjunctive queries or: Why in nity is your friend! J. Artif. Intell. Res . 39 , 429 { 481 ( 2010 )

13. Schaerf , A. : On the complexity of the instance checking problem in concept languages with existential quanti cation . J. of Intel. Inf. Systems 2 , 265 { 278 ( 1993 )