=Paper=
{{Paper
|id=Vol-2663/paper-13
|storemode=property
|title=Data Complexity of Finite Query Entailment in Description Logics with Transitive Roles
|pdfUrl=https://ceur-ws.org/Vol-2663/paper-13.pdf
|volume=Vol-2663
|authors=Jakub Kuklis
|dblpUrl=https://dblp.org/rec/conf/dlog/Kuklis20
}}
==Data Complexity of Finite Query Entailment in Description Logics with Transitive Roles==
Data complexity of finite query entailment in
description logics with transitive roles
Jakub Kuklis
jk.kuklis@gmail.com
University of Warsaw, 02-097 Warszawa, Poland
Abstract. We study the problem of finite ontology-mediated query an-
swering (FOMQA), the variant of OMQA where the represented world
is assumed to be finite, and thus only finite models of the ontology are
considered. We adapt the setting of unions of conjunctive queries and
ontologies expressed in description logics with transitive role declara-
tions. FOMQA is already undecidable for SHOIF, but was proved to
be 2Exp-complete for SOI, SOF and SIF. The proof relies on the fact
that any finite counter-model can be transformed to a counter-model of
certain regularity. By adjusting this transformation, we are able to pro-
vide an upper bound on the size of the smallest finite counter-model for
SOI and SOF ; the bound is linear in the size of the ABox. This lets
us show that for these logics FOMQA is in coNP with respect to data
complexity; as coNP-hardness follows from previous results on weaker
logics, we deduce coNP-completeness. 1
Keywords: FOMQA · Data complexity · Counter-models · SOI· SOF
1 Introduction
Evaluating queries in the presence of background knowledge has been extensively
studied in several communities. A particularly prominent take on this problem is
ontology mediated query answering (OMQA) where background knowledge rep-
resented by an ontology is leveraged to infer more complete answers to queries [3].
A widely accepted family of ontology languages with varying expressive power is
offered by Description Logics (DLs) [2], while the most commonly studied query
language is that of (unions of) conjunctive queries.
Identifying ontology languages, for which query evaluation scales to large
amounts of instance data, is one of the key research goals, as data tends to
dominate in size both the ontology used and the evaluated queries. An early
reference on data complexity in DLs, that is, complexity in the settings with
fixed TBox and query, is showing coNP-hardness of instance queries in ALE [13].
In the much more expressive DL SHIQ, a coNP upper bound was obtained for
instance queries [8] and conjunctive queries without transitive roles [10], and later
1
Copyright c 2020 for this paper by its authors. Use permitted under Creative Com-
mons License Attribution 4.0 International (CC BY 4.0).
2 J. Kuklis
generalized to any CQs [5]. The classical approach to avoiding intractability is
to replace ALC and SHIQ with less expressive Horn DLs. These often admit
query evaluation in PTime, examples include a variety of logics from the EL [1]
and DL-Lite families [4], as well as Horn-SHIQ [8]. An alternative approach is
to study complexity with respect to fixed TBoxes only [9].
Often, the intended models of the ontology are finite and this additional
assumption allows to infer more properties: finite ontology mediated query an-
swering (FOMQA) is the variant of OMQA restricted to finite models. For some
logics, like ALC, the finite variant and the unrestricted variant of the problem
coincide; we then say that OMQA is finitely controllable. Studying FOMQA is
interesting in settings lacking finite controllability. This is the case not only for
DLs lacking the finite model property (e.g., DLs allowing both inverse roles and
number restrictions), but also for logics allowing transitive role declarations, like
logics from the S family. FOMQA is undecidable for SHOIF ontologies [11],
but recently its decidability has been shown for three proper SOIF fragments:
SOI, SOF, and SIF [6]. For all three fragments, the problem turned out to be
2Exp-complete in terms of combined complexity, but its data complexity was
not studied. The present paper aims to fill this gap, thus initiating the study of
the data complexity of FOMQA.
In [6], FOMQA is reduced to query entailment over a certain class of tree-like
models recognisable by tree automata. Here, we adjust and adapt this approach
to obtain the linear-size counter-model property for SOI and SOF (with respect
to fixed TBox and query). This lets us derive an algorithm for FOMQA, which
is coNP with respect to data complexity. The matching lower bound can be ob-
tained using finite controllability from the results on answering instance queries,
which is coNP-hard even for the ALE fragment of ALC [13] (see also [2]).
2 Preliminaries
The DL SOIF extends the classical DL ALC with transitivity declarations
on roles (S), nominals (O), inverses (I), and role functionality declarations
(F) [2]. We assume a signature of countably infinite disjoint sets of concept
names NC = {A1 , A2 , . . . }, role names NR = {r1 , r2 , . . . } and individual names
NI = {a1 , a2 , . . . }. SOIF-concepts C, D are defined by the grammar:
C, D ::= > | A | ¬C | C u D | {a} | ∃r.C ,
where r ∈ NR ∪ {s− | s ∈ NR } is a role. Roles of the form r− are called inverse
roles. A SOIF TBox T is a finite set of concept inclusions (CIs) C v D,
transitivity declarations Tr(r), functionality declarations Fn(r), where C, D are
SOIF-concepts and r is a role. We assume that if the TBox contains Tr(r),
then it contains neither Fn(r) nor Fn(r− ). With an appropriate extension of the
signature, each SOIF TBox can be transformed into an equivalent TBox with
CIs in the following normal forms:
u Ai v t B j , A ≡ {a} , A v ∀r.B , A v ∃r.B ,
Data complexity of FOMQA in DLs with transitive roles 3
where A, Ai , B, Bj are concept names, empty conjunction is equivalent to > and
empty disjunction to ⊥. We assume that all knowledge bases we work with are
normalized; normalization yields at most linearly larger knowledge base, so the
size estimations still hold. We also assume that for each concept name A used in
T there is a complementary concept name Ā axiomatised using CIs > v A t Ā
and A u Ā v ⊥.
SOI and SOF TBoxes are restrictions of SOIF TBoxes. SOI TBoxes do
not contain functionality declarations, whereas concept inclusions in SOF do
not contain inverse roles. As the inverse of a transitive role is transitive anyway,
for SOI we shall assume that if Tr(r) is present in the TBox, then so is Tr(r− ).
An ABox is a finite set of concept and role assertions of the form A(a) and
r(a, b), where A ∈ NC , r ∈ NR and {a, b} ⊆ NI . A knowledge base (KB) is a
pair K = (T , A). We write |K| for |A| + |T |. We use CN(K), Rol(K), TRol(K),
Nom(K), and Ind(K) to denote, respectively, the set of all concept names, roles,
transitive roles, nominals, and individuals occurring in K. Each nominal is an
individual, so Nom(K) ⊆ Ind(K).
A unary type is a subset of CN(K) that contains exactly one of the concept
names A, Ā for each A ∈ CN(K). We let Tp(K) denote the set of all unary types.
The semantics is defined via interpretations I = (∆I , ·I ) with a non-empty
domain ∆I and an interpretation function ·I assigning to each A ∈ CN(K)
a set AI ⊆ ∆I and to each role name r with r ∈ Rol(K), a binary relation
rI ⊆ ∆I × ∆I . The interpretation of complex concepts and roles is defined as
usual [2]. We only consider interpretations complying with the standard name
assumption and set aI = a for each individual name a used in ABox or TBox.
We write d ∈ I as a shorthand for d ∈ ∆I .
Interpretation I is a subinterpretation of interpretation J , written as I ⊆ J ,
if ∆I ⊆ ∆J , AI ⊆ AJ , and rI ⊆ rJ for all A ∈ CN(K), r ∈ Rol(K). An
interpretation I is a subinterpretation of J induced by ∆0 ⊆ ∆J , written as
I = J ∆0 , if ∆I = ∆0 , AI = AJ ∩ ∆0 , and rI = rJ ∩ ∆0 × ∆0 for all
A ∈ CN(K), r ∈ Rol(K). We write J \ X for the subinterpretation of J induced
by ∆J \ X.
An interpretation I satisfies α ∈ T ∪ A, written as I |= α, if the following
holds: if α is a CI C v D then C I ⊆ DI , if α is a transitivity declaration Tr(r)
then rI is transitive, if α is a functionality declaration Fn(r) then rI is a partial
function, if α is an assertion A(a) then a ∈ AI , and if α is an assertion r(a, b)
then (a, b) ∈ rI . Finally, I is a model of: a TBox T , denoted I |= T , if I |= α
for all α ∈ T ; an ABox A, denoted I |= A, if I |= α for all α ∈ A; and a KB K
if I |= T and I |= A.
Let I and J be interpretations of K. A homomorphism from I to J , written
as h : I → J is a function h : ∆I → ∆J that preserves roles, concepts, and
individual names; that is, (h(d), h(d0 )) ∈ rJ whenever (d, d0 ) ∈ rI , r ∈ Rol(K),
h(d) ∈ AJ whenever d ∈ AI , A ∈ CN(K), and h(a) = a for all a ∈ Ind(K). Note
that I ⊆ J iff the identity mapping id is a homomorphism id : I → J .
Let NV be a countably infinite set of variables. An atom is an expression of
the form A(x) or r(x, y) with A ∈ NC , r ∈ NR , and x, y ∈ NV , referred to as
4 J. Kuklis
concept atoms and role atoms, respectively. A conjunctive query (CQ) Q is an
existentially quantified conjunction q of atoms, ∃x1 · · · ∃xn q . For simplicity we
restrict q to Boolean; that is, q uses only variables x1 , · · · , xn . This is without
loss of generality since the case of non-Boolean CQs can be reduced to the case
of Boolean queries; see e.g. [12].
Let var denote the function from queries to sets of variables, such that var(Q)
is the set of all the variables used in Q. A match for Q in I is a total function
π : var(Q) → ∆I such that I, π |= q under the standard semantics of first-order
logic. An interpretation I satisfies Q, written as I |= Q, if there exists a match
for Q in I. A partial match for Q in I is a partial function π : var(Q) → ∆I
such that I, π |= q 0 , where q 0 is obtained from q by dropping atoms which bind
some variable not in the domain of π.
Note that we do not consider queries with constants (i.e., individual names):
such queries can be viewed as non-boolean queries with a fixed valuation of free
variables, and thus are covered by the reduction to the Boolean case. We do
consider unions of conjunctive queries (UCQs), which are disjunctions of CQs.
An interpretation I satisfies a UCQ Q if it satisfies one of its disjuncts. It follows
immediately that UCQs are preserved under homomorphisms; that is, if I |= Q
and there is a homomorphism from I to J , then also J |= Q.
A query Q is entailed by a KB K, denoted as K |= Q, if every model of K
satisfies Q. A model of K that does not satisfy Q is called a counter-model. The
query entailment problem asks whether a KB K entails a (U)CQ Q. Moreover,
this problem is equivalent to that of finding a counter-model. It is well known
that the query answering problem can be reduced to query entailment.
In this paper, we address the problem of finite query entailment, which is a
variant of query entailment where only finite interpretations are considered: an
interpretation I is finite if ∆I is finite, and a query Q is finitely entailed by K,
denoted as K |=fin Q, if every finite model of K satisfies Q.
3 Counter-model surgery
Later in this paper, we transform a tree-like counter-model into a finite counter-
model of bounded size, using a variant of the blocking principle: a systematic
policy of reusing elements. Before we dive into this construction, we show that
we can add certain edges to a counter-model without satisfying the query. These
edges must connect a pair of points, for which a different pair of similar points
is already connected via an edge over the same role. Throughout this section,
we assume no transitivity declarations are present; we will handle these later.
Sharing the same unary type is a natural similarity criterion. Without tran-
sitivity and functional declarations, concept inclusions rely only on the unary
types of directly linked elements, so adding an r-edge between a pair of elements
yields a correct model of the knowledge base if a pair of elements of the same
unary types is already connected with an r-edge. However, that criterion is not
sufficient, since this way we might accidentally satisfy the query, for example by
creating a cycle of given length when the query asks for one. This happens to
Data complexity of FOMQA in DLs with transitive roles 5
be a key difficulty to overcome: we cannot introduce cycles shorter than the size
of the query.
The first step to enhance the criterion is to look at sufficiently large neigh-
bourhoods, rather than just unary types. We adapt the definition of neighbour-
hoods from [6] and parameterize it with a set of constants Const ⊆ ∆I : elements
which are considered neighbours of all the elements in the interpretation, regard-
less of the actual connections between them. Throughout the paper, we assume
Const = Nom(K) unless specified otherwise. By the distance between two ele-
ments we understand the length of the shortest undirected path over the union
of all roles between these two elements.
Definition 1. For d in I \ Const and an integer n, the n-neighbourhood NnI (d)
of d with respect to Const is the subinterpretation of I induced by Const and
all elements e in I \ Const within distance n from d in I \ Const, enriched
with a concept D interpreted as {d}, and a concept interpreted as {a} for each
a ∈ Const. For a ∈ Const, NnI (a) is the subinterpretation induced by Const,
enriched similarly.
Replacing unary types with large neighbourhoods is still not enough, because
nearby elements can have arbitrary large isomorphic neighbourhoods: in the
integers with the successor relation, all n-neighbourhoods are isomorphic. The
next step is to enrich the initial counter-model in such a way that overlapping
neighbourhoods are not isomorphic, following an idea from [7].
Definition 2. A colouring with k colours of an interpretation I is an exten-
sion J of I with ∆J = ∆I , such that J coincides with I in every element in
the signature of I, and interprets k fresh concept names B1 , . . . , Bk such that
B1J , . . . , BkJ is a partition of ∆J . We say that d ∈ BiJ has colour Bi . A colour-
ing J of I is n-proper if for each d ∈ ∆J all elements of NnI (d) have different
colours.
Lemma 1. If I \ Const has bounded degree, then for all n ≥ 0 there exists an
n-proper colouring of I with at most M + |Const| colours, where M is the size
of the largest (2n)-neighbourhood.
We write In for an arbitrarily chosen n-proper colouring of I.
Definition 3. Let i ≤ n and let d, e be elements of In . We say that (d, e) is an
i-link along role r if either d has an r-successor e0 in In such that NiIn (e0 ) '
NiIn (e), or e has an r-predecessor d0 in In such that NiIn (d0 ) ' NiIn (d).
Theorem 1 (Lifting Theorem). For a conjunctive query with at most k bi-
nary atoms and n ≥ k and an interpretation In in which the query is not satis-
fied, adding n-links to In does not make the query satisfied.
This theorem was established in [6] for n ≥ k 2 . We prove the stronger variant
in the appendix. The strengthening is not necessary for our approach, but it
tightens the resulting size estimations.
6 J. Kuklis
4 Clique-forest counter-models
In this section, we revisit prior work showing that for SOI and SOF, the exis-
tence of a finite counter-model for Q is equivalent to the existence of a so called
counter-example, a possibly infinite counter-model of a special form, which gen-
eralises tree-shaped models [6]. We impose tighter size constraints on counter-
examples and construct automata recognizing them.
The special form is based on the notion of clique-forests.
Definition 4 ([6]). Let us fix a KB K. A clique-forest for an interpretation I is
a forest whose each node v is labelled with a subinterpretation Iv of I \ Nom(K)
such that:
– the sets ∆Iv are a partition of ∆I\Nom(K) ;
– each Iv is either a single element with all roles empty (element node) or a
clique over some transitive role with all other roles empty and no repetitions
of unary types (clique node);
– apart from edges within cliques and between individuals, in I \ Nom(K) there
is exactly one edge between ∆Iu and ∆Iv for every two adjacent nodes u and
v: assuming u is the parent of v, it is an r-edge from an element of ∆Iu to
an element of ∆Iv for some r ∈ Rol(K). We call v the origin of Iv .
Definition 5. Let K∗ denote the KB obtained from K by dropping transitivity
declarations and let I ∗ denote the interpretation obtained from I by closing
transitively the interpretation of each transitive role.
Note that each existential restriction satisfied in I is also satisfied in I ∗ . The
same holds for quantifier-free CI, and for universal restrictions involving non-
transitive roles. For universal restrictions involving transitive roles, we ensure
this property by preprocessing the interpretation: for each pair of B ∈ CN(K)
and r ∈ TRol(K), we add a fresh concept name Pr,B , axiomatised as Pr,B v
∀r.(B u Pr,B ), and replace each CI of the form A v ∀r.B with A v ∀r.Pr,B .
An interpretation is safe if it does not contain an infinite simple r-path for
any transitive role r.
Definition 6. A counter-example for Q is a safe interpretation I such that:
I |= K∗ , I ∗ |6 = Q, and I admits a clique-forest with at most |Ind(K)| + |Nom(K)| ·
|CI(K)| trees, nodes containing at most |CN(K)| elements, at most |CI(K)| chil-
dren nodes connected to a single element, and each element of Ind(K) \ Nom(K)
occurring in some root.
If I is a counter-example for Q, then I ∗ is a counter-model for Q, thanks to
the initial preprocessing.
Theorem 2. Q has a finite counter-model iff Q has a counter-example.
The theorem was originally stated in [6], where counter-examples by defini-
tion admitted clique-forests with at most |K|2 trees and branching at most |K|2 ,
Data complexity of FOMQA in DLs with transitive roles 7
but the proof there is still valid for the present stronger statement. The bound
on the branching of clique-forests is replaced with bounds on the size of clique-
tree nodes and number of children nodes connected to a single element within
any node; their precise values follow from the construction of a single clique-tree
in the proof. The bound on the number of trees can be tightened, as the con-
structed clique-forests are built of trees rooted in non-nominal individuals and
children nodes of nominals.
To obtain the final counter-model of bounded size, we are going to apply
the blocking principle to a counter-example. Thanks to Theorem 2, we already
know that a counter-example exists whenever Q has a finite counter-model. In
the next section, we require that the clique-forests of counter-examples we work
with are of certain regularity. We will show that in regular interpretations with a
so-called completeness property, if two elements are connected with a path over a
transitive role, then there is at least one short path over that role which connects
them. To be able to focus on counter-examples with regular clique-forests, we
shall construct tree automata recognizing the latter.
The following definitions refer to edges and paths in a fixed interpretation I,
either known from the context or explicitly specified.
Definition 7. Fix an ordered pair of elements a, b. We say that (a, b) is an r-
edge if there is an edge over r from a to b. We say that (a, b) is an r∗ -edge, if
either r is transitive and there is a path over r from a to b, or (a, b) is an r-edge.
Definition 8. An interpretation J is individual-complete, if in J each r∗ -edge
over two individuals used in K is an r-edge. An interpretation J is nominal-
complete, if in J each r∗ -edge with a nominal as one of the endpoints is an
r-edge. An interpretation is complete, if it is individual- and nominal-complete.
Note the difference between the individual- and nominal-completeness def-
initions: in the former, both endpoints of the edges considered have to be in-
dividuals, whereas in the latter, only one endpoint has to be a nominal. Each
counter-model can be extended to a complete one in a natural way so that the
transitive closure stays the same. We can thus restate Theorem 2: Q has a finite
counter-model iff Q has a complete counter-example.
Theorem 3. For a union Q of CQs, with each CQ of size at most m, and a
complete interpretation M with ∆M = Ind(K), there exists an automaton with at
4m 3m
most 22·3 |Q||T | states recognizing clique-forests of complete counter-examples
for Q, for which the subinterpretation induced by Ind(K) is equal to M.
See Appendix for the automaton construction. Theorem 3 is inspired by a
similar theorem in [6]. We add the completeness requirement and restrict the
size of the automaton state-space, rather than its overall size. This way, we are
able to get a bound which is independent of the ABox; it depends only on the
TBox and query sizes. The automaton size still depends on the ABox, since its
description has to contain all the initial lists of states, each state corresponding
to one tree, and we consider clique-forest with at least one tree per individual
occurring in K.
8 J. Kuklis
These tweaks result in significant differences between our construction of the
automaton and the construction presented in [6]. Most importantly, we cannot
transfer the information about the edges between individuals to the TBox, as
that would introduce a dependency of the automaton state-space on the ABox.
This influences especially the automaton component responsible for verifying
that the query cannot be satisfied.
5 Bounded counter-models
We say that DL has the linear-size counter-model property if whenever there
exists a counter-model for a given query Q and a knowledge base K = (A, T )
expressed in that DL, there exists a counter-model for Q of size bounded by
|A| · B(Q, T ) for some function B. The goal of the larger part of this section is
to prove the following theorem.
Theorem 4. SOI and SOF have the linear-size counter-model property.
Clique-forests of complete counter-examples can be recognized by a family
of automata by Theorem 3. Suppose K and Q admit a finite counter-model.
As discussed in the last section, they admit then a complete counter-example,
which in turn can be recognized by an automaton. Any automaton accepting
some clique-forest must also accept some regular forest, which has only finitely
many non-isomorphic subtrees. The number of these subtrees can be bounded
by the number of states of the automaton. Following the bound in Theorem 3,
we pick a complete counter-example I admitting a regular clique-forest with at
4m 3m
most p = 22·3 |Q||T | non-isomorphic subtrees.
We shall turn I into a finite counter-model for Q, using the blocking prin-
ciple. The main obstacle is that Q uses transitive roles, which are not fully
represented. To solve that, we replace Q with a different query. In [6], this was
done by exploiting a bound on the length of simple r-paths for transitive roles r,
guaranteed by the regularity of clique-forests of the considered counter-examples.
Here, we require that the counter-example we process is complete and consider
the distance between elements that are connected by r-paths, which will let us
derive better bounds on the size of the obtained finite counter-models.
Definition 9. A pair of elements a, b is at r-distance k if the shortest r-path
from a to b is of length k. An interpretation is `-ranged if for each r∗ -edge, its
endpoints are at r-distance at most `.
For transitive role atoms in queries, the path that leads from one variable to
another does not play any role in the query evaluation, only the unary types of
the connected variables matter (and the fact that they are connected by some
path). We use this intuition to deduce that in `-ranged interpretations, any
query with transitive binary atoms can be rewritten so that closing the roles by
transitivity does not influence the query evaluation.
Let Q∗ beW obtained from Q by replacing each transitive atom s(x, y) by the
disjunction i≤` si (x, y), where si (x, y) is the conjunctive query expressing the
Data complexity of FOMQA in DLs with transitive roles 9
i-fold composition of s. If each disjunct of Q contains at most k binary atoms,
Q∗ can be rewritten as a union of conjunctive queries, each using at most k · `
binary atoms.
Lemma 2. For all `-ranged J , J ∗ |= Q iff J |= Q∗ .
Lemma 3. I \ Ind(K) is (4p)-ranged.
Proof. Let r be a transitive role in K. Each r-path going down the clique-forest
of I contains at most p nodes. Indeed, if there were a longer r-path, then a
subtree would occur twice on that path, which immediately leads to an infinite
simple r-path in I \ Ind(K), contradicting the safety of I. For each clique node,
at most one additional step is needed to traverse it with an r-edge. Additionally,
if K is a SOI KB, an r-path can be composed of two parts, one going down
the tree and the other consisting of r−1 edges going upwards. If follows that, for
SOI and SOF, any r-reachable pair in I \ Ind(K) is at r-distance at most 4p. t u
The following lemma, regarding a bound on the range of an interpretation,
is a stronger version of the analogous lemma presented in [6], regarding a bound
on the length of simple transitive paths. The bound on the range has to be
multiplied by a constant to get the bound on the range in the full interpretation,
whereas in [6], the size of the set of nominals also comes into play. We draw
upon the individual-completeness of the interpretations we consider to achieve
this improvement.
Lemma 4. For each individual-complete interpretation J , if J \ Ind(K) is `-
ranged, then J is (2` + 3)-ranged.
Proof. Take two elements a, b, such that (a, b) is an r∗ -edge in J , but not an
r∗ -edge in J \ Ind(K); it follows that any r-path from a to b visits an individual.
If r is non-transitive, then (a, b) is an r-edge and a, b are at distance 1, otherwise
consider such r-path π. It starts in one of the trees in J \ Ind(K) (or in an
individual, in which case the tree degenerates to an empty one), and reaches a
nominal connected to this tree or the tree root. Eventually, π enters the tree
containing its other endpoint, again via a nominal connected to that tree or
its root. Let c and d be the first and last individuals on π. As J is individual-
complete, (c, d) is an r-edge, if c 6= d. We can reduce π to a path segment starting
in a in the first tree, a single edge to c, possibly a single r-edge (c, d), a single
edge from d, and a path segment ending in b in the last visited tree. As the first
and last path segments are in J \ Ind(K), each pair of elements in them is at
r-distance at most `. This gives a (2` + 3) bound on r-distance in J . t
u
We know that I is `∗ -ranged for `∗ = (8p + 3) by Lemmas 3 and 4. We
conclude that I |6 = Q∗ by Lemma 2.
The degree of elements in I\Ind(K) is bounded by |CN(K)|+|CI(K)| (elements
are connected within clique-nodes of size at most |CN(K)| and with at most
|CI(K)| + 1 elements from other nodes). Non-nominal individuals used in K can
be connected with any number of other individuals, so even though their degree
10 J. Kuklis
is bounded, the bound is relatively large. Nominals can be connected with an
arbitrary number of elements.
We aim to use the blocking principle. We consider an n-proper colouring In0
of I 0 = I \ Ind(K), for some fixed n, instead of a proper colouring of I; we do so
to avoid the influence of the individuals on the size of the neighbourhoods (and
therefore the required number of colours). I 0 is a clique-forest too: the children
nodes of non-nominal individuals become new roots. On each branch π in In0 , let
Dπ be the first node for which some earlier node Eπ satisfies NnIn (dπ ) ' NnIn (eπ ),
where dπ ∈ Dπ and eπ ∈ Eπ are the endpoints of the edges connecting Dπ and
Eπ to their parent nodes. The new interpretation Fn is obtained by including the
branch π up to the predecessor of node Dπ , with the edge originally leading to
dπ redirected to eπ . Individuals are copied from I, together with their adjacent
edges. Note that for SOF, all functionality declarations are satisfied, as each
redirected edge is a forward edge.
As each node in In0 has finitely many children and we cut all the infinite
paths off, Fn is finite by Kőnig’s lemma. As we start with I |= K∗ and the unary
types of edge endpoints are preserved in Fn , we get Fn |= K∗ . By the initial
preprocessing, (Fn )∗ |= K. The conclusions drawn so far are valid for any n.
Lemma 5. Fn \ Ind(K) is (4p)-ranged for n ≥ 4p.
Proof. During the construction of Fn from I, we add only one link for each
infinite downwards path in I starting in a tree root. Each of these links goes
up the tree. By link origins we shall understand the elements that had an edge
added during the construction of Fn . We can visualize Fn ∪ I as the core Fn
with additional hanging subtrees, each attached to Fn with a single edge to a
link origin. There is no way of accessing these subtrees from Fn \ Ind(K) other
than via the corresponding link origin.
Let ` = 4p and fix some n ≥ `. We prove the following claim by induction
on m: if there is a path of length ` + 1 from a in Fn \ Ind(K) to b in (Fn ∪ I) \
Ind(K) over a transitive role r using at most m links, then there is an r-path in
(Fn ∪ I) \ Ind(K) from a to b over r of length at most `.
If we prove the claim for all m, we can deduce that Fn \ Ind(K) is `-ranged.
Indeed, take a transitive role s and an s-path π in Fn \ Ind(K) of length ` + 1,
with endpoints (a, b) and m links (possibly none). Applying the claim, we get an
s-path ρ in (Fn ∪ I) \ Ind(K) of length at most `, connecting a and b. To show
that ρ has to be in Fn \ Ind(K), we take advantage of the tree structure: if a
path goes down a given branch, it has to use a link to come back to an element
higher up in the tree. There are no links from elements in I \ Fn , so ρ must only
visit nodes in Fn \ Ind(K); otherwise ρ would get stuck in I \ Fn and have its
endpoint there.
Let us approach the induction proof. The induction base for m = 0 is almost
immediate. A path without links in (Fn ∪ I) \ Ind(K) is also a path in I \ Ind(K),
which is `-ranged, as shown in Lemma 3.
Assume the inductive hypothesis holds for m. Suppose that for a transitive
role s there is an s-path π in (Fn ∪ I) \ Ind(K) of length l + 1 from a to b, such
Data complexity of FOMQA in DLs with transitive roles 11
that it uses at most m + 1 links. If π uses fewer than m + 1 links, we simply use
the inductive hypothesis; thus we can assume π uses exactly m + 1 links.
As m ≥ 0, there is at least one link in π. Let (d, e) be the last link in π,
with the witnessing element e0 in I \ Ind(K). Note that the node containing e0 is
not an ancestor of the node containing a in the clique-tree, as otherwise a would
lie in a subtree rooted in e0 and would be removed while constructing Fn . Let
πa,d and πe,b denote the parts of π from a to d and from e to b. The path πe,b
is of length at most `. We have chosen n ≥ `, so the n-neighbourhood of e0 in
I \ Ind(K) contains a path πe0 0 ,b0 isomorphic to πe,b . The path consisting of πa,d ,
the edge from d to e0 and πe0 ,b0 is of length ` + 1, but uses only m links, so there
is an s-path ρ in (Fn ∪ I) \ Ind(K) from a to b0 of length at most `.
Due to the construction of Fn , each path to e0 starting in Fn \ Ind(K) must
visit d. Let ρa,d and ρe0 ,b0 denote the parts of ρ from a to d and from e0 to b0 .
Note that ρe0 ,b0 lies in I \ Ind(K), so it does not use any links. At least one of
the following inequalities holds: |ρa,d | < |πa,d | or |ρe0 ,b0 | < |πe,b |. If the first one
holds, we can take the path consisting of ρa,d , the edge from d to e and of πe,b
as an s-path from a to b of length at most `. If the second inequality holds, we
use the fact that ρe0 ,b0 lies in the n-neighbourhood of e0 in I \ Ind(K) and find
an isomorphic path ρ0e,b from e to b. The path consisting of πa,d , an edge from
d to e and of ρ0e,b is an s-path from a to b of length at most `. This proves the
inductive hypothesis for m + 1. t
u
4m 3m
Recall that p = 22·3 |Q||T | and l∗ = 8p + 3. For n ≥ 4p, Fn is `∗ -ranged
by Lemmas 4 and 5. The size of the expanded query Q∗ is bounded by k · `∗ .
Furthermore, Fn |6 = Q∗ for n ≥ k · `∗ by Theorem 1. We conclude that Fn∗ |6 = Q
for n ≥ k · `∗ by Lemma 2.
We can now approach the estimation of the size of Fn . The interpretation I
is a counter-example, so in its clique-forest, each node is of size at most |CN(K)|
and each element is connected to at most |CI(K)| children nodes. Children nodes
of individuals in I form the roots of clique-trees in I 0 , so the number of trees in
In0 is bounded by |Ind(K)|·|CI(K)|. As described before, the degree of elements in
I 0 is bounded by D = |CN(K)|+|CI(K)|, so the neighbourhoods in I 0 of radius m
consist of at most O(Dm ) elements. It is enough to use C = O(D2n ) + |Nom(K)|
colours to obtain a proper n-colouring of I 0 by Lemma 1. There are 2|CN(K)|
unary types, so there are L = 2|CN(K)| · C ways of labelling a single element
with its unary type and colour. The neighbourhoods of radius n are of size
n
O(Dn ), which means there are N = LO(D ) different neighbourhoods in In0 .
On a downward path in Fn , we cannot come across the same neighbourhood
centred in an element node or a clique origin, as otherwise a link upwards would
be created higher up on this path; thus N bounds the depth of clique-trees in
Fn . The degree of the clique-forest of In0 is bounded by D0 = |CN(K)| · |CI(K)|,
so the number of nodes in each tree is bounded by (D0 )N .
The overall size of Fn is bounded by M ·S(Q, T ), where M = |Ind(K)|·|CI(K)|
bounds the number of clique-trees in I 0 and S(Q, T ) is a bound on the size of a
single tree. The size estimation of a single tree is independent of the ABox size;
it is only governed by the TBox and query. This finalizes the proof of Theorem 4.
12 J. Kuklis
Corollary 1. Finite entailment of UCQs by SOI and SOF KBs is coNP-complete
with respect to data complexity.
Proof. If we fix the sizes of the TBox and query, then S(Q, T ) is fixed and the
size of Fn is linear with respect to the ABox size. We consider the following
non-deterministic approach to disproving query entailment: we simply guess a
transitively closed counter-model of size at most M · S(Q, T ) and verify all the
required properties. First, we need to check that it is indeed transitively closed.
This can be done in polynomial time with respect to the interpretation size.
Next, we need to verify ABox and TBox constraints; as the interpretation is
transitively closed, for each element each constraint can be verified by looking
at that element’s immediate neighbours. Finally, we need to check that the query
is not satisfied. This comes down to verifying a first-order logic formula, which
can be done in time polynomial of |Ind(K)|, where the degree of the polynomial
depends only on the sizes of the TBox and query. Altogether, this constitutes a
coNP algorithm for deciding query entailment for fixed-size TBox and query.
We have shown that finite query entailment is in coNP with respect to data
complexity. It is known that (unrestricted) query entailment is coNP-hard (with
respect to data complexity) in ALE [13]. By finite controllability, so is finite
query entailment. Since ALE is a fragment of both SOI and SOF, NP-hardness
carries over. t
u
6 Conclusions
By enhancing the approach to finite query entailment presented in [6], we were
able to show the linear-size counter-model property for SOI and SOF (for fixed
TBox and query). This let us show that finite query entailment is in coNP with
respect to data complexity for these DLs. Basing on previous work, we deduced
that it is a coNP-complete problem. To improve the final size estimations, we
also established a stronger version of the Lifting Theorem.
The techniques used to construct the final counter-models were not efficient
enough to provide asymptotically tight size estimations for arbitrary (not fixed)
TBoxes and queries. One limiting factor is the doubly-exponential bound on
the range of the considered counter-models, stemming from the automata state-
space size. It would be interesting to see if this could be improved. Another
natural question is whether our techniques are applicable to SIF.
Acknowledgements. The author was supported by Poland’s National Science
Centre grant 2018/30/E/ST6/00042. The whole work has been prepared with
huge help from Filip Murlak, the supervisor of the author’s Master thesis. We
thank the anonymous reviewers of DL 2020 for their helpful comments.
References
1. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. Int. J. C. on
Artif. Int. pp. 364–369 (2005)
Data complexity of FOMQA in DLs with transitive roles 13
2. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.:
The Description Logic Handbook: Theory, Implementation and Applications. Cam-
bridge University Press, New York, NY, USA, 2nd edn. (2010)
3. Bienvenu, M., Ortiz, M.: Ontology-mediated query answering with data-tractable
description logics. In: Reasoning Web. LNCS, vol. 9203, pp. 218–307. Springer
(2015)
4. Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., , Riccardo: Tractable
reasoning and efficient query answering in description logics: The DL-Lite family.
J. of Autom. Reasoning 39(3), 385–429 (2007)
5. Glimm, B., Lutz, C., Horrocks, I., Sattler, U.: Conjunctive query answering for the
description logic SHIQ. J. Artif. Intell. Res. 31, 157–204 (2008)
6. Gogacz, T., Ibáñez-Garcı́a, Y.A., Murlak, F.: Finite query answering in expressive
description logics with transitive roles. In: Proc. KR 2018. pp. 369–378 (2018)
7. Gogacz, T., Marcinkowski, J.: On the BDD/FC conjecture. In: PODS. pp. 127–138
(2013). https://doi.org/10.1145/2463664.2463668
8. Hustadt, U., Motik, B., Sattler, U.: Reasoning in description logics by a reduction
to disjunctive datalog. J. Autom. Reasoning 39(3), 351–384 (2007)
9. Lutz, C., Wolter, F.: The data complexity of description logic ontologies. Log. Met.
in Comp. Sci. 13(4:7), 1–46 (2017)
10. Ortiz, M., Calvanese, D., Eiter, T.: Characterizing data complexity for conjunctive
query answering in expressive description logics. In: Proc. Nat. C. on Artif. Int.
pp. 275–280. AAAI Press (2006)
11. Rudolph, S.: Undecidability results for database-inspired reasoning problems in
very expressive description logics. In: KR. pp. 247–257. AAAI Press (2016)
12. Rudolph, S., Glimm, B.: Nominals, inverses, counting, and conjunctive queries or:
Why infinity is your friend! J. Artif. Intell. Res. 39, 429–481 (2010)
13. Schaerf, A.: On the complexity of the instance checking problem in concept lan-
guages with existential quantification. J. of Intel. Inf. Systems 2, 265–278 (1993)
14 J. Kuklis
A Proof of Lemma 1
The following proof is taken from [6], where the lemma was formulated with
no mention of the number of required colours. Let n ≥ 0. Because I \ Nom(K)
has bounded degree, 2n-neighbourhoods in I have size bounded by some m. We
colour the elements of I one by one, with m colours. Pick an uncoloured element
I
d. At most m − 1 colours are already used in N2n (d). Assign to d any colour that
I
is not yet used in N2n (d). This procedure gives an n-proper colouring. Indeed,
consider different e, e0 from NnI (d) for some d ∈ I. Without loss of generality
we can assume that e was coloured before e0 . But e belongs to N2n I
(e0 ), so the
0
colours of e and e are different by construction.
B Lifting Theorem
B.1 Formulation
Definition 10. For i ≤ n, let Ini be the interpretation obtained from In by
including into the interpretation of each role r all i-links along r; that is, for
i
every role r and every i-link (d, e) along r, (d, e) ∈ rIn .
Notice that for i < j, each j-link is also an i-link. Note also that (d, e) is an
i-link along role r if and only if (e, d) is an i-link along r− . We have
In ⊆ Inn ⊆ Inn−1 ⊆ · · · ⊆ In1 ⊆ In0 ,
but the domains of all these interpretations coincide. We keep referring to the
edges present in Ini but not in In as i-links, even though they are ordinary edges
now.
Definition 11. We say that functions f, f 0 : X → ∆I are d-indistinguishable
if for all x ∈ X:
NdI (f (x)) ' NdI (f 0 (x)).
Throughout the paper, we consider neighbourhoods with respect to
Const = Nom(K), however, all the definitions reliant on the notion of neigh-
bourhoods are correct for every choice of the set of constants Const. Let us note
that for every pair of d-indistinguishable functions f, f 0 , if for some x ∈ X and
c ∈ Const either f (x) = c or f 0 (x) = c, then f (x) = f 0 (x). This is because
whenever we consider any neighbourhoods, we add a nominal-like concept for
each constant.
Theorem 5. Let P be a CQ with at most k binary atoms and let n ≥ k. For
each homomorphism h : P → Inn there exists a homomorphism h0 : P → In
which is (n − k)-indistinguishable from h.
Data complexity of FOMQA in DLs with transitive roles 15
Theorem 5 holds for any interpretation I of any SOIF KB. It was originally
stated in [6] for n ≥ k 2 and regarded a pair of (n − k 2 )-indistinguishable homo-
morphisms. Furthermore, it was proved only for Const = Nom(K), and here we
consider an arbitrary choice of the set of constants Const. We prove the enhanced
version of the theorem in the next subsection; let us now see that it implies the
Lifting Theorem (Theorem 1).
Proof (Lifting Theorem). Interpretation Jn we obtain in the process satisfies
Jn ⊆ Inn . Consequently, if there was a homomorphism h : P → Jn ⊆ Inn for some
CQ P constituting Q, Theorem 5 would yield a homomorphism h0 : P → In ,
contradicting I |6 = Q. t
u
B.2 Proof of Theorem 5
We start with the following observation.
Lemma 6. All links involving elements of Const are ordinary edges.
Proof. Assume (d, e) is an i-link for d in Const (the symmetrical case is analo-
gous). Assume that d has a successor f such that NiIn (f ) ' NiIn (e). Then, the
edge (d, e) must also exist in NiIn (e), which shows that (d, e) is an ordinary edge.
Assume now that e has a predecessor f such that NiIn (f ) ' NiIn (d). Then, f
must be equal to d, due to the concepts used to mark constants in the definition
of a neighbourhood, so (d, e) is an ordinary edge. t
u
Let h(P ) denote the subinterpretation of Inn obtained by restricting the do-
main to h(var(P )), and only keeping in each role r edges (h(x), h(y)) such that
r(x, y) is an atom from P . We say that h uses an r-edge of Inn if this r-edge is
present in h(P ). We say that h uses a link over r of Inn if this link is an edge of
h(P ), but not an edge of In .
Definition 12. By J 0 ∩ J 00 we mean the interpretation J such that ∆J =
0 00 0 00 0 00
∆J ∩ ∆J , AJ = AJ ∩ AJ for all concept names A, and rJ = rJ ∩ rJ for
all role names r.
Let us analyze the structure of h(P ). Let In0 = In \Const and let C denote the
set of connected components in h(P )∩In0 , understood as graphs. Each component
in C contains at most k edges, as P contains at most k binary atoms. We shall
now prove that h is far-linked ; that is, h does not use links that join two elements
within the same connected component in C. This is a general property of such
homomorphisms, which stems from the fact that each link connects elements
that are far apart, as expressed in the following lemma.
Lemma 7. Elements forming a link are at distance at least 2n in In0 .
Proof. Let (a, b) be a link along r. Suppose that the distance between a and b
in In0 is smaller than 2n. Take the shortest path π in In0 between a and b; π
is of length at most 2n − 1. As (a, b) is a link, either a has an r-successor f in
16 J. Kuklis
In0 with the same colour as b, or b has an r-predecessor f in In0 with the same
colour as a. In the first case, there is an element on π, which is at distance at
most n from both b and f , in the second case, there is an element on π, which is
at distance at most n from both a and f . Since a, b, f share the same colour, are
pairwise unequal and are in an n-neighbourhood of another element, we obtain
a contradiction. Thus, elements forming a link must be at distance at least 2n
in In0 . t
u
Lemma 8. Each homomorphism h00 : P → Ini is far-linked.
Proof. Let r(x, y) be an atom of the query P such that h00 (x) and h00 (y) are in
the same connected component C ∈ h00 (P ) ∩ In0 . As P has at most k binary
atoms, elements in C are connected with at most k edges. Since h00 (x) and h00 (y)
are both in C, they at distance at most k < 2n in In0 (unless k = 0; we verify this
case separately). We use Lemma 7 to deduce that (h00 (x), h00 (y)) is an r-edge in
In0 . t
u
We see that all the links used by h join elements from different connected
components in C. We will construct a series of homomorphisms (h0 , h1 , ..., hl ),
n
with l ≤ k, hj : P → In j , nj < nj−1 for each j > 0, starting with h0 = h
and n0 = n, satisfying the following constraints. First, we require that hj+1 uses
strictly fewer links than hj . Second, we require that hj+1 and hj are (nj+1 )-
indistinguishable. Third, we require that hl uses no links and that nl ≥ n − k.
If we manage to satisfy these constraints, hl treated as h0 : P → In is the
homomorphism postulated in the theorem statement.
Let Cj denote the connected components of hj (P )∩In0 . We keep the invariant
that (Cj ∩ C) is non-empty for j < l. We construct the components graph of Cj :
the graph with elements of Cj as vertices and edges between vertices representing
connected components from Cj that are joined by a link between their elements.
The second invariant is that for j > 0, there is exactly one connected component
in Cj that is not present in Cj−1 ; let Dj0 denote that component. If j > 0 and
there is a component Dj ∈ (Cj ∩ C) joined by a link to Dj0 , set Cj = Dj and
Cj0 = Dj0 . Otherwise, either there are Cj , Cj0 ∈ (Cj ∩ C) joined by a link, or hj
uses no links and we finish the construction. In case Cj , Cj0 exist, homomorphism
hj+1 is obtained from hj by pulling Cj closer to Cj0 by the link that joins these
components; as a result, Cj , Cj0 are combined into one component, namely Dj+1 0
.
The details of that construction are presented in the next subsection.
We will later show that hj+1 uses links for a subset of these binary atoms of
P , for which hj uses links and which do not join Cj with Cj0 ; the links joining
Cj and Cj0 in hj are replaced with regular edges of In in hj+1 . We compensate
for eliminating these links by paying a certain cost: homomorphism hj+1 is less
strict than hj when it comes to preserving the neighbourhoods of the elements
joined with a link, which is why we have to set nj+1 = nj − |Cj |.
We will now prove that nj ≥ n − k for all j. This will entail that h0 is indeed
(n−k)-indistinguishable from h. We use the following invariant: altogether there
are at least n − nj edges in connected components from (Cj \ C). This can be
proved by induction on j. The base step for j = 0 reduces to the statement that
Data complexity of FOMQA in DLs with transitive roles 17
an empty graph has no edges. In the inductive step, we pull the component Cj
towards Cj0 , merging the two within a larger connected component. We have that
0
Cj ∈ Cj is in C and the new connected component Dj+1 ∈ Cj+1 is not in C. This
means that the number of edges in connected components from Cj+1 \ C must be
larger than the number of edges in connected components from Cj \ C. The total
increase is equal to at least |Cj |; this stems from incorporating |Cj | − 1 edges
within Cj , and from reducing at least one link. Using the inductive hypothesis,
we get that there are at least n − nj edges in connected components from Cj \ C,
and at least n − nj + |Cj | edges in connected components from Cj+1 \ C. As
nj+1 = nj − |Cj |, we get n − nj + |Cj | = n − nj+1 , which proves the inductive
step. If there was j such that nj < n − k, the number of edges in connected
components from (Cj \ C) would be at least n − nj ; as n − nj > n − (n − k) = k,
this would contradict the fact that hj (P ) has at most k edges.
B.3 Definition of hj+1
In this subsection, we mark the names of interpretation elements with bold colour
fonts. Whenever we mention distance between elements, we refer to distance in
In0 .
We are going to define hj+1 based on hj and Cj , Cj0 ∈ Cj .
Let (d, e) be a link along s used by hj and joining Cj with Cj0 . We only
consider the case when d ∈ Cj0 and e ∈ Cj ; the proof for the other case is
symmetrical. Either d has an s-successor e0 such that NnIjn (e) ' NnIjn (e0 ), or
e has an s-predecessor d0 such that NnIjn (d) ' NnIjn (d0 ); once again we take
advantage of almost full symmetry and only present the proof for the first case.
Let g : NnIjn (e) → NnIjn (e0 ) be the witnessing isomorphism for the (d, e) link.
n
We define hj+1 : P → In j+1 for nj+1 = nj − |Cj | as follows: for each z ∈ var(P ),
let hj+1 (z) = g(hj (z)) if hj (z) ∈ Cj , and hj+1 (z) = hj (z) otherwise. To ensure
that the definition of hj+1 is correct, we need to show that nj+1 ≥ 0. Altogether,
components in C have at most n edges (more precisely, at most k ≤ n edges),
and we have proven an invariant that there are at least n−nj edges in connected
components from (Cj \ C). As Cj is in (Cj ∩ C), and (d, e) is a link, Cj can have
at most n − 1 − (n − nj ) = nj − 1 edges. Thus, nj+1 = nj − |Cj | ≥ 0.
By definition, hj and hj+1 are (nj+1 )-indistinguishable: NnIjn (e) ' NnIjn (e0 )
and nj+1 = nj − |Cj |, so g preserves (nj+1 )-neighbourhoods of elements within
distance |Cj | from e. We need to verify that hj+1 is indeed a homomorphism.
Let r(x, y) be an atom of the query P and let p = hj (x), q = hj (y), p0 =
hj+1 (x), q0 = hj+1 (y). We have to consider the following cases:
1. p, q ∈
/ Cj ,
2. p, q ∈ Cj ,
3. exactly one of p, q is in Cj ; we take advantage of the symmetry and only
consider the case p ∈ / Cj , q ∈ Cj , with the following subcases:
(a) p ∈ Cj0 ,
(b) p ∈ Cj00 for Cj00 ∈ Cj such that Cj00 6= Cj0 .
(c) p ∈ Const,
18 J. Kuklis
Case 1. By definition, (p0 , q0 ) = (p, q). As hj is a homomorphism into
nj n n
In ⊆ In j+1 , we have that (p0 , q0 ) is an r-edge in In j+1 . Clearly, hj+1 uses no
new links for such atoms.
Case 2. By definition, (p0 , q0 ) = (g(p), g(q)). We assumed that p and q are
both in Cj , and connected components in Cj contain at most k − 1 edges (we
subtract one, because (d, e) is a link). This means p, q are at distance at most
k − 1 < 2n, and thus (p, q) is an r-edge in In by Lemma 7. As g(p) and g(q) are
0
both in Dj+1 ∈ Cj+1 , they are at distance at most k. This means that (p0 , q0 ) is
an r-edge in In .
Case 3. By definition, (p0 , q0 ) = (p, g(q)). As g preserves (nj+1 )-neighbourhoods
of elements within distance |Cj | from e and the distance between q and e is at
most |Cj |, we see that NnIj+1
n
(q) ' NnIj+1n
(g(q)). Let us analyze the three sub-
cases.
Subcase 3 (a). As p ∈ Cj0 and q ∈ Cj , and Cj0 and Cj are different connected
components in Cj , we have that (p, q) is a link along r. Since Cj0 and Cj are
0 0
merged together into Dj+1 , both p and g(q) are in Dj+1 . This means they are at
distance at most k from each other, which allows us to use Lemma 7 to deduce
that (p0 , q0 ) is an r-edge in In .
Subcase 3 (b). Like above, p and q are in different connected components
from Cj , so it must be that (p, q) is a link along r. We need a different line
of reasoning than in the previous case, since we cannot assume Cj and Cj00 are
merged together. Since (p, q) is an nj -link, it is also an (nj+1 )-link. We have
two subcases to consider.
Suppose that p has an r-successor f in In0 such that NnIj+1n
(f) ' NnIj+1
n
(q) '
In
Nnj+1 (g(q)). We see that then (p, g(q)) is an (nj+1 )-link. If f = g(q)), then
(p0 , q0 ) is an r-edge in In . Furthermore, the connected component Cj00 of p merges
with Cj0 ; all three components Cj , Cj0 and Cj00 form subgraphs of Dj+1 0
. As ele-
0
ments in Dj+1 are at distance at most k from each other, there can be no links
between them by Lemma 7. This means that all the links between Cj00 and Cj
used by hj become regular edges in hj+1 . We also see that hj cannot use any
links between Cj00 and Cj0 , as elements of these components are at distance at
most k from each other in In0 . The other possibility is that f 6= g(q)), in which
case (p0 , q0 ) is a link along r. By analogous reasoning, this also entails that hj+1
uses links for all the links used by hj which join Cj and the connected component
of f.
Suppose now that q has an r-predecessor f in In0 such that NnIj+1 n
(f) '
In 0 0
Nnj+1 (p). We prove in the same way as above that either (p , q ) is an r-edge in
In0 and the connected component of p and Cj0 merge, or (p0 , q0 ) is a link along
r.
Data complexity of FOMQA in DLs with transitive roles 19
Subcase 3 (c). (p, q) is an r-edge in In by Lemma 6. As NnIj+1 n
(q) '
Nnj+1 (g(q)), (p, g(q)) must be an r-edge in Nnj+1 (g(q)). Thus, (p , q0 ) is an
In In 0
r-edge in In .
This proves that hj+1 is a homomorphism, which uses links for a subset of
these binary atoms of P , for which hj uses links and which do not join Cj with
Cj0 .
C Proof of Theorem 3
C.1 Construction
C.1.1 Outline The authors of [6] present a product of tree automata that
independently verify the following acceptance conditions:
– consistency of the forest encoding,
– satisfying the constraints from the knowledge base K∗ = (T , A),
– not satisfying the query Q,
– path safety.
We enhance their construction by adapting these components and adding ver-
ification of individual- and nominal-completeness. Our automaton is parametrized
by a fixed subinterpretation M induced by individuals (note that it is not the
case in [6]).
We describe the construction with SOI in mind, but it adapts easily to
SOF. The only differences are that for SOF, we do not validate parts referring to
inverse roles, and we add another component verifying functionality declarations,
which does not affect our size estimations. This component is decribed in the
last subsection.
C.1.2 Preprocessing The interpretations we work with are preprocessed to
cater for transitivity: for an interpretation I, if I |= K, then also I ∗ |= K. This is
achieved by supplementing the interpretation with auxiliary concepts expressing
universal restrictions on transitive roles, as described under Definition 5: for each
pair of B ∈ CN(K) and a transitive role r, we add a fresh concept name Pr,B ,
axiomatised as Pr,B v ∀r.(B u Pr,B ), and replace each CI of the form A v ∀r.B
with A v Pr,B .
The information about the edges to and from nominals is also stored in
auxiliary concepts: for each a ∈ Nom(K) and r ∈ Rol(K) fresh concept names
Nr,a and Nr− ,a are added to the knowledge base, axiomatised as Nr,a ≡ ∃r.{a}
and {a} ≡ ∀r.Nr− ,a . These concepts do not substantially increase the size of the
automaton.
We aim to accept only these counter-examples which are complete. First, we
tackle nominal-completeness. For each a ∈ Nom(K) and r ∈ TRol(K) we add
the following concept inclusions to TBox: N̄r,a v ∀r.N̄r,a and Nr− ,a v ∀r.Nr− ,a .
20 J. Kuklis
This ensures that whenever for a transitive role r there is no edge r from some
element to a nominal (or vice versa), then there is also no r-path from that
element to the nominal.
Recall that we assume the completeness of M (the subinterpretation induced
by individuals). Consequently, any path along a transitive role between two
individuals not connected with a single edge must enter some clique-tree and
lead through a nominal. Additional CIs described in the last paragraph ensure
that whenever there is a path from an individual to a nominal in the clique-tree
associated with that individual, these elements are also connected with a single
edge. If there was a path breaking individual-completeness of the whole forest,
it would entail that also nominal-completeness does not hold. Thus, individual-
completeness follows from nominal-completeness.
C.1.3 Automaton To make clique-forests accessible to automata, we encode
them as finitely labelled forests. Let TRol(K) be the set of transitive roles from
Rol(K), and let [X]≤k be the family of subsets of X of size at most k. In the
encoding, nodes are labelled with elements of the alphabet
Σ = Tp(K) ∪ TRol(K) × [Tp(K)]≤|CN(K)|
and edges are labelled with elements of the alphabet
Γ = Tp(K) × Rol(K) × Tp(K) .
We build an automaton parametrized by the subinterpretation M. Let I
denote the interpretation whose clique-forest is encoded in the input forest. The
automaton runs on the encoding of the clique-forest of I \ Ind(K) (not of I, like
in [6]; this change is motivated by the way we modify the query component). We
say that an individual a is the super-root of a clique-tree t in I \ Ind(K), if the
root of t is a successor of a in I; we also say that t is super-rooted in a.
For a complete counter-example I, we obtain the encoding of the clique-forest
for I \ Ind(K) as follows. We fix some order on individuals and order the trees
in I \ Ind(K) so that they are sorted with respect to the order on their super-
roots. We add a separating tree stub (represented with a fresh symbol added to
Σ) between contiguous blocks of trees super-rooted in the same individual; it is
only used as a marker for the automaton so that it can recognize which trees are
associated with a given individual. All the components of the automaton always
accept these tree stubs.
Next, we label each element node with the single unary type it realises and
each clique node with its single nonempty role and the set of unary types it
realises. We do not represent nominals explicitly in the encoding, but thanks to
the initial preprocessing, all relevant information about them is contained in the
unary types of the remaining elements. Finally, if in I \ Ind(K) there is an r-edge
from an element of type τ in some parent node to an element of type σ in some
child node, then in the encoding the edge from the parent node to the child node
Data complexity of FOMQA in DLs with transitive roles 21
is labelled with (τ, r, σ). Because unary types do not repeat within cliques, this
uniquely determines the endpoints.
There are at most N = |Ind(K)| · |CI(K)| trees in the input forests, with
branching bounded by D = |CN(K)| · |CI(K)|. Transition relation has the form
δ ⊆ Q × Σ × (Γ × Q)≤D ,
where Q is the set of states. The automaton processes the forests top down.
Whenever a node is reached after an edge labelled (τ, r, σ), we refer to the unique
element of type τ in that node as the current element, and to the unique element
of type σ in the parent node as the parent element. The initial states are specified
for each tree separately: the automaton has a set I ⊆ Q≤N of sequences of initial
states. A run is a labelling of the input forest with states in such a way that
the sequence of states in the roots belongs to I, and if a node has state q, label
α, and its children are connected via edges with labels β1 , β2 , . . . , βn and have
states q1 , q2 , . . . , qn , then
q, α, (β1 , q1 ), . . . , (βn , qn ) ∈ δ .
We use Büchi acceptance condition: we specify a set F ⊆ Q of marked states
that need to be revisited, and consider a run accepting if on each branch marked
states occur infinitely often. A forest is accepted by the automaton if there exists
an accepting run over it.
An automaton has trivial acceptance condition if F = Q. Then, each run is
accepting but the automaton may still reject some forests, because there may be
no run for them: a branch of the computation can get stuck if no transition is
consistent with the current state, label and edge labels. An automaton is weak if
on each branch of each run, once a marked state is visited, all subsequent states
are marked. Notice that all automata with trivial acceptance condition are weak.
Given a weak automaton and an arbitrary Büchi automaton it is particularly
easy to construct an automaton recognising trees accepted by both input au-
tomata: it suffices to take the standard (synchronous) product automaton and
mark all states that contain a marked states on both coordinates.
C.1.4 Automata components The final automaton is obtained as a product
of automata verifying independently various parts of the condition.
The first thing to check is the consistency of the encoding: if an edge has
label (τ, r, σ), then τ must occur in the label of the parent node, and σ must
occur in the label of the child node. To check this, it suffices to examine for each
node the labels of all edges incident to it plus the label of the node itself. When
a transition is made, all these are available except the label on the edge to the
parent: it must be stored in the state. The automaton has |Γ | states and trivial
acceptance condition.
The second thing to check is that the clique-forest is a model of K∗ . If M
does not satisfy the restrictions from the ABox A of K∗ , the final automaton
rejects everything.
22 J. Kuklis
To verify that the TBox is satisfied we need to check each CI. For CIs of the
form
u Ai v t B j
i j
we have a two-state automaton with trivial acceptance condition that simply
tests that each type used in the encoding satisfies this CI. If the type of some
individual violates this CI, the final automaton rejects everything. To verify CIs
of the form
A v ∀r.B
for elements in I\Ind(K), it suffices to check that in the input interpretation there
is no r-edge from an element whose unary type contains A to an element whose
unary type contains B̄. This amounts to verifying that none of the following are
used in the encoding:
– node labels (r, T ) such that A ∈ τ ∈ T and B̄ ∈ σ ∈ T for some τ , σ;
– edge labels (τ, r, σ) with A ∈ τ and B̄ ∈ σ;
– edge labels (σ, r− , τ ) with A ∈ τ and B̄ ∈ σ;
– unary types containing both A and Nr,b for some b such that b ∈ B̄ M ;
– unary type containing both B̄ and Nr− ,b for some b such that b ∈ AM .
These conditions simply disallow certain labels; they can be checked by a two-
state automaton with a trivial acceptance condition.
Let us take a CI of the form
A v ∃r.B .
For elements in I \ Ind(K), this condition can be tested in a similar way as
above, except that one needs access to the label of the current node and all
edges incident to it. Like for the initial consistency check, it suffices to store in
the state the label of the edge to the parent. To ensure the existential restrictions
are met for nominals, for each nominal a such that a ∈ AM , if that CI is not met
in M, we construct a two-state weak automaton looking for a label that uses a
type τ such that B ∈ τ and Nr− ,a ∈ τ . Note that this automaton has a non-
trivial acceptance condition, but it is weak: as soon as it finds an appropriate
label, it loops in a marked state. The initial lists of states for that weak consist
of a single unmarked state and all the remaining initial states already marked.
We also have to verify that all existential and universal restrictions are met
for non-nominal individuals. The former are ensured by constraining initial lists
of states so that all the existential restrictions for an individual not satisfied in
M are satisfied by roots of clique-trees super-rooted in that individual. If the
universal restrictions are not met in M, the final automaton rejects everything;
otherwise, we ensure that the restrictions hold for non-individual elements con-
nected to individuals by constraining initial lists of states so that the roots of
clique-trees super-rooted in a given individual do not break universal restrictions
imposed on that individual.
Summing up, the total size of the state-space of the TBox component is not
larger than 2|CI(K)| ·2|CI(K)| ·|Γ ||CI(K)| ·2|Nom(K)|·|CI(K)| . The first three factors stem
Data complexity of FOMQA in DLs with transitive roles 23
from the sizes of automata used to verify on I \ Ind(K) all the CIs of the forms
ui Ai v tj Bj , A v ∀r.B, and A v ∃r.B, respectively. The last factor stems
from the verification of existential restrictions for nominals.
Next, we verify that the query cannot be satisfied. We need an automaton
recognizing counter-examples such that the number of states in the automaton
does not depend on the number of individuals in K. This is achieved by mak-
ing sure each tree in the forest is processed without explicitly referring to the
knowledge about individuals.
In [6], all the information about edges between individuals is transferred
to the TBox. In particular, this means that the query parts are self-contained
in single trees, as individuals are not connected (and the edges to nominals are
hidden from the automata perspective). It lets the query component build partial
query matchings independently for each tree, but at the cost of enlarging the
TBox.
Here, we cannot use the same approach: the number of states depends on
the number of unary types in particular, which would be tied to the ABox size
if we were to store the information about edges between individuals in concepts.
Matchings of connected parts of the query are no longer contained in single
trees: they might span over multiple trees. We have to accommodate for that by
enhancing the query component.
We start by replacing query Q with a query Q0 such that for each model I
of K: I ∗ |= Q iff (I \ Nom(K))∗ |= Q0 . The query Q0 is obtained in two steps.
In the first step, we account for the possibility that binary atoms of the form
r(x, y) for transitive r can be satisfied with paths that cross different trees. Take
such path π, and let a and b denote respectively the first and the last individual
on π (possibly a = b). Thanks to the individual-completeness property, π can
be reduced to a path consisting of: a path segment ending in a and contained
in a single tree, a single edge from a to b (if a 6= b), and another path segment
contained in a single tree. To detect such paths, for each CQ P constituting
Q, we add to Q each CQ that can be obtained from P by subdividing some
transitive atoms; that is, by replacing some atoms of the form r(x, y) either with
atoms r(x, z) and r(z, y) for a fresh variable z, or with atoms r(x, z1 ), r(z1 , z2 ),
r(z2 , y) for fresh variables z1 , z2 .
In the second step, we account for the influence of nominals on the query
satisfaction. For each CQ P of the modified Q, we add to Q each CQ that can
be obtained from P by performing the following operation any number of times.
Let tp(x) be the set of all A such that P contains A(x). Choose x ∈ var(P ) and
a ∈ Nom(K) such that a ∈ AM whenever A ∈ tp(x). Drop all atoms of the form
A(x) from P . Replace in P each atom of the form r(y, x) by Nr,a (y) and each
atom of the form r(x, y) by Nr− ,a (y).
The resulting query Q0 satisfies the condition I ∗ |= Q iff (I \Nom(K))∗ |= Q0 .
After the first step, the number of CQs grows by the factor 3m and their size
is at most 3m. After the second step, the number of CQs grows by the factor
|Nom(K)|3m and their size is still at most 3m. Thus, the size of the resulting
query is at most |Q| · 3m · |Nom(K)|3m , and its CQs have size at most 3m.
24 J. Kuklis
For each CQ P of the preprocessed union of queries Q0 , we construct an
automaton that ensures that I ∗ |6 = P . Its states consist of the parent edge label
β and a set F representing all partial matchings of P to elements represented in
the subtree rooted in the currently processed node.
The set F consists of partial functions f : var(P ) → L, with L = {succ, other},
representing different ways variables can be assigned in relation to the parent
element. Each such function represents a partial matching, that is a partial as-
signment of variables such that the restricted query induced by these variables
is satisfied (not all the collected partial matchings can potentially be extended
to a total match; discussion on that part is presented in the next subsection).
The identifier succ is used for each variable mapped to an element in the cur-
rent subtree that is an r-successor of the parent element in I \ Ind(K). If r is
non-transitive, this simply means the current element (the clique origin, if we
are processing a clique node). If r is transitive, it means any element r-reachable
from the current element. The identifier other is used for variables assigned to
any other elements in the current subtree.
The rationale behind the automaton is that it constructs all the possible
partial matchings across the forest. The trick here is that the states reached
before traversing a given node already contain the information about all the
partial matchings that will be constructed in the subtree rooted at that node.
The automaton will have at most one accepting run on each forest—the states in
the initial list of states in that run will contain all the possible partial matchings
across the forest.
We define the transition relation in the next subsection. The definition will
ensure the intended semantics of the states; when processing a given node, sets
F in the automaton state will represent all partial matchings of P to elements
represented in the subtree rooted in that node.
All states are accepting. Apart from the partial matchings collected in trees
rooted in children nodes of individuals, we also compute all partial matchings
in M \ Nom(K) (the possibility that some variables are assigned to nominals
has already been accounted for in the second step of the query preprocessing).
As lists of initial states, we take all lists of states such that a total match over
the whole forest cannot be constructed by fusing together a partial matching in
M \ Nom(K) and one partial matching per initial state. We know how to stitch
these partial matchings together, because we know the edges from super-roots
to new clique-tree roots and the way we collect partial matchings in a clique-
tree accounts for the label on that edge. This sums up the query automaton
construction. This extra information about the edges from super-roots motivates
the approach of traversing clique forests built of trees rooted in children nodes
of individuals, rather than trees rooted in individuals.
The last component of the automaton checks that the input interpretation is
safe. Observe that it is unsafe if in the input clique-forest there is a branch with
consecutive node and edge labels α1 β1 α2 β2 . . . such that for some transitive r
and all i large enough, βi = (τi , r, σi ) and either σi = τi+1 (consecutive edges
are incident in the clique-forest) or αi+i = (r, Ti ) (consecutive edges are incident
Data complexity of FOMQA in DLs with transitive roles 25
with an r-clique). An automaton can easily check that there is no such branch.
Each time it sees a transitive role it moves to an unmarked state, storing the
role. It moves to a marked state as soon as the condition above is broken. The
automaton has |TRol(K)| states.
The automaton recognising safe counter-examples can be obtained from these
components by the simple product construction described above, because only
the last component is not weak.
C.2 Transition relation of the query component
Below we describe the set of transitions of the query satisfaction component of
the product automaton.
A transition
((β, F ), τ, ((βi , (βi , Fi ))ni=1 )
over an element node of unary type τ , with β = (σ, r, τ ) and βi = (τ, ri , τi ), is a
transition of the automaton if F is the set of all functions f : var(P ) → L which
are both upward- and downward-compatibile with some witnessing functions
f1 ∈ F1 , f2 ∈ F2 , ..., fn ∈ Fn ; we define these conditions below.
Fix a function f : var(P ) → L and a sequence of domain-disjoint functions
(f1 , f2 , ..., fn ), with fi : var(P ) → L.
f is upward-compatibile with functions fi if for each variable x ∈ var(P ):
– if ∃i : fi (x) = other, then f (x) = other,
– if ∃i : fi (x) = succ, then f (x) = succ if r is transitive and ri = r, otherwise
f (x) = other,
– otherwise either x ∈ / dom(f ), or f (x) = succ and tp(x) ⊆ τ , which we
interpret as S matching x to the current element. In the latter case x is in
(dom(f ) \ dom(fi )).
f is downward-compatibile with functions fi if for each atom s(x, y) of P :
S
– if x ∈ (dom(f ) \S dom(fi )) and y ∈ dom(fi ), then ri = s and fi (y) = succ,
– if y ∈ (dom(f ) \ dom(fi )) and x ∈ dom(fi ), then ri = s− and fi (x) = succ,
– if x ∈ dom(fi ) and y ∈ dom(fj ) for i 6= j, then s is transitive, ri = rj− = s,
and fi (x) = fj (y) = succ.
Note that we only seem to notice binary atoms for which both variables are
already present in the subtree. When a variable gets the identifier other and
there are unsatisfied binary atoms containing that variable, we know that such a
matching cannot be extended to a total matching. We still gather such matchings
in the automaton, which might be unintuitive. For clarity, we can forget such
unextendable matchings in the automaton runs by adding the following condition
to the downward-compatibility definition:
– if fi (x) = other or fi (y) = other, then x, y ∈ dom(fi ).
26 J. Kuklis
The transitions over clique-nodes are described using analogous compatibility
concepts, adapted to the clique-node setting. A transition
((β, F ), (r0 , T ), ((βi , (βi , Fi ))ni=1 )
over a clique node encoded by a role r0 and a set of unary types T , with β =
(σ, r, τ ) and βi = (σi , ri , τi ), is a transition of the automaton if F is the set of all
functions f : var(P ) → L which are both upward- and downward-compatibile
with some witnessing functions f1 ∈ F1 , f2 ∈ F2 , ..., fn ∈ Fn ; we define these
conditions below.
Fix a function f : var(P ) → L and a sequence of domain-disjoint functions
(f1 , f2 , ..., fn ), with fi : var(P ) → L.
f is upward-compatibile with functions fi if for each variable x ∈ var(P ):
– if ∃i : fi (x) = other, then f (x) = other,
– if ∃i : fi (x) = succ, then f (x) = succ if r is transitive, ri = r, and either
σi = τ or r0 = r, otherwise f (x) = other,
– otherwise one of the following holds:
• x∈ / dom(f ),
• f (x) = succ and tp(x) ⊆ τ , which we interpret as matching x to the
current element,
• f (x) = other and tp(x) ⊆ τ 0 ∈ T \ {τ }, which we interpret as matching
x to an element in the current clique-node.
f is downward-compatibile with functions fi if for each atom s(x, y) of P :
S
– if x ∈ (dom(f ) \ dom(fi )) and y ∈ dom(fi ), then ri = s, fi (y) = succ, and
either tp(x) ⊆ σi ,Sor s is transitive and r0 = s,
– if y ∈ (dom(f ) \ dom(fi )) and x ∈ dom(fi ), then ri = s− , fi (x) = succ,
and either tp(y) ⊆ σi , or s is transitive and r0 = s,
– if x ∈ dom(fi ) and y ∈ dom(fj ) for i 6= j, then s is transitive, ri = rj− = s,
fi (x) = fj (y) = succ, and either σi = σj or r0 = s.
Once again, we can enhance the downward-compatibility definition to get
rid of unextendable partial matchings; it is enough to add the same condition as
before.
To see that this transition relation ensures the intended semantics of the
states one needs to argue that each partial matching is accurately represented.
This can be done by induction on the size of the image. For size one, the match-
ing will be accounted for based solely on the labels. For larger images, use the
inductive hypothesis for restrictions of the match to variables mapped to the
trees rooted at the children of the current node.
C.3 Size estimation
Let us approach the size estimation of the automaton state space. We start by
analyzing the factors that contribute to the size of automata components. Note
that k and m denote, respectively, the maximum number of binary atoms and
the maximum size of a single conjunctive query in the union of CQs Q.
Data complexity of FOMQA in DLs with transitive roles 27
Measured entity Notated as Size estimation
|CN(K)|+|Nom(K)|·|TRol(K)|
number of unary types |Tp(K)| 2
size of node alphabet |Σ| O(|TRol(K)| · |Tp(K)||CN(K)| )
size of edge alphabet |Γ | |Tp(K)|2 · |TRol(K)|
number of trees in a forest N |Ind(K)| · |CI(K)|
clique-trees branching D |CI(K)| · |CN(K)|
size of preprocessed query |Q0 | |Q| · 3m · |Nom(K)|3m
The bound of the size of the unary types set is not just 2|CN(K)| due to the
encoding of neighbouring nominals, which requires O(|Nom(K)| · |Rol(K)|) new
concepts and concept inclusions.
The size estimation of the automata components is presented below.
Component States count upper bound
consistency of encoding |Γ |
TBox constraints:
- CIs of form: uAi v tBj 2|CI(K)|
- CIs of form: A v ∀r.B 2|CI(K)|
|CI(K)|
- CIs of form: A v ∃r.B |Γ | · 2|Nom(K)|·|CI(K)|
path safeness |TRol(K)|
3m 0
query unsatisfaction 23 ·|Q |
By taking a product of the sizes of automata components, we bound the
4m 3m
number of states of the product automaton by 22·3 ·|Q|·|T | , where |T | =
|CI(K)| + |CN(K)| + |Nom(K)| · |TRol(K)|. The factor 2 in the exponent is in-
troduced to eliminate summands of lower order. The query component makes
the size of the state-space of the product automaton double exponential. This
finalizes the proof of Theorem 3.
As a side note, an automaton with k states has total size O(k ·|Σ|·(k ·|Γ |)N +
O(m)
k ), which in our case can be expanded to 2|Q|·|K|
N
.
C.4 Functionality component for SOF
We include an additional component for each functionality declaration Fn(r).
To check functionality of r for ordinary nodes it suffices to examine the label
of the node and the labels on all incident edges, which only requires storing in
the state the label of the edge to the parent. Additionally, for all a ∈ Nom(K):
if a ∈ (Nr,b )M and a ∈ (Nr,b0 )M for some b 6= b0 , the final automaton rejects
everything; if a ∈ (Nr,b )M , the automaton checks that no type used in the input
/ (Nr,b )M for each b ∈ Nom(K), the automaton checks
forest contains Nr− ,a ; if a ∈
that a type with Nr− ,a occurs at most once in the input forest. The total number
of states in the described component is |Γ ||CI(K)| · 2|Nom(K)|·|CI(K)| , so including it
does not affect the overall upper bound.
28 J. Kuklis
D Proof of Lemma 2
For each transitive role s, all the s-atoms in Q are replaced with disjunctions of
s-chains of lengths from 1 to ` in Q∗ . If such an atom can be satisfied in J ∗ for
elements a and b, then there is an s-path from a to b of length at most `. This
shows that if J |= Q∗ , then J ∗ |= Q. On the other hand, if J ∗ |= Q, then each
transitive role atom must be satisfied by two endpoints of some s-path from J ,
of length at most `, which satisfies the subquery replacing this atom in Q∗ . This
proves the converse implication.