<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Navigational Queries Based on Frontier-Guarded Datalog: Preliminary Results?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Meghyn Bienvenu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Magdalena Ortiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mantas Sˇ imkus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Information Systems, Vienna University of Technology</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LRI - CNRS &amp; Universite ́ Paris Sud</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we introduce a navigational query language that extends binary frontier-guarded Datalog by allowing regular expressions in rule bodies and a limited use of higher-arity intensional predicates. Our query language strictly extends conjunctive two-way regular path queries (C2RPQs) and captures some of the key features advocated in recent works aimed at extending C2RPQs. We compare our language to existing proposals and establish decidability with elementary complexity of query evaluation in the presence of ontologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The current importance of graph databases has led to renewed interest in navigational
query languages that ask for nodes connected by paths that satisfy given patterns.
Conjunctive two-way regular path queries (C2RPQs) and their unions (UC2RPQs), which
simultaneously extend both the well-known conjunctive queries (CQs) and two-way
regular path queries (RPQs), have established themselves as fundamental query
languages for accessing graph databases. However, recent works have argued that such
queries are lacking important features, which has motivated several extensions. For
example, C2RPQs have been extended with path variables to support naming, comparing,
and outputting paths [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Other extensions are motivated by the fact that C2RPQs and
similar languages can only describe patterns over graphs labeled with a finite alphabet,
and aim to overcome this by allowing values from possibly infinite datasets [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. A third
direction aims at increasing the navigational power of C2RPQs and related languages,
that is, allowing queries to express more complex patterns that cannot be expressed
using regular languages. This paper pursues the latter direction.
      </p>
      <p>
        A very natural way to increase the navigational power of C2RPQs is to use nested
regular expressions (NREs), that can enforce that at some points along a path there exists
an outgoing path, which is itself described by a (nested) regular expression.
(Conjunctive) nested 2RPQs ((C)N2RPQs), which extend (C)2RPQs by allowing NREs in the
place of plain regular expressions, have received significant attention recently [
        <xref ref-type="bibr" rid="ref14 ref3 ref4">4,3,14</xref>
        ],
due to their improved navigational capabilities. For example, consider a graph database
of academic relationships between researchers that contains advisor and co-author
relationships, as well as the fields of expertise of the researchers. With (conjunctive) regular
path queries, one can find pairs of people that are connected by an arbitrary long chain
of advisor or co-author relations, or find three researchers from different fields that are
? This work has been supported by ANR project PAGODA (ANR-12-JS02-007-01) and the
      </p>
      <p>Austrian Science Fund (FWF) projects T515 and P25207-N23.
pairwise connected by such a chain. However, we cannot find pairs connected by an
arbitrary long chain of advisor or co-author relations, where additionally each node must,
for example, have an academic ancestor that is a biologist. The latter can be easily done
using nested regular expressions. However, neither C2RPQs nor CN2RPQs can express
the query that requires that all people along such a chain have the same biologist as
academic ancestor. Another query that cannot be expressed is to find pairs connected
by an arbitrary chain where all nodes are connected by both the co-author and the
advisor relationship. These examples illustrate two shortcomings of UCN2RPQs that have
been pointed out in the literature, namely, the inability to express transitive closure over
complex relations defined using UCN2RPQs and the impossibility of joining objects
inside a nested expression.</p>
      <p>
        Recent works have aimed at overcoming these limitations. A query language that
tackles the first issue was introduced by Bourhis et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] with the name of nested
positive 2RPQs, and further studied more recently (using a different syntax) by
Reutter et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] under the name of regular queries. Essentially, this language allows full
UCN2RPQs to be nested inside the regular expressions of UCN2RPQs, allowing one
to define relations such as the pair of all researchers that are connected by a chain of
parallel advisor and coauthor relation. This language strictly increases the expressive
power of UC2NRPQs at little or no computational cost: its query evaluation problem
over plain graph databases remains complete for NL in data complexity and for NP
complete in combined complexity, while its query containment problem is complete
for 2EXPSPACE, and even in EXPSPACE for a less succinct version of equal expressive
power [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. A way to overcome the second limitation can be found in monadically
defined queries (MODEQs) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which extend monadic Datalog by using some special
‘flag’ constants, that are then instantiated with a given tuple of ordinary constants that
behave as ‘parameters’. These queries can easily express relations like pairs connected
by a chain of coauthors where all of them have the same biologist academic ancestor,
using one flag constant as a ‘placeholder’ for this common object. However, they
cannot express even the simple nested patterns in CN2RPQs. This limitation is overcome
by nested MODEQs [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Some variations and generalizations of MODEQs and nested
MODEQs, under the generic name of (nested) flag-and-check queries were studied
recently in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Many of these languages strictly generalize regular queries and nested
MODEQs, and can express all of the example queries we have mentioned. However,
query evaluation becomes P-complete for data complexity, and PSPACE-complete for
combined complexity, and query containment becomes non-elementary.
      </p>
      <p>In this paper, we introduce a navigational query language that extends binary
frontierguarded Datalog by allowing regular expressions in rule bodies. It also allows the use
of intensional predicates of higher arity, but the additional positions are designated as
‘parameter positions’ and subjected to special syntactic restrictions. These parameter
positions can simulate the use of flag constants in flag-and-check queries and allow
our query language to refer to common points inside nested subqueries as in the
examples discussed above. They cannot fully simulate the power of the flag-and-check
special constants in the nested versions of the query languages though, but this restriction
seems to play a crucial role in avoiding the non-elementary increase in the complexity
of reasoning. Our query language also allows for nesting subqueries inside regular
expressions, and can in fact express all the examples discuss so far. However, in terms of
nesting, our language is orthogonal to regular queries and to the nested flag-and-check
queries, since they only allow for acyclic nesting, by considering only non-recursive
Datalog or by allowing to nest only queries of a strictly lower nesting level. By
contrast, our query language naturally supports recursive nesting: that is, an intensional
predicate can occur in the C2RPQ defining it. In exchange for this recursiveness, our
queries impose a ‘guardedness’ requirement that only allows to define complex nested
relations for pairs of objects that are guaranteed to be related by an extensional relation
in the input database. The resulting language seems to provide an interesting trade-off
between expressiveness and complexity. As mentioned earlier, it can express all of the
queries mentioned above, and the complexity of query evaluation is similar to that of
nested MODEQs: P-complete for data complexity, NP-complete for combined
complexity if the number of parameters is bounded, and PSPACE-hard and in EXPTIME for
the full language (without arity restrictions). We do not study query containment in this
preliminary note, but consider instead another challenging problem: query evaluation
in the presence of ontological knowledge.</p>
      <p>
        In the paradigm of ontology-mediated query answering, (graph) databases are
enriched with an ontology that captures domain knowledge and defines additional
relations for querying. These ontologies are usually expressed in description logics (DLs)
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or in closely related formalisms based on existential rules. In general, these
languages support recursion and can imply the existence of additional objects (unnamed
constants or nulls). In the presence of such ontologies, the query answering problem
consists in computing the certain answers to the queries over the set of all the
potentially infinitely databases that extend the input data and satisfy all the formulas in the
ontology. Hence, query answering is algorithmically much harder than the usual query
evaluation over databases. In fact, for most query languages studied so far in this
setting, query evaluation in the presence of ontologies formulated using expressive DLs
is at least as hard as query containment. Query evaluation in the presence of rich
ontologies has been studied for plain CN2RPQs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but not for any of the extensions
mentioned above. Bourhis et al. provided tight non-elementary bounds for a closely
related language, positive first order logic with a parametrized unary transitive closure
operator (PFO+TC1), which falls strictly between regular queries and nested MODEQs
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], thus implying non-elementary hardness of query answering for other variations of
nested flag-and-check queries that generalize nested MODEQs. The language we
consider here, in contrast, allows for elementary query answering even for very expressive
DLs, strongly suggesting that its containment problem may also remain elementary.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Guarded Regular Queries</title>
      <p>In frontier-guarded rules, all variables in a rule head must occur together in a body atom.
We extend such rules by allowing regular expressions in the rule bodies. We allow the
use of intensional predicates of arbitrary arity, but we distinguish two different kinds of
positions: each predicate has one or two main positions and any number of parameter
positions. Inside the regular expressions, and with respect to frontier-guardedness, we
consider only the main positions and the intensional predicates behave as unary and
binary. The additional parameter positions are not subjected to guardedness, but an
additional syntactic requirement that ensures they are not ‘forgotten’.</p>
      <p>Syntax Let NP, NV, and NI be countably infinite sets of predicate names, variable
names, and individual names, respectively. The set NV is the disjoint union of the set
NoV of ordinary variables and the set NpV of parameter variables. We assume that there is
a subset NAns NP that consists of a single distinguished k-ary answer predicate ansk
for every k 0; when convenient, we will omit the arity and use ans in place of ansk.</p>
      <p>Each predicate p 2 NP n NAns has a main arity of either 1 or 2, and a parameter
arity that can be any natural number; we will write arity(p) = (k; n) to indicate that
p has main arity k and parameter arity n. Note that usual unary and binary predicates
correspond to predicates having parameter arity 0.</p>
      <p>We define unary and binary alphabets, 1 and 2, as follows:
1 =fpz j p 2 NP n NAns; arity(p) = (1; n); z 2 (NpV [ NI)ng [ ffag j a 2 NIg
2 =fpz; pz j p 2 NP n NAns; arity(p) = (2; n); z 2 (NpV [ NI)ng [ fU ? j U 2
1g
We consider regular languages over 2, which are defined in the usual way and
represented as regular expressions or non-deterministic finite automata (NFAs).</p>
      <p>There are three kinds of atoms: unary atoms E1(x) with E1 2 1 and x 2 NV [ NI,
binary atoms E2(x; y) with E2 a regular language over 2 and x; y 2 NV [ NI, and
answer atoms ansk(z) with z 2 (NV [ NI)k. We use the term base atom to mean unary
atoms, answer atoms, and binary atoms E2(x; y) with E2 2 2. For a unary or binary
atom , its set mvars( ) of main variables is defined as follows: mvars(E1(x)) = fxg\
NV and mvars(E2(x; y)) = fx; yg \ NV. We also define the set pvars( ) of parameter
variables of : pvars(fag(x)) = ;, pvars(pz(x)) = z \ NV, and pvars(E2(x; y)) =
fz 2 NV j pz appears in E2 and z 2 zg.</p>
      <p>A rule is an expression of the form h b1; : : : ; bn, where every bi is an atom and
h is a base atom. We call h the head of and b1; : : : ; bn the body. As usual, we require
that every head variable also appears in the body.</p>
      <p>A query is a set Q of rules such that exactly one predicate ansk from NAns occurs,
and this predicate only appears in rule heads. If Q contains ansk, then the arity of Q
is k. We will distinguish two types of predicates relative to Q: intensional predicates
that occur in some rule head in Q, and extensional predicates that do not appear in any
rule head. We denote by int(Q) (resp. ext(Q)) the set of intensional (resp. extensional)
predicates relative to Q. Observe that ext(Q) = NP n int(Q).</p>
      <p>In this paper, we will mainly focus on guarded regular queries, in which the
following conditions hold for every rule h b1; : : : ; bn with head predicate p from NP nNAns:
1. mvars(h)
2. pvars(bi)
mvars(bi) for some basic atom bi, 1
pvars(h) for all 1 i n.</p>
      <p>
        i
n,
Note that Condition 1 imposes frontier-guardedness w.r.t. the main variables, and
Condition 2 ensures that parameter variables occurring in the body of a rule occur also its
head, hence they are not ‘forgotten’, in a similar spirit to the stickiness property
considered for existential rules [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Semantics The queries we have defined are a special class of Datalog programs.
Indeed, an atom pz(x; y) provides an alternative syntax for the standard Datalog atom
p(x; y; z), and regular expressions can be naturally expressed in Datalog. Hence, the
semantics of our queries is readily obtained from the semantics of Datalog, where rules
are viewed as Horn clauses in first-order logic. However, for our purposes, it will prove
more convenient to have a direct semantics for our queries, which we introduce next.</p>
      <p>A (predicate) signature is any set of predicate names. A -interpretation I is a
tuple ( I ; I ), where I is a non-empty set of domain elements and I is a function
that assigns (i) to each individual c an element cI 2 I , and (ii) to each p 2 a
relation pI ( I )k+n where arity(p) = (k; n). If is irrelevant, we simply call
I an interpretation. For 0, we say that a 0-interpretation I0 extends a
interpretation I if the following conditions hold: (i) I = I0 , (ii) cI = cI0 for every
individual name c, and (iii) pI = pI0 for all p 2 .</p>
      <p>Consider a -interpretation I, a predicate p 2 , and a function : NpV ! I .
The interpretation of symbols from 1 [ 2 in I under is defined as follows:
a I; = aI
f g
(pz)I; = fe 2
(pz)I; = f(e; e0) 2
(pz )I; = f(e; e0) 2
(U ?)I; = f(e; e) 2</p>
      <p>I j (e; (zI )) 2 pI )g</p>
      <p>I
I
I</p>
      <p>I j (e; e0; (zI )) 2 pI )g
I j (e0; e) 2 (pz)I; g
I j e 2 U I; g
for fag 2
for pz 2
for pz 2
for pz 2
for U ? 2</p>
      <p>1
1
2
2
2
where zI denotes the result of replacing every individual c in z by cI and (z) denotes
the result of replacing every v 2 NpV in zI by (v). For a general regular expression
E built over 2, the binary relation EI; is obtained using the composition, union and
transitive closure of the base relations above.</p>
      <p>Next suppose that in addition to I and , we have a function : NoV ! I . We
define the following satisfaction and entailment relations:</p>
      <p>I; ; j= (t)
I; ; j= h</p>
      <p>I; j=</p>
      <p>I j=
I j= Q
b1; : : : ; bn
iff
( (tI )) 2</p>
      <p>I;
iff I; ; j= h or I; ; 6j= bi for some i
iff I; ; j=
iff I; j=
iff I j=
for all rules</p>
      <p>for all : NoV !
for all : NpV !
2 Q</p>
      <p>I</p>
      <p>I</p>
      <p>Now consider a k-ary query Q, and let Q be the set of predicate names that occur
in Q. We call a -interpretation I relevant for Q if \ int(Q) = ; and ext(Q) \ Q
. If I is a -interpretation that is relevant for Q, then we denote by Q(I) the set of all
k-tuples c such that I0 j= ans(c) for every [ int(Q)-interpretation I0 with I0 j= Q
which extends I.</p>
      <p>Examples We now illustrate the expressiveness of our query language with some
examples. Consider a graph database with the following relationships between researchers:
was PhD advisor of (ad), is a coauthor of (ca), is a collaborator of (cl), and shares
research topic (srt). The following query Q1 finds pairs connected by an arbitrarily
long chain of parallel ad and ca relations.</p>
      <p>
        ans(x; y)
ac (x; y)
ac (x; y)
ad [ ad (x; y); ca(x; y)
This simple query cannot be expressed as a CN2RPQ; this is proven for an analogous
query in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The next query Q2 finds all pairs that are connected by a chain of potential
collaborators, where potential collaborators are people that either are collaborators, or
they share a research topic and are connected by a chain of coauthors.
This query is very similar to the regular query given in Example 2 of [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It can also
be shown along the lines of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that this query is not expressible as a plain CN2RPQ.
The next query Q3 finds pairs of individuals who are connected by a chain of srt who
are all connected to some z by a coauthorship chain.
This query is not expressible as a regular query; again, this can be shown as in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Finally, we give an example query Q4 that nests over guarded recursion. It builds a
preferred collaborator relation that contains any pair of coauthors that have coauthored
papers with a shared advisor, as well as pairs of x and y that can respectively reach
researchers x0 and y0 via a chain of both srt and cl, where x0 and y0 are themselves
preferred collaborators:
prefcl (x; y)
prefcl (x; y)
ca(x; y); ca(x; z); ca(y; z); ad(x; z); ad(y; z)
ca(x; y); srt (x; x0); cl (x; x0); srt (y; y0); cl (y; y0); prefcl (x0; y0)
Note that Q4 does not conform to the syntax of regular queries, and we conjecture that
it is expressible neither as a regular query, nor as a nested flag-and-check query.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Related Query Languages</title>
      <p>In this section, we compare in more detail guarded regular queries with related
languages that also extend the navigational capabilities of C2RPQs. First, it is easy to see
that any nested CN2RPQ can be easily rewritten as a guarded regular query, by
simply replacing exhaustively each nested expression hEi by pE ? for a fresh intensional
predicate pE with arity(pE ) = (1; 0), and adding a rule pE (x) E(x; y).</p>
      <p>Now we compare guarded regular queries with other extensions of CN2RPQs that
are closer in terms of expressiveness.</p>
      <p>
        Regular queries were introduced in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] as non-recursive binary Datalog programs that
additionally allow for transitive closure rules of the form P (x; y) R+(x; y): An
alternative definition, that is equivalent but exponentially less succinct, is given by
taking non-recursive binary Datalog rules and allowing in the body regular expressions
over extensional predicates and expressions of the form R+(x; y) (for arbitrary
predicates), but restricting intensional predicates to occur only once in rule bodies. The
query containment problem for regular queries is complete for 2EXPSPACE, but only
EXPSPACE-complete (like for plain C2RPQs) if the alternative syntax is considered.
The query evaluation problem is also not harder than for C2RPQs: NL-complete in
data complexity and NP-complete complete in combined complexity.
      </p>
      <p>Regular queries and guarded regular queries are closely related, but there are some
significant differences. We have already shown above that their expressive power is
orthogonal. Regular queries have no parameters and nesting is not recursive. On the
other hand, their nesting is not subject to a guardedness restriction.</p>
      <p>
        Nested positive 2RPQs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] allow for positive Boolean combinations of 2RPQs, where
the regular expressions may use queries of strictly lower nesting degree as binary
predicates. This language is in fact equivalent to the regular queries from [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
PFO+TC1 is another extension of CN2RPQs studied by Bourhis et. al. that extends
positive first-order logic with a parameterized unary transitive closure operator. This
language is strictly more expressive than regular queries, and it can fully support the
‘parameters’ (restricted higher arities) that we allow in guarded regular queries. However,
guarded regular queries and PFO+TC1 are incomparable. On the one hand, PFO+TC1
has NL-data complexity, while guarded regular queries are complete for P, as we show
in Section 4. This implies that some guarded regular queries cannot be expressed in
PFO+TC1. On the other hand, PFO+TC1 has been shown to have non-elementary time
complexity for query containment [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and for query answering in the presence of some
DL ontologies [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We argue below that the latter problem is elementary for guarded
regular queries,3, which implies that not all PFO+TC1 queries are expressible as guarded
regular queries without a non-elementary blow-up in the formula size.
Expressive Datalog fragments having a decidable containment problem have been
recently studied by Bourhis et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. They consider well-known languages like linear,
monadic, and frontier-guarded Datalog and extend them with special ‘flag’ constants
that behave analogously to the parameter positions of guarded regular queries (this
had already been done for monadic Datalog [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]); they also consider nested versions
of these languages. These nested flag-and-check queries are in general more
expressive than regular queries and even generalize PFO+TC1 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, similarly as for
PFO+TC1, this has a high computational cost and the containment problem becomes
non-elementary [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Also, in contrast to guarded regular queries, all these nested queries
allow only for acyclic nesting, while we allow for cyclic but guarded nesting.
      </p>
      <p>One of the considered languages, GQ+, is obtained by taking frontier guarded
flagand-check programs (that is, frontier guarded Datalog programs with special parameter
constants), and allowing to nest into their bodies analogous programs of lower nesting
level. Note that, in contrast, we have a (linear) Datalog fragment (that captures C2RPQs)
at the inner query level, and use frontier-guardedness to do cyclic nesting of queries.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Evaluating Guarded Regular Queries</title>
      <p>In this section, we study the complexity of query evaluation over graph databases and
then outline an algorithm for query answering in the presence of DL ontologies.
Query Evaluation over Graph Databases The following proposition summarizes
what we know about the complexity of evaluating guarded regular queries.
Proposition 1. Evaluation of guarded regular queries over graph databases is:
3 We have not yet studied containment of guarded regular queries, but we believe it may be
elementary as well.
– P-complete for data complexity;
– PSPACE-hard and in EXPTIME for combined complexity;
– NP-complete for combined complexity, when the parameter arity of predicates is
bounded by a fixed constant.</p>
      <p>Proof. For Statement 1, observe that guarded regular queries fall between binary monadic
Datalog and full Datalog (cf. Section 2), and for both of these languages, the query
evaluation problem is P-complete in data complexity.</p>
      <p>
        Membership in EXPTIME for combined complexity is a direct consequence of the
EXPTIME-completeness of query evaluation in Datalog. To obtain the PSPACE lower
bound, we modify the Datalog program that was used in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to show PSPACE-hardness
of evaluating linear sirups, in order to ensure that the resulting query satisfies the
‘stickiness’ requirement (Condition 2).
      </p>
      <p>For Statement 3, the lower bound comes from the NP-hardness of CQ evaluation,
and the upper bound is obtained by guessing a sequence of polynomial number of rule
applications (along with the required homomorphisms) that leads to deriving ans(c).
Query Answering with DL KBs We sketch an algorithm for answering guarded
regular queries over DL KBs. We do not give the details of particular DLs, but instead
present the forest model property that many DLs have and on which our method relies.</p>
      <p>Each DL KB K is defined over some set of unary and binary relations with
parameter arity 0. The semantics of K is given as a set M(K) of -interpretations.
Given a k-ary query Q, we let Q(K) be the set of k-tuples c of individuals such that
c 2 Q(I) for all I 2 M(K). The set Q(K) is called the certain answer to Q over K.</p>
      <p>Let N be the set of natural numbers and denote by w 2 N+ the set of all non-empty
words over N. A - interpretation I is forest-shaped if the following conditions hold:
(i) I N+ is prefix-closed, i.e., if w n 2 I and w 6= , then w 2 I ;
(ii) if (e; e0; d) 2 pI for some p 2 , then one of the following holds: (a) e = e0, (b)
jej = 1 and je0j = 1, or (c) e0 = e n or e = e0 n for some n 2 N.</p>
      <p>
        A KB K is said to possess the forest model property if for any I 2 M(K) there exists
a forest-shaped interpretation I0 2 M(K) such that:
(i) there exists a homomorphism from I0 to I, and
(ii) I f0; : : : ; r(jKj)g , where r is a polynomial and jKj denotes the size of K.
The forest-shaped I 2 M(K) that satisfy the condition I f0; : : : ; r(jKj)g are the
forest models of K. It is well known that KBs written in many popular DLs have this
property (cf. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
      </p>
      <p>Consider a KB K with the forest model property, a query Q of arity k, and a k-tuple
c of individuals. Since our query language is monotone, it follows that for such a KB K,
c 62 Q(K) implies c 62 Q(I) for some forest model I 2 M(K). If K is in a standard DL
like ALCHIQ, one can define a tree automaton that recognizes (an encoding of)
forestshaped I 2 M(K). However, due to the semantics of the query language, deciding
c 62 Q(I) involves checking the existence of an extension I0 of I such that I0 j= Q
and I0 6j= ans(c), and such I0 need not be forest-shaped. In fact, unrestricted queries
are undecidable and thus can enforce interpretations that are not tree-shaped and which
cannot be coded into forest-shaped interpretations in a meaningful way.</p>
      <p>To obtain decidability for guarded regular queries, we show that it is sufficient
to consider only certain forest-shaped extensions. Given a -interpretation I, we let
pdom(I) denote the domain elements that appear as parameter values in I, i.e. those
e 2 I for which there exists a predicate name p 2 and a tuple he1; : : : ; ek+ni 2 pI
with e = ei and i &gt; k, where arity(p) = (k; n). We can prove the following:
Proposition 2. Consider a KB K with the forest model property, a guarded regular
query Q of arity k, and a k-tuple c of individuals. Let m be the maximal number of
parameter variables over all rules of Q. Then c 62 Q(K) iff (y) there exists a
forestshaped I 2 M(K) such that for any set g I with jgj m there exists a
[int(Q)interpretation I0 such that: (i) I0 is forest-shaped, (ii) I0 extends I, (iii) I0; j= for
every 2 Q and every with ran( ) g, (iv) I0 6j= ans(c), and (v) pdom(I0) g.</p>
      <p>
        We next argue that condition (y) from Proposition 2 can be decided by employing
tree automata. It is standard to represent forest-shaped interpretations of a DL KB as
labeled trees upon which a tree automaton can operate (see e.g. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). In such an encoding,
‘roots’ of the original forest-shaped interpretation correspond to children of a dummy
root in the tree representation. It is a bit more involved to obtain a tree representation of
forest-shaped -interpretations when contains predicates with non-zero parameter
arities. However, if the input interpretation I is such that jpdom(I)j k for a finite k,
then we can essentially employ the standard representation, except that we may need to
increase the alphabet exponentially in the parameter arity of original relations.
      </p>
      <p>
        The automata algorithm to check (y) can be briefly described as follows:
1. If K is in a standard DL like ALCHIQ, we can build a nondeterministic tree
automaton (NTA) A1 that recognizes (the tree encoding of) tuples (I; g; I0) such
that I is a forest-shaped model of K, g I is such that jgj m, and I0 is
a [ int(Q)-interpretation that satisfies conditions (i)-(v) of Proposition 2. The
na¨ıve construction of A1 requires a triple-exponential number of states and can be
done by adapting the construction in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
2. We build an NTA A2 that recognizes tuples (I; g) such that some triple of the form
(I; g; I0) is recognized by A1. Technically, this is done by performing a projection
operation on A1, that projects away the third component of recognized triples.
3. We complement A2 to obtain A3. Intuitively, A3 accepts tuples (I; g) that A2
rejects, i.e. for which there is no triple (I; g; I0) that satisfies conditions (i)-(v). This
step causes an exponential blowup in the number of states.
4. By projecting away the second component, we can obtain from A3 an NTA A4 that
accepts forest interpretations I for which there exists g such that we cannot find an
I0 that satisfies conditions (i)-(v).
5. By complementing A4, we obtain an NTA that checks condition (y) in
Proposition 2. This step also causes an exponential blowup in the number of states.
      </p>
      <p>
        Due to the complexity results on testing nonemptiness of NTAs in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the above
construction leads to an 5EXPTIME upper bound. For standard Horn DLs like
HornS HIQ, E L or DL-Lite, we can improve the upper bound to 4EXPTIME; these DLs have
the so-called canonical model property and thus the last automata complementation step
in the above construction is not necessary.
In this paper, we proposed guarded regular queries as a new navigational query
language and provided some first results regarding its relation to related proposals and the
complexity and decidability of query evaluation, in particular in the presence of DL
ontologies. There are number of questions that we plan to tackle in future work. First,
we will complete our formal comparison of the expressive power of guarded regular
queries, with the aim of showing that they are indeed incomparable to related existing
formalisms. Second, we will pursue our study of the computational properties of the
language by pinpointing the precise complexity of query answering in the presence of
DL ontologies and by investigating the query containment problem. Finally, extending
guarded regular queries by relaxing the use of parameters and frontier-guardedness is
another challenging but interesting problem.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-</surname>
          </string-name>
          Schneider, editors.
          <source>The Description Logic Handbook: Theory, Implementation and Applications</source>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo´</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Wood</surname>
          </string-name>
          .
          <article-title>Expressive languages for path queries over graph-structured data</article-title>
          .
          <source>ACM TODS</source>
          ,
          <volume>37</volume>
          (
          <issue>4</issue>
          ):
          <fpage>31</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. P. Barcelo´,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Pe´rez, and</article-title>
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Reutter</surname>
          </string-name>
          .
          <article-title>Relative expressiveness of nested regular expressions</article-title>
          .
          <source>In Proc. of AMW'12, CEUR Workshop Proceedings 866</source>
          , pages
          <fpage>180</fpage>
          -
          <lpage>195</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Bienvenu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          , and
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Sˇ imkus. Nested regular path queries in description logics</article-title>
          .
          <source>In Proc. of KR</source>
          <year>2014</year>
          . AAAI Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>P.</given-names>
            <surname>Bourhis</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Kro¨tzsch, and</article-title>
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          .
          <article-title>How to best nest regular path queries</article-title>
          .
          <source>In Proc. of DL'14</source>
          , volume
          <volume>1193</volume>
          , pages
          <fpage>404</fpage>
          -
          <lpage>415</lpage>
          . CEUR-WS.org,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Bourhis</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Kro¨tzsch, and</article-title>
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          .
          <article-title>Query containment for highly expressive datalog fragments</article-title>
          .
          <source>CoRR, abs/1406.7801</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cal</surname>
          </string-name>
          <article-title>`ı, G. Gottlob, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>Advanced processing for ontological queries</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .,
          <volume>3</volume>
          (
          <issue>1</issue>
          -2):
          <fpage>554</fpage>
          -
          <lpage>565</lpage>
          , Sept.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          .
          <article-title>Regular path queries in expressive description logics with nominals</article-title>
          .
          <source>In Proc. of IJCAI</source>
          <year>2009</year>
          , pages
          <fpage>714</fpage>
          -
          <lpage>720</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          .
          <article-title>Answering regular path queries in expressive description logics via alternating tree-automata</article-title>
          .
          <source>Inf. Comput.</source>
          ,
          <volume>237</volume>
          :
          <fpage>12</fpage>
          -
          <lpage>55</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. G. Gottlob and
          <string-name>
            <given-names>C.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          .
          <article-title>On the complexity of single-rule datalog queries</article-title>
          .
          <source>Information and Computation</source>
          ,
          <volume>183</volume>
          (
          <issue>1</issue>
          ):
          <fpage>104</fpage>
          -
          <lpage>122</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrgoc</surname>
          </string-name>
          .
          <article-title>Regular path queries on graphs with data</article-title>
          . In A. Deutsch, editor,
          <source>Proc. of ICDT'12</source>
          , pages
          <fpage>74</fpage>
          -
          <lpage>85</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Muller</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Schupp</surname>
          </string-name>
          .
          <article-title>Simulating alternating tree automata by nondeterministic automata: New results and new proofs of the theorems of Rabin, McNaughton and Safra</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>141</volume>
          (
          <issue>12</issue>
          ):
          <fpage>69</fpage>
          -
          <lpage>107</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>J. Reutter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Romero</surname>
            , and
            <given-names>M. Y.</given-names>
          </string-name>
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Regular queries on graph databases</article-title>
          . In M. Arenas and M. Ugarte, editors,
          <source>Proc. of ICDT'15</source>
          ,
          <year>2015</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Reutter</surname>
          </string-name>
          .
          <article-title>Containment of nested regular expressions</article-title>
          .
          <source>CoRR Technical Report arXiv:1304.2637</source>
          , arXiv.org e-Print archive,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          and
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Kro¨tzsch. Flag &amp; check: Data access with monadically defined queries</article-title>
          .
          <source>In Proc. of PODS'13</source>
          , pages
          <fpage>151</fpage>
          -
          <lpage>162</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>