1 Introduction

Navigational Queries Based on Frontier-Guarded Datalog: Preliminary Results?

Meghyn Bienvenu

Magdalena Ortiz

Mantas Sˇ imkus

0 0 Institute of Information Systems, Vienna University of Technology 1 LRI - CNRS & Universite ́ Paris Sud

In this paper, we introduce a navigational query language that extends binary frontier-guarded Datalog by allowing regular expressions in rule bodies and a limited use of higher-arity intensional predicates. Our query language strictly extends conjunctive two-way regular path queries (C2RPQs) and captures some of the key features advocated in recent works aimed at extending C2RPQs. We compare our language to existing proposals and establish decidability with elementary complexity of query evaluation in the presence of ontologies.

1 Introduction

The current importance of graph databases has led to renewed interest in navigational query languages that ask for nodes connected by paths that satisfy given patterns. Conjunctive two-way regular path queries (C2RPQs) and their unions (UC2RPQs), which simultaneously extend both the well-known conjunctive queries (CQs) and two-way regular path queries (RPQs), have established themselves as fundamental query languages for accessing graph databases. However, recent works have argued that such queries are lacking important features, which has motivated several extensions. For example, C2RPQs have been extended with path variables to support naming, comparing, and outputting paths [ 2 ]. Other extensions are motivated by the fact that C2RPQs and similar languages can only describe patterns over graphs labeled with a finite alphabet, and aim to overcome this by allowing values from possibly infinite datasets [ 11 ]. A third direction aims at increasing the navigational power of C2RPQs and related languages, that is, allowing queries to express more complex patterns that cannot be expressed using regular languages. This paper pursues the latter direction.

A very natural way to increase the navigational power of C2RPQs is to use nested regular expressions (NREs), that can enforce that at some points along a path there exists an outgoing path, which is itself described by a (nested) regular expression. (Conjunctive) nested 2RPQs ((C)N2RPQs), which extend (C)2RPQs by allowing NREs in the place of plain regular expressions, have received significant attention recently [ 4,3,14 ], due to their improved navigational capabilities. For example, consider a graph database of academic relationships between researchers that contains advisor and co-author relationships, as well as the fields of expertise of the researchers. With (conjunctive) regular path queries, one can find pairs of people that are connected by an arbitrary long chain of advisor or co-author relations, or find three researchers from different fields that are ? This work has been supported by ANR project PAGODA (ANR-12-JS02-007-01) and the

Austrian Science Fund (FWF) projects T515 and P25207-N23. pairwise connected by such a chain. However, we cannot find pairs connected by an arbitrary long chain of advisor or co-author relations, where additionally each node must, for example, have an academic ancestor that is a biologist. The latter can be easily done using nested regular expressions. However, neither C2RPQs nor CN2RPQs can express the query that requires that all people along such a chain have the same biologist as academic ancestor. Another query that cannot be expressed is to find pairs connected by an arbitrary chain where all nodes are connected by both the co-author and the advisor relationship. These examples illustrate two shortcomings of UCN2RPQs that have been pointed out in the literature, namely, the inability to express transitive closure over complex relations defined using UCN2RPQs and the impossibility of joining objects inside a nested expression.

Recent works have aimed at overcoming these limitations. A query language that tackles the first issue was introduced by Bourhis et al. [ 5 ] with the name of nested positive 2RPQs, and further studied more recently (using a different syntax) by Reutter et al. [ 13 ] under the name of regular queries. Essentially, this language allows full UCN2RPQs to be nested inside the regular expressions of UCN2RPQs, allowing one to define relations such as the pair of all researchers that are connected by a chain of parallel advisor and coauthor relation. This language strictly increases the expressive power of UC2NRPQs at little or no computational cost: its query evaluation problem over plain graph databases remains complete for NL in data complexity and for NP complete in combined complexity, while its query containment problem is complete for 2EXPSPACE, and even in EXPSPACE for a less succinct version of equal expressive power [ 13 ]. A way to overcome the second limitation can be found in monadically defined queries (MODEQs) [ 15 ], which extend monadic Datalog by using some special ‘flag’ constants, that are then instantiated with a given tuple of ordinary constants that behave as ‘parameters’. These queries can easily express relations like pairs connected by a chain of coauthors where all of them have the same biologist academic ancestor, using one flag constant as a ‘placeholder’ for this common object. However, they cannot express even the simple nested patterns in CN2RPQs. This limitation is overcome by nested MODEQs [ 15 ]. Some variations and generalizations of MODEQs and nested MODEQs, under the generic name of (nested) flag-and-check queries were studied recently in [ 6 ]. Many of these languages strictly generalize regular queries and nested MODEQs, and can express all of the example queries we have mentioned. However, query evaluation becomes P-complete for data complexity, and PSPACE-complete for combined complexity, and query containment becomes non-elementary.

In this paper, we introduce a navigational query language that extends binary frontierguarded Datalog by allowing regular expressions in rule bodies. It also allows the use of intensional predicates of higher arity, but the additional positions are designated as ‘parameter positions’ and subjected to special syntactic restrictions. These parameter positions can simulate the use of flag constants in flag-and-check queries and allow our query language to refer to common points inside nested subqueries as in the examples discussed above. They cannot fully simulate the power of the flag-and-check special constants in the nested versions of the query languages though, but this restriction seems to play a crucial role in avoiding the non-elementary increase in the complexity of reasoning. Our query language also allows for nesting subqueries inside regular expressions, and can in fact express all the examples discuss so far. However, in terms of nesting, our language is orthogonal to regular queries and to the nested flag-and-check queries, since they only allow for acyclic nesting, by considering only non-recursive Datalog or by allowing to nest only queries of a strictly lower nesting level. By contrast, our query language naturally supports recursive nesting: that is, an intensional predicate can occur in the C2RPQ defining it. In exchange for this recursiveness, our queries impose a ‘guardedness’ requirement that only allows to define complex nested relations for pairs of objects that are guaranteed to be related by an extensional relation in the input database. The resulting language seems to provide an interesting trade-off between expressiveness and complexity. As mentioned earlier, it can express all of the queries mentioned above, and the complexity of query evaluation is similar to that of nested MODEQs: P-complete for data complexity, NP-complete for combined complexity if the number of parameters is bounded, and PSPACE-hard and in EXPTIME for the full language (without arity restrictions). We do not study query containment in this preliminary note, but consider instead another challenging problem: query evaluation in the presence of ontological knowledge.

In the paradigm of ontology-mediated query answering, (graph) databases are enriched with an ontology that captures domain knowledge and defines additional relations for querying. These ontologies are usually expressed in description logics (DLs) [ 1 ] or in closely related formalisms based on existential rules. In general, these languages support recursion and can imply the existence of additional objects (unnamed constants or nulls). In the presence of such ontologies, the query answering problem consists in computing the certain answers to the queries over the set of all the potentially infinitely databases that extend the input data and satisfy all the formulas in the ontology. Hence, query answering is algorithmically much harder than the usual query evaluation over databases. In fact, for most query languages studied so far in this setting, query evaluation in the presence of ontologies formulated using expressive DLs is at least as hard as query containment. Query evaluation in the presence of rich ontologies has been studied for plain CN2RPQs [ 4 ], but not for any of the extensions mentioned above. Bourhis et al. provided tight non-elementary bounds for a closely related language, positive first order logic with a parametrized unary transitive closure operator (PFO+TC1), which falls strictly between regular queries and nested MODEQs [ 5 ], thus implying non-elementary hardness of query answering for other variations of nested flag-and-check queries that generalize nested MODEQs. The language we consider here, in contrast, allows for elementary query answering even for very expressive DLs, strongly suggesting that its containment problem may also remain elementary. 2

Guarded Regular Queries

In frontier-guarded rules, all variables in a rule head must occur together in a body atom. We extend such rules by allowing regular expressions in the rule bodies. We allow the use of intensional predicates of arbitrary arity, but we distinguish two different kinds of positions: each predicate has one or two main positions and any number of parameter positions. Inside the regular expressions, and with respect to frontier-guardedness, we consider only the main positions and the intensional predicates behave as unary and binary. The additional parameter positions are not subjected to guardedness, but an additional syntactic requirement that ensures they are not ‘forgotten’.

Syntax Let NP, NV, and NI be countably infinite sets of predicate names, variable names, and individual names, respectively. The set NV is the disjoint union of the set NoV of ordinary variables and the set NpV of parameter variables. We assume that there is a subset NAns NP that consists of a single distinguished k-ary answer predicate ansk for every k 0; when convenient, we will omit the arity and use ans in place of ansk.

Each predicate p 2 NP n NAns has a main arity of either 1 or 2, and a parameter arity that can be any natural number; we will write arity(p) = (k; n) to indicate that p has main arity k and parameter arity n. Note that usual unary and binary predicates correspond to predicates having parameter arity 0.

We define unary and binary alphabets, 1 and 2, as follows: 1 =fpz j p 2 NP n NAns; arity(p) = (1; n); z 2 (NpV [ NI)ng [ ffag j a 2 NIg 2 =fpz; pz j p 2 NP n NAns; arity(p) = (2; n); z 2 (NpV [ NI)ng [ fU ? j U 2 1g We consider regular languages over 2, which are defined in the usual way and represented as regular expressions or non-deterministic finite automata (NFAs).

There are three kinds of atoms: unary atoms E1(x) with E1 2 1 and x 2 NV [ NI, binary atoms E2(x; y) with E2 a regular language over 2 and x; y 2 NV [ NI, and answer atoms ansk(z) with z 2 (NV [ NI)k. We use the term base atom to mean unary atoms, answer atoms, and binary atoms E2(x; y) with E2 2 2. For a unary or binary atom , its set mvars( ) of main variables is defined as follows: mvars(E1(x)) = fxg\ NV and mvars(E2(x; y)) = fx; yg \ NV. We also define the set pvars( ) of parameter variables of : pvars(fag(x)) = ;, pvars(pz(x)) = z \ NV, and pvars(E2(x; y)) = fz 2 NV j pz appears in E2 and z 2 zg.

A rule is an expression of the form h b1; : : : ; bn, where every bi is an atom and h is a base atom. We call h the head of and b1; : : : ; bn the body. As usual, we require that every head variable also appears in the body.

A query is a set Q of rules such that exactly one predicate ansk from NAns occurs, and this predicate only appears in rule heads. If Q contains ansk, then the arity of Q is k. We will distinguish two types of predicates relative to Q: intensional predicates that occur in some rule head in Q, and extensional predicates that do not appear in any rule head. We denote by int(Q) (resp. ext(Q)) the set of intensional (resp. extensional) predicates relative to Q. Observe that ext(Q) = NP n int(Q).

In this paper, we will mainly focus on guarded regular queries, in which the following conditions hold for every rule h b1; : : : ; bn with head predicate p from NP nNAns: 1. mvars(h) 2. pvars(bi) mvars(bi) for some basic atom bi, 1 pvars(h) for all 1 i n.

i n, Note that Condition 1 imposes frontier-guardedness w.r.t. the main variables, and Condition 2 ensures that parameter variables occurring in the body of a rule occur also its head, hence they are not ‘forgotten’, in a similar spirit to the stickiness property considered for existential rules [ 7 ].

Semantics The queries we have defined are a special class of Datalog programs. Indeed, an atom pz(x; y) provides an alternative syntax for the standard Datalog atom p(x; y; z), and regular expressions can be naturally expressed in Datalog. Hence, the semantics of our queries is readily obtained from the semantics of Datalog, where rules are viewed as Horn clauses in first-order logic. However, for our purposes, it will prove more convenient to have a direct semantics for our queries, which we introduce next.

A (predicate) signature is any set of predicate names. A -interpretation I is a tuple ( I ; I ), where I is a non-empty set of domain elements and I is a function that assigns (i) to each individual c an element cI 2 I , and (ii) to each p 2 a relation pI ( I )k+n where arity(p) = (k; n). If is irrelevant, we simply call I an interpretation. For 0, we say that a 0-interpretation I0 extends a interpretation I if the following conditions hold: (i) I = I0 , (ii) cI = cI0 for every individual name c, and (iii) pI = pI0 for all p 2 .

Consider a -interpretation I, a predicate p 2 , and a function : NpV ! I . The interpretation of symbols from 1 [ 2 in I under is defined as follows: a I; = aI f g (pz)I; = fe 2 (pz)I; = f(e; e0) 2 (pz )I; = f(e; e0) 2 (U ?)I; = f(e; e) 2

I j (e; (zI )) 2 pI )g

I I I

I j (e; e0; (zI )) 2 pI )g I j (e0; e) 2 (pz)I; g I j e 2 U I; g for fag 2 for pz 2 for pz 2 for pz 2 for U ? 2

1 1 2 2 2 where zI denotes the result of replacing every individual c in z by cI and (z) denotes the result of replacing every v 2 NpV in zI by (v). For a general regular expression E built over 2, the binary relation EI; is obtained using the composition, union and transitive closure of the base relations above.

Next suppose that in addition to I and , we have a function : NoV ! I . We define the following satisfaction and entailment relations:

I; ; j= (t) I; ; j= h

I; j=

I j= I j= Q b1; : : : ; bn iff ( (tI )) 2

I; iff I; ; j= h or I; ; 6j= bi for some i iff I; ; j= iff I; j= iff I j= for all rules

for all : NoV ! for all : NpV ! 2 Q

Now consider a k-ary query Q, and let Q be the set of predicate names that occur in Q. We call a -interpretation I relevant for Q if \ int(Q) = ; and ext(Q) \ Q . If I is a -interpretation that is relevant for Q, then we denote by Q(I) the set of all k-tuples c such that I0 j= ans(c) for every [ int(Q)-interpretation I0 with I0 j= Q which extends I.

Examples We now illustrate the expressiveness of our query language with some examples. Consider a graph database with the following relationships between researchers: was PhD advisor of (ad), is a coauthor of (ca), is a collaborator of (cl), and shares research topic (srt). The following query Q1 finds pairs connected by an arbitrarily long chain of parallel ad and ca relations.

ans(x; y) ac (x; y) ac (x; y) ad [ ad (x; y); ca(x; y) This simple query cannot be expressed as a CN2RPQ; this is proven for an analogous query in [ 5 ]. The next query Q2 finds all pairs that are connected by a chain of potential collaborators, where potential collaborators are people that either are collaborators, or they share a research topic and are connected by a chain of coauthors. This query is very similar to the regular query given in Example 2 of [ 13 ]. It can also be shown along the lines of [ 5 ] that this query is not expressible as a plain CN2RPQ. The next query Q3 finds pairs of individuals who are connected by a chain of srt who are all connected to some z by a coauthorship chain. This query is not expressible as a regular query; again, this can be shown as in [ 5 ].

Finally, we give an example query Q4 that nests over guarded recursion. It builds a preferred collaborator relation that contains any pair of coauthors that have coauthored papers with a shared advisor, as well as pairs of x and y that can respectively reach researchers x0 and y0 via a chain of both srt and cl, where x0 and y0 are themselves preferred collaborators: prefcl (x; y) prefcl (x; y) ca(x; y); ca(x; z); ca(y; z); ad(x; z); ad(y; z) ca(x; y); srt (x; x0); cl (x; x0); srt (y; y0); cl (y; y0); prefcl (x0; y0) Note that Q4 does not conform to the syntax of regular queries, and we conjecture that it is expressible neither as a regular query, nor as a nested flag-and-check query. 3

Related Query Languages

In this section, we compare in more detail guarded regular queries with related languages that also extend the navigational capabilities of C2RPQs. First, it is easy to see that any nested CN2RPQ can be easily rewritten as a guarded regular query, by simply replacing exhaustively each nested expression hEi by pE ? for a fresh intensional predicate pE with arity(pE ) = (1; 0), and adding a rule pE (x) E(x; y).

Now we compare guarded regular queries with other extensions of CN2RPQs that are closer in terms of expressiveness.

Regular queries were introduced in [ 13 ] as non-recursive binary Datalog programs that additionally allow for transitive closure rules of the form P (x; y) R+(x; y): An alternative definition, that is equivalent but exponentially less succinct, is given by taking non-recursive binary Datalog rules and allowing in the body regular expressions over extensional predicates and expressions of the form R+(x; y) (for arbitrary predicates), but restricting intensional predicates to occur only once in rule bodies. The query containment problem for regular queries is complete for 2EXPSPACE, but only EXPSPACE-complete (like for plain C2RPQs) if the alternative syntax is considered. The query evaluation problem is also not harder than for C2RPQs: NL-complete in data complexity and NP-complete complete in combined complexity.

Regular queries and guarded regular queries are closely related, but there are some significant differences. We have already shown above that their expressive power is orthogonal. Regular queries have no parameters and nesting is not recursive. On the other hand, their nesting is not subject to a guardedness restriction.

Nested positive 2RPQs [ 6 ] allow for positive Boolean combinations of 2RPQs, where the regular expressions may use queries of strictly lower nesting degree as binary predicates. This language is in fact equivalent to the regular queries from [ 13 ]. PFO+TC1 is another extension of CN2RPQs studied by Bourhis et. al. that extends positive first-order logic with a parameterized unary transitive closure operator. This language is strictly more expressive than regular queries, and it can fully support the ‘parameters’ (restricted higher arities) that we allow in guarded regular queries. However, guarded regular queries and PFO+TC1 are incomparable. On the one hand, PFO+TC1 has NL-data complexity, while guarded regular queries are complete for P, as we show in Section 4. This implies that some guarded regular queries cannot be expressed in PFO+TC1. On the other hand, PFO+TC1 has been shown to have non-elementary time complexity for query containment [ 6 ] and for query answering in the presence of some DL ontologies [ 5 ]. We argue below that the latter problem is elementary for guarded regular queries,3, which implies that not all PFO+TC1 queries are expressible as guarded regular queries without a non-elementary blow-up in the formula size. Expressive Datalog fragments having a decidable containment problem have been recently studied by Bourhis et al. [ 6 ]. They consider well-known languages like linear, monadic, and frontier-guarded Datalog and extend them with special ‘flag’ constants that behave analogously to the parameter positions of guarded regular queries (this had already been done for monadic Datalog [ 15 ]); they also consider nested versions of these languages. These nested flag-and-check queries are in general more expressive than regular queries and even generalize PFO+TC1 [ 5 ]. However, similarly as for PFO+TC1, this has a high computational cost and the containment problem becomes non-elementary [ 6 ]. Also, in contrast to guarded regular queries, all these nested queries allow only for acyclic nesting, while we allow for cyclic but guarded nesting.

One of the considered languages, GQ+, is obtained by taking frontier guarded flagand-check programs (that is, frontier guarded Datalog programs with special parameter constants), and allowing to nest into their bodies analogous programs of lower nesting level. Note that, in contrast, we have a (linear) Datalog fragment (that captures C2RPQs) at the inner query level, and use frontier-guardedness to do cyclic nesting of queries. 4

Evaluating Guarded Regular Queries

In this section, we study the complexity of query evaluation over graph databases and then outline an algorithm for query answering in the presence of DL ontologies. Query Evaluation over Graph Databases The following proposition summarizes what we know about the complexity of evaluating guarded regular queries. Proposition 1. Evaluation of guarded regular queries over graph databases is: 3 We have not yet studied containment of guarded regular queries, but we believe it may be elementary as well. – P-complete for data complexity; – PSPACE-hard and in EXPTIME for combined complexity; – NP-complete for combined complexity, when the parameter arity of predicates is bounded by a fixed constant.

Proof. For Statement 1, observe that guarded regular queries fall between binary monadic Datalog and full Datalog (cf. Section 2), and for both of these languages, the query evaluation problem is P-complete in data complexity.

Membership in EXPTIME for combined complexity is a direct consequence of the EXPTIME-completeness of query evaluation in Datalog. To obtain the PSPACE lower bound, we modify the Datalog program that was used in [ 10 ] to show PSPACE-hardness of evaluating linear sirups, in order to ensure that the resulting query satisfies the ‘stickiness’ requirement (Condition 2).

For Statement 3, the lower bound comes from the NP-hardness of CQ evaluation, and the upper bound is obtained by guessing a sequence of polynomial number of rule applications (along with the required homomorphisms) that leads to deriving ans(c). Query Answering with DL KBs We sketch an algorithm for answering guarded regular queries over DL KBs. We do not give the details of particular DLs, but instead present the forest model property that many DLs have and on which our method relies.

Each DL KB K is defined over some set of unary and binary relations with parameter arity 0. The semantics of K is given as a set M(K) of -interpretations. Given a k-ary query Q, we let Q(K) be the set of k-tuples c of individuals such that c 2 Q(I) for all I 2 M(K). The set Q(K) is called the certain answer to Q over K.

Let N be the set of natural numbers and denote by w 2 N+ the set of all non-empty words over N. A - interpretation I is forest-shaped if the following conditions hold: (i) I N+ is prefix-closed, i.e., if w n 2 I and w 6= , then w 2 I ; (ii) if (e; e0; d) 2 pI for some p 2 , then one of the following holds: (a) e = e0, (b) jej = 1 and je0j = 1, or (c) e0 = e n or e = e0 n for some n 2 N.

A KB K is said to possess the forest model property if for any I 2 M(K) there exists a forest-shaped interpretation I0 2 M(K) such that: (i) there exists a homomorphism from I0 to I, and (ii) I f0; : : : ; r(jKj)g , where r is a polynomial and jKj denotes the size of K. The forest-shaped I 2 M(K) that satisfy the condition I f0; : : : ; r(jKj)g are the forest models of K. It is well known that KBs written in many popular DLs have this property (cf. [ 8 ]).

Consider a KB K with the forest model property, a query Q of arity k, and a k-tuple c of individuals. Since our query language is monotone, it follows that for such a KB K, c 62 Q(K) implies c 62 Q(I) for some forest model I 2 M(K). If K is in a standard DL like ALCHIQ, one can define a tree automaton that recognizes (an encoding of) forestshaped I 2 M(K). However, due to the semantics of the query language, deciding c 62 Q(I) involves checking the existence of an extension I0 of I such that I0 j= Q and I0 6j= ans(c), and such I0 need not be forest-shaped. In fact, unrestricted queries are undecidable and thus can enforce interpretations that are not tree-shaped and which cannot be coded into forest-shaped interpretations in a meaningful way.

To obtain decidability for guarded regular queries, we show that it is sufficient to consider only certain forest-shaped extensions. Given a -interpretation I, we let pdom(I) denote the domain elements that appear as parameter values in I, i.e. those e 2 I for which there exists a predicate name p 2 and a tuple he1; : : : ; ek+ni 2 pI with e = ei and i > k, where arity(p) = (k; n). We can prove the following: Proposition 2. Consider a KB K with the forest model property, a guarded regular query Q of arity k, and a k-tuple c of individuals. Let m be the maximal number of parameter variables over all rules of Q. Then c 62 Q(K) iff (y) there exists a forestshaped I 2 M(K) such that for any set g I with jgj m there exists a [int(Q)interpretation I0 such that: (i) I0 is forest-shaped, (ii) I0 extends I, (iii) I0; j= for every 2 Q and every with ran( ) g, (iv) I0 6j= ans(c), and (v) pdom(I0) g.

We next argue that condition (y) from Proposition 2 can be decided by employing tree automata. It is standard to represent forest-shaped interpretations of a DL KB as labeled trees upon which a tree automaton can operate (see e.g. [ 9 ]). In such an encoding, ‘roots’ of the original forest-shaped interpretation correspond to children of a dummy root in the tree representation. It is a bit more involved to obtain a tree representation of forest-shaped -interpretations when contains predicates with non-zero parameter arities. However, if the input interpretation I is such that jpdom(I)j k for a finite k, then we can essentially employ the standard representation, except that we may need to increase the alphabet exponentially in the parameter arity of original relations.

The automata algorithm to check (y) can be briefly described as follows: 1. If K is in a standard DL like ALCHIQ, we can build a nondeterministic tree automaton (NTA) A1 that recognizes (the tree encoding of) tuples (I; g; I0) such that I is a forest-shaped model of K, g I is such that jgj m, and I0 is a [ int(Q)-interpretation that satisfies conditions (i)-(v) of Proposition 2. The na¨ıve construction of A1 requires a triple-exponential number of states and can be done by adapting the construction in [ 9 ]. 2. We build an NTA A2 that recognizes tuples (I; g) such that some triple of the form (I; g; I0) is recognized by A1. Technically, this is done by performing a projection operation on A1, that projects away the third component of recognized triples. 3. We complement A2 to obtain A3. Intuitively, A3 accepts tuples (I; g) that A2 rejects, i.e. for which there is no triple (I; g; I0) that satisfies conditions (i)-(v). This step causes an exponential blowup in the number of states. 4. By projecting away the second component, we can obtain from A3 an NTA A4 that accepts forest interpretations I for which there exists g such that we cannot find an I0 that satisfies conditions (i)-(v). 5. By complementing A4, we obtain an NTA that checks condition (y) in Proposition 2. This step also causes an exponential blowup in the number of states.

Due to the complexity results on testing nonemptiness of NTAs in [ 12 ], the above construction leads to an 5EXPTIME upper bound. For standard Horn DLs like HornS HIQ, E L or DL-Lite, we can improve the upper bound to 4EXPTIME; these DLs have the so-called canonical model property and thus the last automata complementation step in the above construction is not necessary. In this paper, we proposed guarded regular queries as a new navigational query language and provided some first results regarding its relation to related proposals and the complexity and decidability of query evaluation, in particular in the presence of DL ontologies. There are number of questions that we plan to tackle in future work. First, we will complete our formal comparison of the expressive power of guarded regular queries, with the aim of showing that they are indeed incomparable to related existing formalisms. Second, we will pursue our study of the computational properties of the language by pinpointing the precise complexity of query answering in the presence of DL ontologies and by investigating the query containment problem. Finally, extending guarded regular queries by relaxing the use of parameters and frontier-guardedness is another challenging but interesting problem.

Baader ,

Calvanese ,

McGuinness ,

Nardi , and

P. F.

Patel- Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications . Cambridge University Press, 2003 .

Barcelo´ ,

Libkin ,

A. W.

Lin , and

P. T.

Wood . Expressive languages for path queries over graph-structured data . ACM TODS , 37 ( 4 ): 31 , 2012 .

3. P. Barcelo´, J.

Pe´rez, and

J. L.

Reutter . Relative expressiveness of nested regular expressions . In Proc. of AMW'12, CEUR Workshop Proceedings 866 , pages 180 - 195 , 2012 .

Bienvenu ,

Calvanese ,

Ortiz , and M. Sˇ imkus. Nested regular path queries in description logics . In Proc. of KR 2014 . AAAI Press, 2014 .

Bourhis , M.

Kro¨tzsch, and

Rudolph . How to best nest regular path queries . In Proc. of DL'14 , volume 1193 , pages 404 - 415 . CEUR-WS.org, 2014 .

Bourhis , M.

Kro¨tzsch, and

Rudolph . Query containment for highly expressive datalog fragments . CoRR, abs/1406.7801 , 2014 .

Cal

`ı, G. Gottlob, and

Pieris . Advanced processing for ontological queries . Proc. VLDB Endow ., 3 ( 1 -2): 554 - 565 , Sept. 2010 .

Calvanese ,

Eiter , and

Ortiz . Regular path queries in expressive description logics with nominals . In Proc. of IJCAI 2009 , pages 714 - 720 , 2009 .

Calvanese ,

Eiter , and

Ortiz . Answering regular path queries in expressive description logics via alternating tree-automata . Inf. Comput. , 237 : 12 - 55 , 2014 .

10. G. Gottlob and

Papadimitriou . On the complexity of single-rule datalog queries . Information and Computation , 183 ( 1 ): 104 - 122 , 2003 .

11.

Libkin and

Vrgoc . Regular path queries on graphs with data . In A. Deutsch, editor, Proc. of ICDT'12 , pages 74 - 85 . ACM, 2012 .

12.

D. E.

Muller and

P. E.

Schupp . Simulating alternating tree automata by nondeterministic automata: New results and new proofs of the theorems of Rabin, McNaughton and Safra . Theoretical Computer Science , 141 ( 12 ): 69 - 107 , 1995 .

13. J. Reutter , M.

Romero , and M. Y.

Vardi . Regular queries on graph databases . In M. Arenas and M. Ugarte, editors, Proc. of ICDT'15 , 2015 . To appear.

14.

J. L.

Reutter . Containment of nested regular expressions . CoRR Technical Report arXiv:1304.2637 , arXiv.org e-Print archive, 2013 .

15.

Rudolph and M. Kro¨tzsch. Flag & check: Data access with monadically defined queries . In Proc. of PODS'13 , pages 151 - 162 . ACM, 2013 .