=Paper= {{Paper |id=Vol-2037/paper7 |storemode=property |title=Structural Characterization of Graph Navigational Languages |pdfUrl=https://ceur-ws.org/Vol-2037/paper_7.pdf |volume=Vol-2037 |authors=Valeria Fionda,Giuseppe Pirrò |dblpUrl=https://dblp.org/rec/conf/sebd/FiondaP17 }} ==Structural Characterization of Graph Navigational Languages== https://ceur-ws.org/Vol-2037/paper_7.pdf
           Structural Characterization of Graph
                 Navigational Languages?
                   (Extended Abstract)

                        Valeria Fionda1 and Giuseppe Pirrò2
                       1
                          DeMaCS, University of Calabria, Italy
                                fionda@mat.unical.it
    2
      Institute for High Performance Computing and Networking, ICAR-CNR, Italy
                                  pirro@icar.cnr.it



        Abstract. Graph navigational languages define binary relations in terms
        of pair of nodes in a graph subject to the existence of a path satisfying
        a certain regular expression. The goal of this paper is to give a novel
        characterization of navigational languages in terms of the structure of
        the graph embracing the results of a query. We define novel graph-based
        query evaluation semantics and efficient algorithms able to represent and
        capture intermediate nodes/edges linking pairs of nodes in the answer.
        We enhance the language of Nested Regular Expressions (NREs) with
        our machineries, thus defining the language of Structural NREs (sNREs).


1     Introduction

Graph data pervade everyday’s life; social networks, biological networks, and
Linked Open Data on the Web are just a few examples of its spread and flex-
ibility. The limited support that relational query languages offer in terms of
recursion stimulated the design of graph query languages where navigation is a
first-class citizen. Regular Path Queries (RPQs) and their variants allowing con-
junction (CRPQs) and inverse (C2RPQs) [5], and Nested Regular Expressions
(NREs) [11] are some examples. More recently, also SPARQL has been extended
with a (limited) navigational core called property paths (PPs). The complex-
ity of query evaluation for these languages is now well understood; the only
tractable languages are 2RPQs and NREs while the introduction of conjunction
(C2RPQs) makes the problem intractable (NP-complete) and the evaluation of
PPs still suffers from some problems (mixing set and bag semantics) [2].
    Queries expressible with these languages ask for pairs of nodes connected by
a path conforming to a regular language over binary relations. As an example,
Syd can find that one of his co-authors at distance 3 is Mark; nevertheless, no
information about the structure (i.e., intermediate nodes/edges) of the graph
where the pair (Syd, Mark) appears is given. This missing piece of information
?
    An extended version of this work titled “Explaining Graph Navigational Queries”
    will appear in proc. of the 14th Extended Semantic Web Conference (ESWC).
would allow to understand why a query has returned a certain set of (pairs of)
nodes and is useful in contexts where one needs to connect the dots, such as in
bibliographic networks, generic exploratory search or query debugging [6]. The
broad goal of this paper is to introduce Structural Nested Regular Expressions
(sNREs in short), an enhancement of Nested Regular Expressions (NREs) tack-
ling query answering from the point of view of the structure of the graph where
the results of a query appear. The challenge that we face with sNREs is to de-
fine formal semantics and efficient algorithms that allow navigational queries to
return graphs while keeping query evaluation tractable.
Related Work. There exists a solid body of work about graph query languages.
The core of such languages are Regular Path Queries (RPQs) that have been
extended with other features, among which, conjunction (CRPQs) [5], inverse
(C2RPQs) and the possibility to return and compare paths (EXPQs) [4]. These
additions do add expressive power at the cost of making query evaluation in-
tractable (combined complexity). Languages such as Nested Regular Expressions
(NREs) [11] allow existential tests in the form of nesting, in a similar spirit to
PDL and XPath while offering tractable query evaluation. Other strands of re-
search (e.g., [10, 3]) feature more expressive languages than NREs with a higher
computational cost. Our work differs from related research in one main respect:
We do not focus on increasing the expressiveness of existing languages with
new features (e.g., path variables and path variable constraints) as done with
CRPQs, ECRPQs. Our goal is to provide a novel structural characterization of
graph navigational languages taking as a yardstick the language of NREs. Such
characterzation amount at studying how NREs can be enhanced to also return
structural information (i.e., graphs) without hindering the complexity of query
evaluation.
Contributions and Outline. We make the following main contributions: (i) a
structural characterization of Nested Regular Expressions that we call Structural
NREs (sNREs), which to the best of our knowledge is the first graph navigational
language able to return graphs as the answer to a navigational query; (ii) formal
semantics based on novel operators that work with graphs besides pairs of nodes;
(iii) efficient algorithms for query evaluation under the novel semantics.
    The remainder of the paper is organized as follows. Section 2 presents the
sNREs language. Section 3 presents query evaluation algorithms and a study of
their complexity. We conclude and sketch future work in Section 4.


2   Structural Nested Regular Expressions

We focus our attention on the Resource Description Framework (RDF). An
RDF triple is a tuple of the form hs, p, oi ∈ I × I × I ∪ L, where I (IRIs) and
L (literals) are countably infinite sets; for ease of exposition we do not consider
blank nodes. An RDF graph G is a set of triples. The set of terms of a graph
will be terms(G) ⊆ I ∪ L; nodes(G) will be the set of terms used as a subject
or object of a triple while triples(G) is the set of triples in G. Since SPARQL
property paths offer very limited expressive power we will consider the language
of Nested Regular Expressions (NREs) [11] as reference language.
Nested Regular Expressions. NREs, originally proposed as the navigational core
of SPARQL, allow to express existential tests along the nodes in a path via
nesting (in the same spirit of XPath) while keeping the (combined) complexity
of query evaluation tractable. NREs allow to specify pairs of nodes in a graph G,
subject to the existence of a path satisfying a certain regular condition among
them. Each NRE nexp over an alphabet of symbols Σ defines a binary relation
JnexpKG when evaluated over a graph G. The result of the evaluation of an
NRE is a set of pairs of nodes. In the spirit of NREs, several extensions have
been defined (e.g., EPPs [7]). Although adding expressive power to NREs (e.g.,
EPPs add path conjunction and (safe) negation), these languages all return pairs
of nodes. This motivates the introduction of sNREs, which to the best of our
knowledge is the first language tackling the problem of returning graphs from
the evaluation of navigational queries while remaining tractable.

2.1   The sNREs Language
The syntax of sNREs is defined by the following grammar:

  snrexp := τ #exp     (τ ∈ {f ull, f ilt, set})
               exp := a | ˆa | exp/exp | exp|exp | exp∗ | exp[exp1 ]

where a∈Σ, ˆ denotes backward navigation, / path concatenation, | path union,
and [exp1 ] an existential test in the form of a nested expression. It is interesting
to note that, syntactically, sNREs differs from NREs (and other navigational
languages) only for the construct τ . sNREs offers a flexible way of choosing the
desired output. τ allows to specify different evaluation semantics that output:
(i ) pairs of nodes (i.e., set) as in NREs; (ii) structural information in terms
of the whole portion of the graph “touched” during the evaluation (f ull); (iii)
structural information in terms of (union of) paths leading to some result (f ilt).
On the expressiveness of sNREs. We compare sNREs with NREs and other lan-
guages that deal with paths/subgraphs. As for NREs, it is easy to see that the
output of a sNREs query, when specifying structural information via f ull or
f ilt, is richer than that of its corresponding NREs. This occurs because graph
navigation is only referenced in the semantics of NREs as the means to get the
resulting set of nodes. sNREs enrich NREs in terms of output produced via our
reconstruction algorithms (see Section 3) thus indirectly allowing to ask a larger
class of requests. As an example with sNREs one can ask to retrieve co-directors
and get as by-product movies co-directed.
     Languages like ERPQs [4] or ρ-queries [1] are clearly more expressive than
sNREs as they allow to manipulate paths via e.g., regular relations. Neverthe-
less this higher expressiveness comes at the cost of a higher complexity for query
evaluation. Our goal in this paper is to focus on providing a structural charac-
terization of low-complexity languages like NREs and showing that one can get
more (i.e., structural information) with no additional computational cost.
2.2    Structural Graphs

Structural Graphs are used to formally capture the output of the evaluation of
a sNRE and are the building blocks of the novel sNREs query semantics.

Definition 1 (Structural Graph). Given a graph G, a sNRE expression ex and a
set of starting node S⊆nodes(G), a structural graph is a quadruple Γ =(V, E, S, T )
where V ⊆nodes(G), E ⊆ triples(G) and T ⊆ V is a set of ending nodes, that
is, nodes reachable from nodes in S via paths satisfying ex.

                             Consider the graph G in Fig. 1 and the sNRE
   a   :knows   :knows   c   η=(:knows/:knows)|(:co-author/:co-author). The
              e              answer with the semantics based on pairs of nodes
   b :co-author :co-author d (i.e., NREs semantics) is the set of pairs of nodes:
                             (a,c), (b,d). Under the sNREs semantics, since there
 Fig. 1. An example graph.
                             are two starting nodes a and b from which the evalu-
ation produces results, one possibility would be to consider the Structural Graph
(SG) capturing all results, that is, Γ =(nodes(G), triple(G), {a,b}, {c,d}). How-
ever, one may note that there exists a path (via the node e) from a to d in
the SG even if the pair (a,d) does not belong to the answer. This could lead to
misinterpretation of the query results and their structural information. To avoid
these situations, we define G-sound and G-complete SGs.

Definition 2 (G-Soundness). Given a graph G and a sNRE expression ex, a
SG is G-sound if, and only if, all ending nodes are reachable from all the starting
nodes via some paths satisfying ex.
Definition 3 (G-Completeness). Given a graph G and a sNRE expression
ex, an SG is G-complete if, and only if, all nodes in a graph G reachable from
some starting nodes, via some paths satisfying ex, are in the ending nodes.
    The SG Γ in the above example violates G-soundness because the only path
existing (via the node e) from a to d (and from b to c) in Γ does not satisfy the
expression η. The following lemma guarantees G-soundness and G-completeness.

Lemma 4 (G-Sound and G-Complete SGs). Structural Graphs having a
single starting node v∈nodes(G) are G-sound and G-complete.
We are now ready to define answer graphs that capture the output of sNREs.

Definition 5 (Answer Graph). Given a sNRE expression ex and a graph G,
an answer graph E is a set of G-sound and G-complete structural graphs.

The answer graph of our previous example is the set E={Γa , Γb } s.t.:

 1. Γa =({a,e,c}, {ha, :knows, ei, he, :knows, ci}, a, {c})
 2. Γb =({b,e,d}, {ha, :co-author, ei, he, :co-author, ci}, b, {d}).

    Note that E is an answer graph according to both f ull and f ilt semantics
since there are no edges that do not contribute to reach some result node.
Formal Semantics. We devised two different evaluation semantics for sNREs that
match the syntactic constructs f ilt and f ull. In particular, the f ull semantics
considers all the parts of G that were visited during the evaluation of an expres-
sion, while the second one f ilt, only the portion of G that actually contributed
to build the answer; in other words, the set of SGs such that each Γv only con-
siders paths that starts from v∈nodes(G) and satisfy the expression (i.e., union
of all the successful paths). For sake of space we do not report the semantics
that are available in an extended version of this paper [9].


3    Algorithms and Complexity

We now describe algorithms for the evaluation of sNREs expressions under the
novel semantics. The interesting result is that the evaluation of a sNRE expres-
sion e in this new setting can be done efficiently. Let e be a sNRE expression
and G a graph. Let |e| be the size of e, Σe the set of edge labels appearing in
it, and |G|=|nodes(G)| + |triples(G)| be the size of G. Algorithms that build
answer graphs according to the f ull or f ilt semantics are automata-based and
work in two steps. The first step is shared and leverages product automata; the
second step requires a marking phase only for the f ilt semantics and is needed
to include nodes and edges that are relevant for the answer.
Building Product Automata. The idea is to associate to e (and to each
[exp]) a non deterministic finite state automaton with  transitions Ae (Aexp ,
resp.). Such automata can be built according to           S the standard Thomson con-
struction rules over the alphabet V oc(e)=Σe ∪ [e1 ]∈e , that is, by considering
also [e1 ] in e as basic symbols. The product automaton is a tuple G × Ae =
                                                                                           e
hQe , V oc(e), δ e , Qe0 , F e i where Qe is a set of states, δ e :Qe × (V oc(e) ∪ ) → 2Q
                                     e     e                                      e     e
is the transition function, Q0 ⊆ Q is the set of initial states, and F ⊆ Q is
the set of final states. The building of the product automaton G × Ae is based
on the algorithm used by [11] based on the labeling of the nodes of G wrt nested
subexpressions in e.
Building Answer Graphs. We now discuss algorithms that leverage prod-
uct automata (of the sNRE expression e and all nested subexpressions) to pro-
duce answer graphs according to the f ull and f ilt semantics. To access the
elements of a structural graph Γ (see Definition 1) we use the notation Γ.x,
with x ∈ {V, E, S, T }. The main algorithm is Algorithm 1, which receives the
sNRE expression and the type of answer graph to be built. In case of the f ilt
semantics the data structure reached, which maintains a set of states (ni , qj ), is
initialized via the procedure mark (line 3) reported in Algorithm 2; otherwise, it
is initialized as the union of: (i) all the states of the product automaton G × Ae ;
(ii) all the states of the product automata (G × Aexp ) of all the nested expres-
sions in e (line 5). The procedure mark fills the set reached with all the states
in all the product automata that contribute to obtain an answer; these are the
states in a path from an initial state to a final state in the product automata. As
shown in Algorithm 2, reached is populated by navigating the product automata
backward from the final states to the initial ones.
   Input : sNRE e, graph G, output (full or filt)
   Output: E: an answer graph as set of SGs Γs
    1: build the product automaton G×Ae
    2: if filt /* filtered semantics */ then
    3:     reached = mark(G×Ae , ∅)
           /* reached keeps nodes in G×Ae that are in a path to a final state */
    4: else
    5:    reached = Qe ∪ [exp] in e Qexp
                           S

    6: for all (s, qo ) ∈ Q0 do
                           e

    7:    Γs = h{s}, ∅, s, ∅i
    8:    seen(s,qo ) = {s}
           /* seen for each state sj keeps nodes in G×Ae from which it has been reached */
    9: E = S(s,qo )∈Qe0 {Γs }
   10: visit = S(s,qo )∈Qe0 {((s, q0 ), {s})}
        /* visit keeps nodes to be visited */
   11: buildA(G × Ae , reached, E, visit)
                                   Algorithm 1: BuildAnsG

   Input: product automaton G × A, set of states reached
   Build: set of states reached
     1 reached = reached ∪ (n,q )∈F e {(n, qf )}
                             S
                                    f
     2 visit = (n,q )∈F e {(n, qf )} s.t. qf ∈ F
               S
                     f
     3 visitN = ∅
     4 while visit 6= ∅ do
     5    for all (n, q) in visit do
     6       for all transition δ((n0 , q 0 ), x) ∈ G × Ae s.t. (n, q) ∈ δ((n0 , q 0 ), x) do
     7          if (n0 , q 0 ) ∈
                               / reached then
     8              visitN = visitN ∪ {(n0 , q 0 )}
     9              reached = reached ∪ {(n0 , q 0 )}
    10          for all [exp] in x do
    11              mark(G × Aexp , reached)
    12    visit = visitN
    13    visitN = ∅
    14 return reached
                         Algorithm 2: mark(G × Ae , reached)


    Then, the set of SGs composing an answer graph E are initialized (lines 6-7;
9) by adding to E an SG Γs for each initial state (s, q0 ) of G × Ae . Moreover, the
data structure seen is also initialized (line 8) by associating to each state (s, q0 )
the node s (associated to the initial state (s, q0 )) from which it has been visited.
Hence, seen maintains for each state, reached during the visit of the product
automata, the starting nodes from which this state has already been visited.
    The usage of seen avoids to visit the same state more than once for each
starting node. The data structure visit is also initialized with the initial states
of G × Ae (line 10); it contains all the states to be visited in the subsequent step
plus the set of starting nodes for which these states have to be visited. Then, the
SGs are built via buildA (Algorithm 3). All the states in visit are considered
(line 2) only once for the set Bn,q , which keeps starting nodes for which states
in visit have to be processed (line 3). Then, for each state (n, q)∈visit all its
transitions are considered (line 7); in particular, for each state (n0 , q 0 )∈reached,
reachable from some (n, q) ∈ visit via some transitions (line 8), the set of “new”
starting nodes (D) for which (n0 , q 0 ) has to be visited in the subsequent step is
computed with a possible update of the sets visit and seen (lines 9-12).
   Input: product automaton G × Aē , set of states reached, Answer Graph E, list
          of states to visit visit
     1 visitN = ∅
     2 for all (n, q)
                   S in visit do
     3    Bn,q = ((n,q),S)∈visit S
     4    for all s ∈ Bn,q do
     5       if q ∈ F ē then
     6           add n to Γs .T
     7    for all transition δ ē ((n, q), x) do
     8       for all (n0 , q 0 ) ∈ δ ē ((n, q), x) s.t (n0 , q 0 ) ∈ reached do
     9           D = Bn,q \ seen(n,q)
    10           if D 6= ∅ then
    11              visitN = visitN ∪ {((n0 , q 0 ), D)}
    12              seen(n0 ,q0 ) = seen(n0 ,q0 ) ∪ D
    13           if x ∈ Σe then
    14              for all s ∈ Bn,q do
    15                  add n0 to Γs .V
    16                  add (n, x, n0 ) to Γs
    17           else if x is a [exp] then
                                        exp
    18              let (n, q0 ) ∈ Q0
    19              seen(n,q ) = seen(n,q ) ∪ Bn,q
                              0                0
    20              buildA(G × Aexp , reached, E, {(n, q), Bn,q })
    21 buildA(G × Aē , reached, E, visitN)
                  Algorithm 3: buildA(G × A, reached, E, visit)


   If the transition is labeled with a predicate symbol in G (line 13), the SGs
corresponding to nodes s∈Bn,q are populated by adding the corresponding nodes
and edges (lines 14-16). Otherwise, if the transition is a nested expression the
building of the answer graph E proceeds recursively by visiting the product
automata associated to all nested (sub)expressions.
Theorem 6. Given a graph G and a sNRE expression e, the answer graph E
(according to both semantics) can be computed in time O(|nodes(G)| × |G| × |e|).
Proof (Sketch). The answer graph built according to the f ull semantics can be
constructed by visiting G×Ae (Algorithm 3). In particular, for each starting state
(n, q), the states and transitions of G×Ae are all visited at most once (and the
same also holds for the automata corresponding to the nested expressions of e).
The starting and ending nodes of each answer graph are set during the visit of the
product automaton. For each node s corresponding to a starting state (s, qo )∈Qe0
a structural graph is created (Algorithm 1, lines 6-7); the set of nodes reachable
from s is set to be Γs .T = {n | (n, q) ∈ F e and (n, q) is reachable from (s, qo )}
(Algorithm 3 lines 5-6). Thus, each structural graph can be computed    by visiting
each transition and each node exactly once with a cost O(|Qe |+ [exp]∈e |Qexp |+
                                                                  P

|δ e | + [exp]∈e |δ exp |) = O(|G| × |e|). Since the number of SGs to be con-
        P

structed is bound by |nodes(G)|, the total cost of building the answer graph
E, is O(|nodes(G)| × |G| × |e|). This bounds also take into account the cost of
building product automata. In the case of the f ilt semantics, the marking phase
does not increase the complexity bound; this is because the set reachable, which
keeps reachable states, is built by visiting at most once all nodes and transitions
in all the product automata, with a cost O(|G| × |e|).                             t
                                                                                   u
   Note that in Algorithm 3, the amortized processing time per node is lower
than |G| × |e| when visiting the product automaton since the Breadth First
Search(es) from each starting state are concurrently run according to the algo-
rithm in [12]. Finally, the SGs in the E built via Algorithm 1 are both G-sound
and G-complete. It is easy to see by the definition of the product automaton,
that there exists a starting state (n, q0 ) that is connected to a final state (n0 , qf )
in G × Ae and, thus, a path from n to n0 in Γn iff, there exists a path connecting
n to n0 in G satisfying e.

4    Concluding Remarks and Future Work
We have studied the novel problem of characterizing graph languages from the
point of view of the structure of the graph where queries are evaluated. We picked
the language of NREs as yardstick because of its attractive computational prop-
erties and showed how we can get more (in terms of structural information) at
the same computational cost. We had to face challenges in terms of formalization
of novel semantics that can deal with graphs and non-trivial (automata-based)
algorithms that can capture structural information in an efficient way. In an
extended version of this work [9] we discuss an experimental evaluation and an
instantiation of the framework to tackle the problem of explaining graph queries.
This works opens some challenging research questions such as: (i) consider a more
expressive navigational core [7] and enhance it with structural information; (ii)
add negative information (e.g., parts of a query that failed) (iii) provide struc-
tural information according to user-specified preferences [8].
References
 1. K. Anyanwu and A. Sheth. p-Queries: Enabling Querying for Semantic Associa-
    tions on the Semantic Web. In WWW, pages 690–699. ACM, 2003.
 2. M. Arenas, S. Conca, and J. Pérez. Counting Beyond a Yottabyte, or how SPARQL
    1.1 Property Paths will Prevent Adoption of the Standard. In WWW, pages 629–
    638, 2012.
 3. M. Arenas, G. Gottlob, and A. Pieris. Expressive Languages for Querying the
    Semantic Web. In PODS, 2014.
 4. P. Barceló, L. Libkin, A. W. Lin, and P. T. Wood. Expressive Languages for Path
    Queries over Graph-Structured Data. ACM TODS, 37(4):31, 2012.
 5. D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. Containment of
    Conjunctive Regular Path Queries with Inverse. In KR, pages 176–185, 2000.
 6. M. P. Consens, J. W. S. Liu, and F. Rizzolo. Xplainer: Visual explanations of
    xpath queries. In ICDE, pages 636–645. IEEE, 2007.
 7. V. Fionda, Pirrò G., and M. P. Consens. Extended Property Paths: Writing More
    SPARQL Queries in a Succinct Way. In AAAI, 2015.
 8. V. Fionda and G. Pirrò. Querying Graphs with Preferences. In CIKM, pages
    929–938. ACM, 2013.
 9. V. Fionda and G. Pirrò. Explaining Graph Navigational Queries. In ESWC, 2017.
10. L. Libkin, J. Reutter, and D. Vrgoč. Trial for RDF: adapting graph query languages
    for RDF data. In PODS, pages 201–212, 2013.
11. J. Pérez, M. Arenas, and C. Gutierrez. nSPARQL: A Navigational Language for
    RDF. Journal of Web Semantics, 8(4), 2010.
12. M. Then, M. Kaufmann, F. Chirigati, T. Hoang-Vu, K. Pham, A. Kemper, T. Neu-
    mann, and H. T. Vo. The More the Merrier: Efficient Multi-Source Graph Traver-
    sal. VLDB Endowment, 8(4):449–460, 2014.