Semantics for querying paths in graph databases:
   No-repeated-node or no-repeated-edge?

                   Daniel Hernández and Claudio Gutierrez

              Computer Science Dept & CIWS, Universidad de Chile


      Abstract. Patterns for walks in graphs usually include regular expres-
      sions, that may result in loops, thus producing infinite solutions. To
      overcome this issue, practical query languages bound result sets to en-
      sure finiteness. On this paper we compare two bounding methods: edges
      are visited at most once; nodes are visited at most once. We show that
      despite these methods produce languages with different expressiveness,
      there is a duality, in the sense that the queries in one can be evaluated
      using the machinery used to evaluate the other.


1   Introduction
Most graph query languages support querying for walks (or paths) between
nodes by using some fragment of regular path queries. Consider the expression
ue∗ v, where u, v are node and e is an edge; it can be considered a pattern that
represents some type of query involving paths from u to v over edges with label
e. The precise semantics for such query, though, has many possibilities. Some of
them (see [1]) are the following: true if v is reachable from u through a path of
edges with label e; the pairs of nodes (u, v) with the same previous condition;
some of the walks from u to v fulfilling the above condition; the graphs that
are induced by such walks, etc. In this paper we study the problem of returning
some walks.
    An issue when returning walks is that there are possibly infinitely many when
allowing loops. Thus, for practical concerns current implementations limit the
number of walks returned by choosing the shortest ones; or choosing those that
do not repeat nodes; or those that do not repeat edges; or choosing the k-top
“best” walks.
    In this paper we will study two strategies: not repeating nodes and not re-
peating edges. The first is preferred in graph matching applications (see, e.g.,
[2]) where no nodes on the graph should be collapsed and was recently used in
the context of property graph for pattern matching in social networks [3–5]. The
second semantics is currently used by the Cypher query language [6].

Organization of the paper. In Sec. 2 we present the graph and query model.
In Sec. 3 we compare the expressiveness of query languages induced by the
two strategies and show how a machinery designed to evaluate queries in one
language can be used for the other and viceversa. This is relevant to see when
some languages can be used to simulate patterns of another.
2    Graph and query model

For our goals, it will be sufficient to work over a simple graph data model without
direction on edges, and without properties in nodes and edges.
    We assume the existence of three infinite disjoint sets V, E, and X denoting
respectively nodes, edges, and variables. A graph database G (a graph in that
follows) is a set (V, E, δ), where V ⊂ V, E ⊂ E, and δ : E → [V ]2 is a function
that maps each edge e ∈ E to a subset {u, v} ∈ [V ]2 . Observe that according
this model a graph database is a multigraph.
    A join in a graph G, denoted as [aub], is an unordered pair of edges a, b with
a common endpoint u. Think of it as the pair (u, δ(u)). Observe that a join is
unordered (i.e., [aub] = [bua]).
    A walk w in a graph G is an alternating sequence of nodes and edges
v1 e1 . . . en−1 vn (i.e. beginning and ending in a node) such that δ(ej ) = {vj , vj+1 }.
A connection c between two nodes u1 , u2 in G is an alternating sequence of
edges and nodes e1 vn . . . vn−1 en (i.e. beginning and ending with an edge) such
that u1 cu2 is a walk in G. A walk pattern P (a pattern in that follows) is a walk
where some edges are replaced by variables. We will denote nodes and edges with
lower case letters, and variables with upper case letters. For instance, uXveu is
a pattern where u, v are nodes, e is an edge, and X is a variable.
    A solution of the evaluation of a pattern P on a graph G is a mapping µ from
the variables occurring in P to connections in G such that the result, denoted
as µ(P ), is a walk obtained from replacing every variable X by h(X) in P . We
denote the set of all solutions of P on G as JP KG . Formally, a solution in JP KG
is an homomorphism from P to connections on G such that h(P ) is a walk in
G. For example, consider the pattern P = uXv , and the graph G depicted by
Fig. 1. Then, µ1 : X 7→ c, and µ2 : X 7→ awb are solutions in JP KG , because
µ1 (P ) = uawbv and µ2 (P ) = ucv are a walks in G.


3    No-repeated-node versus no-repeated-edge semantics

The semantics of patterns has problems when resulting sets may contain in-
finitely many solutions. For instance, evaluating the pattern uXv over the graph
G depicted in Fig. 1 produces infinitely many solutions because there are in-
finitely many walks between u and v.
    Most graph databases bound solution sets in some form [1]). We study two
of the most common, based respectively in restricting solutions µ such that each
node (respectively each edge) occurs at most once in µ(P ). We will denote them
as JP |V KG (respectively JP |E KG ), and we call them non-repeating-node and
non-repeating-edge semantics [1].
    It is easy to see that JP |V KG ⊆ JP |E KG for every pattern P and a graph G.
In fact, a walk w repeating and edge e with endpoints {u, v} has to repeat at
least v or u. If w does not repeat u, then w contains veuev, thus w repeats v. The
converse is not true: a counterexample is the walk ucvduev of the graph depicted
in Fig. 1 that visits twice the nodes u and v without repeating edges. A corollary
of this is that given an arbitrary pattern P , in general there does not exist a
pattern P 0 such that JP |E KG = JP 0 |V KG for every graph G. Furthermore, it is
easy to see that this result is also true for the equality JP |V KG = JP 0 |E KG . Hence,
these two semantics induce languages with different expressiveness. Moreover,
they are incomparable, that is, no one is more expressive than the other.
    Despite both semantics induce incomparable languages, we will show that
the result of one semantics can be obtained using the machinery that compute
the results of the other.


              G                                             Gd

                     Fig. 1. A graph database G and its dual Gd .

    We assume the existence of an injective function (·)d that maps every edge
to a node, and every join to an edge. Also, given a graph G, we denote by G0 its
dual graph: every node u0 in G0 correspond to an edge e in G such that u0 = ed ,
and every edge e0 in G0 correspond to a join [aub] in G such that e0 = [aub]d ,
and the endpoints of e0 in G0 are ad and bd . Then, we will write Gd to denote
the result of adding two fresh nodes α and ω to G0 and connecting them two
every node in G0 . Abusing notation, we will call Gd the dual of G. For example,
the graph Gd depicted by Fig. 1 is the dual of G1 .
    The dual of a walk w = u1 e1 . . . en−1 un , denoted as wd , will be the walk
α[αu1 e1 ]d ed1 . . . edn−1 [en−1 un ω]d ω in Gd .
    Extending the notion of duality for patterns is more involving than for walks.
Indeed, given the pattern uXv, then it is natural to assume that (uXv)d is the
pattern α[αuY1 ]d X[Y2 vω]d ω, where Y1 and Y2 are two variables that represent
1
    For the sake of the readability, in Gd we denote ad and (awb)d simply as u and w,
    respectively, because the actual symbols can be inferred from the figure.
edges holding that solutions µ of this dual pattern satisfy µ(X) = µ(Y1 ) . . . µ(Y2 ).
This translation introduces the expressions [α, u, Y1 ]d and [Y2 , v, ω]d whose se-
mantics are not already formalized.
    We will define the expression [α, u, Y ]d as the set of all possible replacements
of Y by an edge (i.e., {[αue]d | e ∈ E}). Similarly, given a pattern P , then P d
will represent the set, denoted as repr(P d ), including every possible result of
replacing each edge variable Y occurring in P d by an edge e ∈ E.
    Now we are ready to present our main result.
Theorem 1 (main). Let P be a pattern and G be a graph database. Then:
                            [         −1
             JP |E KG =           {µd : µ ∈ JP 0 |V KGd .
                               P 0 ∈repr(P d )

Proof sketch. Each walk w in G has a corresponding walk wd in Gd . Thus, for
every solution µ ∈ JP |E KG there is a walk wµ such that (wµ )d is in Gd . It suffices
proving that there is a pattern P 0 in repr(P d ) such that (wµ )d is a solution of
evaluating JP 0 |V KG . Determining P 0 can be done by choosing the representation
of P d that maps the edge variables to the values according to what µ maps
around its respective sequence variables. For instance, let P be the pattern uXv
and consider the graphs G and Gd depicted by Fig. 1. Then, a solution of JP |E KG
is the mapping µ : X 7→ awbvcue because µ(P ) = uawbvcuev is a walk in Gd .
The pattern P d is α[αuY1 ]d X[Y2 , v, ω]d ω. A pattern P 0 ∈ repr(P d ) matching
the ends of µ(X) is α[αua]d X[evω]d ω. The mapping µd is a solution of JP 0 |V K.
Hence, P 0 is the pattern needed.                                                   t
                                                                                    u
    Computing P 0 ∈repr(P d ) JP 0 | V KG does not require to evaluate the infinitely
                 S

many patterns in repr(P d ). In fact, it suffices to check substitutions producing
not trivially empty results. For instance, [αuY ]d is not trivial only for edges
whose endpoints include u in G. As a consequence of this optimization, for
data complexity, the time required for evaluating JP |E KG with the method of
Theo. 1 is in the same complexity class than the one required to compute JP |E KG
directly, and the space grows to at most O(|E|2 )
    Finally, let us say that the dual result (i.e. evaluate J· |V KG using J· |E KG )
uses similar techniques and we do not show it here for space reasons.

References
 1. R. Angles, M. Arenas, P. Barcelo, A. Hogan, J. Reutter, D. Vrgoc. Foundations of
    Modern Graph Query Languages. CoRR abs/1610.06264, 2016.
 2. Horst Bunke. Graph matching: Theoretical foundations, algorithms, and applica-
    tions. In Proceedings of Vision Interface, 2000.
 3. Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Yinghui Wu, and Yunpeng Wu.
    Graph Pattern Matching: From Intractable to Polynomial Time. PVLDB, 2010.
 4. Wenfei Fan. Graph pattern matching revised for social network analysis. In 15th
    International Conference on Database Theory (ICDT), 2012.
 5. Wenfei Fan, Xin Wang, and Yinghui Wu. Incremental graph pattern matching.
    ACM Trans. Database Systems, 2013.
 6. The Neo4j Team. The Neo4j Manual v3.0. http://neo4j.com/docs/stable/. 2016.