<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantics for querying paths in graph databases: No-repeated-node or no-repeated-edge?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Hernandez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gutierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Dept &amp; CIWS, Universidad de Chile</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Patterns for walks in graphs usually include regular expressions, that may result in loops, thus producing in nite solutions. To overcome this issue, practical query languages bound result sets to ensure niteness. On this paper we compare two bounding methods: edges are visited at most once; nodes are visited at most once. We show that despite these methods produce languages with di erent expressiveness, there is a duality, in the sense that the queries in one can be evaluated using the machinery used to evaluate the other.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Most graph query languages support querying for walks (or paths) between
nodes by using some fragment of regular path queries. Consider the expression
ue v, where u; v are node and e is an edge; it can be considered a pattern that
represents some type of query involving paths from u to v over edges with label
e. The precise semantics for such query, though, has many possibilities. Some of
them (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) are the following: true if v is reachable from u through a path of
edges with label e; the pairs of nodes (u; v) with the same previous condition;
some of the walks from u to v ful lling the above condition; the graphs that
are induced by such walks, etc. In this paper we study the problem of returning
some walks.
      </p>
      <p>An issue when returning walks is that there are possibly in nitely many when
allowing loops. Thus, for practical concerns current implementations limit the
number of walks returned by choosing the shortest ones; or choosing those that
do not repeat nodes; or those that do not repeat edges; or choosing the k-top
\best" walks.</p>
      <p>
        In this paper we will study two strategies: not repeating nodes and not
repeating edges. The rst is preferred in graph matching applications (see, e.g.,
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) where no nodes on the graph should be collapsed and was recently used in
the context of property graph for pattern matching in social networks [3{5]. The
second semantics is currently used by the Cypher query language [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Organization of the paper. In Sec. 2 we present the graph and query model.
In Sec. 3 we compare the expressiveness of query languages induced by the
two strategies and show how a machinery designed to evaluate queries in one
language can be used for the other and viceversa. This is relevant to see when
some languages can be used to simulate patterns of another.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Graph and query model</title>
      <p>For our goals, it will be su cient to work over a simple graph data model without
direction on edges, and without properties in nodes and edges.</p>
      <p>We assume the existence of three in nite disjoint sets V, E, and X denoting
respectively nodes, edges, and variables. A graph database G (a graph in that
follows) is a set (V; E; ), where V V, E E, and : E ! [V ]2 is a function
that maps each edge e 2 E to a subset fu; vg 2 [V ]2. Observe that according
this model a graph database is a multigraph.</p>
      <p>A join in a graph G, denoted as [aub], is an unordered pair of edges a; b with
a common endpoint u. Think of it as the pair (u; (u)). Observe that a join is
unordered (i.e., [aub] = [bua]).</p>
      <p>A walk w in a graph G is an alternating sequence of nodes and edges
v1e1 : : : en 1vn (i.e. beginning and ending in a node) such that (ej) = fvj; vj+1g.
A connection c between two nodes u1; u2 in G is an alternating sequence of
edges and nodes e1vn : : : vn 1en (i.e. beginning and ending with an edge) such
that u1cu2 is a walk in G. A walk pattern P (a pattern in that follows) is a walk
where some edges are replaced by variables. We will denote nodes and edges with
lower case letters, and variables with upper case letters. For instance, uXveu is
a pattern where u; v are nodes, e is an edge, and X is a variable.</p>
      <p>A solution of the evaluation of a pattern P on a graph G is a mapping from
the variables occurring in P to connections in G such that the result, denoted
as (P ), is a walk obtained from replacing every variable X by h(X) in P . We
denote the set of all solutions of P on G as JP KG. Formally, a solution in JP KG
is an homomorphism from P to connections on G such that h(P ) is a walk in
G. For example, consider the pattern P = uXv , and the graph G depicted by
Fig. 1. Then, 1 : X 7! c, and 2 : X 7! awb are solutions in JP KG, because
1(P ) = uawbv and 2(P ) = ucv are a walks in G.
3</p>
      <p>No-repeated-node versus no-repeated-edge semantics
The semantics of patterns has problems when resulting sets may contain
innitely many solutions. For instance, evaluating the pattern uXv over the graph
G depicted in Fig. 1 produces in nitely many solutions because there are
innitely many walks between u and v.</p>
      <p>
        Most graph databases bound solution sets in some form [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). We study two
of the most common, based respectively in restricting solutions such that each
node (respectively each edge) occurs at most once in (P ). We will denote them
as JP jV KG (respectively JP jEKG), and we call them non-repeating-node and
non-repeating-edge semantics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>It is easy to see that JP jV KG JP jEKG for every pattern P and a graph G.
In fact, a walk w repeating and edge e with endpoints fu; vg has to repeat at
least v or u. If w does not repeat u, then w contains veuev, thus w repeats v. The
converse is not true: a counterexample is the walk ucvduev of the graph depicted
in Fig. 1 that visits twice the nodes u and v without repeating edges. A corollary
of this is that given an arbitrary pattern P , in general there does not exist a
pattern P 0 such that JP jEKG = JP 0jV KG for every graph G. Furthermore, it is
easy to see that this result is also true for the equality JP jV KG = JP 0jEKG. Hence,
these two semantics induce languages with di erent expressiveness. Moreover,
they are incomparable, that is, no one is more expressive than the other.</p>
      <p>Despite both semantics induce incomparable languages, we will show that
the result of one semantics can be obtained using the machinery that compute
the results of the other.</p>
      <p>G</p>
      <p>Gd</p>
      <p>We assume the existence of an injective function ( )d that maps every edge
to a node, and every join to an edge. Also, given a graph G, we denote by G0 its
dual graph: every node u0 in G0 correspond to an edge e in G such that u0 = ed,
and every edge e0 in G0 correspond to a join [aub] in G such that e0 = [aub]d,
and the endpoints of e0 in G0 are ad and bd. Then, we will write Gd to denote
the result of adding two fresh nodes and ! to G0 and connecting them two
every node in G0. Abusing notation, we will call Gd the dual of G. For example,
the graph Gd depicted by Fig. 1 is the dual of G1.</p>
      <p>The dual of a walk w = u1e1 : : : en 1un, denoted as wd, will be the walk
[ u1e1]de1d : : : edn 1[en 1un!]d! in Gd.</p>
      <p>Extending the notion of duality for patterns is more involving than for walks.
Indeed, given the pattern uXv, then it is natural to assume that (uXv)d is the
pattern [ uY1]dX[Y2v!]d!, where Y1 and Y2 are two variables that represent
1 For the sake of the readability, in Gd we denote ad and (awb)d simply as u and w,
respectively, because the actual symbols can be inferred from the gure.
edges holding that solutions of this dual pattern satisfy (X) = (Y1) : : : (Y2).
This translation introduces the expressions [ ; u; Y1]d and [Y2; v; !]d whose
semantics are not already formalized.</p>
      <p>We will de ne the expression [ ; u; Y ]d as the set of all possible replacements
of Y by an edge (i.e., f[ ue]d j e 2 Eg). Similarly, given a pattern P , then P d
will represent the set, denoted as repr(P d), including every possible result of
replacing each edge variable Y occurring in P d by an edge e 2 E.</p>
      <p>Now we are ready to present our main result.</p>
      <p>Theorem 1 (main). Let P be a pattern and G be a graph database. Then:
JP jE KG =
2 JP 0 jV KGd :
Proof sketch. Each walk w in G has a corresponding walk wd in Gd. Thus, for
every solution 2 JP jE KG there is a walk w such that (w )d is in Gd. It su ces
proving that there is a pattern P 0 in repr(P d) such that (w )d is a solution of
evaluating JP 0 jV KG. Determining P 0 can be done by choosing the representation
of P d that maps the edge variables to the values according to what maps
around its respective sequence variables. For instance, let P be the pattern uXv
and consider the graphs G and Gd depicted by Fig. 1. Then, a solution of JP jE KG
is the mapping : X 7! awbvcue because (P ) = uawbvcuev is a walk in Gd.
The pattern P d is [ uY1]dX[Y2; v; !]d!. A pattern P 0 2 repr(P d) matching
the ends of (X) is [ ua]dX[ev!]d!. The mapping d is a solution of JP 0 jV K.
Hence, P 0 is the pattern needed. tu</p>
      <p>Computing SP 02repr(P d)JP 0 j V KG does not require to evaluate the in nitely
many patterns in repr(P d). In fact, it su ces to check substitutions producing
not trivially empty results. For instance, [ uY ]d is not trivial only for edges
whose endpoints include u in G. As a consequence of this optimization, for
data complexity, the time required for evaluating JP jE KG with the method of
Theo. 1 is in the same complexity class than the one required to compute JP jE KG
directly, and the space grows to at most O(jEj2)</p>
      <p>Finally, let us say that the dual result (i.e. evaluate J jV KG using J jE KG)
uses similar techniques and we do not show it here for space reasons.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrgoc</surname>
          </string-name>
          .
          <source>Foundations of Modern Graph Query Languages. CoRR abs/1610.06264</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Horst</given-names>
            <surname>Bunke</surname>
          </string-name>
          .
          <article-title>Graph matching: Theoretical foundations, algorithms, and applications</article-title>
          .
          <source>In Proceedings of Vision Interface</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Wenfei</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jianzhong</given-names>
            <surname>Li</surname>
          </string-name>
          , Shuai Ma, Nan Tang,
          <string-name>
            <surname>Yinghui Wu</surname>
            , and
            <given-names>Yunpeng</given-names>
          </string-name>
          <string-name>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Graph Pattern Matching: From Intractable to Polynomial Time</article-title>
          . PVLDB,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Wenfei</given-names>
            <surname>Fan</surname>
          </string-name>
          .
          <article-title>Graph pattern matching revised for social network analysis</article-title>
          .
          <source>In 15th International Conference on Database Theory (ICDT)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Wenfei</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xin</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yinghui</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Incremental graph pattern matching</article-title>
          .
          <source>ACM Trans. Database Systems</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <source>The Neo4j Team. The Neo4j Manual v3.0</source>
          . http://neo4j.com/docs/stable/.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>