<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uni ed Algorithm to Solve several Graph Problems with Relational Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wellington Cabrera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Ordonez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Houston</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Several important graph algorithms can be solved as an iteration of vector-matrix multiplication over di erent semirings. On this basis, we show that the Bellman-Ford (single source shortest paths), reachability, PageRank, and topological sort algorithms can be expressed as relational queries, to solve analytic graph problems in relational databases. As a main contribution, we present a general algorithm that uni es all graph algorithms aforementioned.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Graph Dataset Let G = (V; E) be a directed graph, where V is a set of
vertices and E is a set of edges, considered as an ordered pairs of vertices. Let
n = jV j vertices and m = jEj edges. An edge (i; j) in E links two vertices
in V and has a direction. We use E to name both the set of edges of G as
well as the adjacency matrix, depending on the context. In general, real-life
graphs are sparse. Therefore, it is reasonable to store the adjacency matrix E
in a sparse representation, which saves both computing, memory and storage
resources. Several methods to represent sparse matrices are well known, and
the interested reader may check [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In our work, the adjacency matrix of a
graph G is stored in a table E(i; j; v), with primary key (i; j). The sparse matrix
representation avoids storing zeroes. Thus, the space required for table E is
O(m), much smaller than the space necessary in the dense representation, O(n2).
      </p>
      <p>
        We consider a matrix sparse when the number of non-zero cells is O(n).
Likewise, we consider a graph as sparse when the number of edges is O(n).
Certain classes of graph are clearly sparse. For instance, both trees and spanning
trees are graphs where m = n 1. Many graph problems are solved computing
powers of E. For large graphs computing such powers is challenging; even if E
is a sparse graph, it is possible that after k multiplications by itself, the number
of non-zero entries increases to O(n2). Thus, Ek is not necessarily sparse.
Matrix Multiplication and Semirings Semirings are algebraic structures
consisting of a set R, an additive operator with identity element 0, a product
operator with identity element 1, and commutative, associative and
distributive properties holding for the two operators in the usual manner, succinctly
represented as (R; ; ; 0; 1). For instance, the regular matrix multiplication is
de ned under (R; +; ; 0; 1). A general de nition of matrix multiplication
expands it to any semiring. On the min-plus semiring, also known as "tropical
semiring", min is the additive operator , and + is the product operator . The
min-plus semiring is used to solve shortest path problems, as in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The boolean
semiring, with _ (logical OR) as operator and _ (logical OR) as operator,
is used frequently in linear algebra to represent some graph algorithms.
3
      </p>
      <p>Graphs Algorithms computed with Relational Queries
We use as starting point known linear algebra approaches for Bellman-Ford,
reachability and PageRank, as well as a novel way to express Topological Sort
with matrix operations. On this foundation, we will show how all these
algorithms can be expressed as relational queries. As is already known,
BellmanFord and reachability can be solved by powers of the adjacency matrix and a
vector-matrix multiplication. Topological sort also can be solved in the same
manner, as we will show shortly. Likewise, PageRank can be solved by powers
of the transition matrix. These four algorithms can be solved in a general way
by Equation 1:</p>
      <p>Sk = S0 Ek = S0 E</p>
      <p>
        E :::
(1)
where S0 stores the initial state and Sk stores the nal result. The last iteration
is k. Instead of solving Ek, it is more e cient to compute Sk with iterative
vector-matrix products: Sd Sd 1 E for d = 1 : : : k. Algorithms are below.
Bellman-Ford: Shortest paths from a source s. Initialization: S0[i] = 0 if i =
s; 1 otherwise. After k successive vector matrix multiplications (under the min
+ semiring), the vector Sk contains the minimun distance from s to every vertex.
Reachability: Initialization: S0[i] = 0 if i = s, 0 otherwise. Vector matrix
multiplications are computed under the natural numbers semiring. At iteration
d, Sd[i] contains 1 if there exists a path of length d from the s to the vertex i.
PageRank: It is well known that PageRank can be computed as powers of a
modi ed transition matrix [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Since PageRank is conceived as a Markov
process, it can be computed as an iterative process that stops when the Markov
chain stabilizes. Let S0[i] = 1=n. The transition matrix T is de ned as Ti;j =
Ej;i=outdeg(j) when Ej;i = 1; otherwise Ti;j = 0. Let A be a n n matrix, whose
cells contain always 1=n, and 0 &lt; p &lt; 1 the damping factor; let T 0 = T + D ,
where D is a matrix such that Di;j = 1=n if the column j is a 0 column. The
power method can be applied on T 00 de ned as: T 00 = (1 p)T 0 + pA, computing
Sd Sd 1 T " iteratively, from d = 1 until stabilization of the Markov chain.
Topological Sort: Sd holds the relative ordering for the vertices, and Sd[j] = 0
when the relative order of j is not yet determined. Initialization: S0[j] = 1
if outdeg(j) = 0, otherwise S0[j] = 1. The algorithm can be outlined in this
manner: For each vertex j 2 Sd 1, nd every father vertex such that 8i
successor of ; i 2 S, and let Sd[ ] = 1. Repeat the process until 8i 2 V ,
S[i] 6= 0. The topological order is given by S0 + S1 + :::Sk. This algorithm can
be expressed as in Equation 1, where Sk is solved with iterative vector-matrix
multiplication. In this case, the operation is the logical AND: ^, and the
is the logical implication ). As result of the vector-matrix multiplication,
Sd[j] = 1 if for all j0 successors of j , Sd[j0] = 1. Although the operators ^ and
) cannot comprise a semiring (the commutative property for ) does not hold),
it is possible to express Sd E with SPJA queries, like in the previous algorithms.
Considering that S and E are stored in relational tables S(j; v) and E(i; j; v),
the vector-matrix product can be clearly computed with a relational query as:
      </p>
      <p>S:i;E:j:sum(S:v E:v)(S ./S:j=E:i E). Table 1 shows a comparison of the properties
of the four algorithms of our concern. All of them are computed by a combined
else
end</p>
      <p>Rd</p>
      <p>Sd
end</p>
      <p>f (Rd 1; Sd 1; Rd; Sd)
Algorithm 1: Uni ed Algorithm</p>
      <p>Input: Table E, S0, R0, f (); g(); ; ; unionF lag . Optional input: s (source)
Output: Table Rd(j; v)
d 0; 1;
while &gt; do
d d + 1 ;
Sd j:g()(Sd 1:v E:v)(Sd 1 ./j=i E)
if unionFlag then</p>
      <p>Rd j:sum(v)(Sd [ Rd 1)
join/aggregation between S and E (or T ") as the main operation of the
iteration. This SPJA query computes a vector-matrix multiplication on di erent
semirings, depending on the speci c algorithm, with time complexity O(nlogn).
The meaning of the value in S[i] is di erent in every case. While Bellman-Ford
and reachability return results concerning an unique source vertex, topological
sort and PageRank return results with respect to the entire graph. The execution
time of these algorithms will depend on n and the number of iterations k. For
instance, reachability may nish in less iterations than Bellman-Ford, for the
same graph. It is possible to conceptualize a Uni ed Algorithm to solve these
four graphs problems, as presented in Algorithm 5. The main operation remains
the matrix multiplication, with parameters ; . Other input parameters are
the initial vector S0. and unionFlag, wich controls if the output is Sd or the
cumulative sum of S0 + S1::: + Sd. F
5</p>
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>We demonstrate that Bellman-Ford, reachability, PageRank and Topological
Sort can be expressed as an iteration of relational queries, to solve graph
problems in data sets stored in databases. Based on the conceptual foundation of
vector-matrix products under di erent semirings, these algorithms run as SPJA
queries; they are secondary memory algorithms, not limited by the system RAM.
The e ciency of our algorithms is supported by: sparse storage, which avoids
wasting both space and computing time; conditions for early termination, which
avoids useless iterations; and light weight relational queries. Moreover, we
compared these algorithms and proposed a Uni ed Algorithm to solve them. Further
research will identify and integrate into the Uni ed Algorithm more algorithms
that follow the vector-matrix multiplication pattern. Besides, we will study these
algorithms in large and complex graphs, especially in parallel clusters.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Demmel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dongarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ruhe</surname>
          </string-name>
          , and H. van der Vorst.
          <article-title>Templates for the solution of algebraic eigenvalue problems: a practical guide</article-title>
          , volume
          <volume>11</volume>
          .
          <string-name>
            <surname>Siam</surname>
          </string-name>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Cormen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Leiserson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Rivest</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Stein</surname>
          </string-name>
          . Introduction to Algorithms, Third Edition.
          <source>The MIT Press, 3rd edition</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Jindal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rawlani</surname>
          </string-name>
          , E. Wu,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          .
          <article-title>Vertexica: your relational friend for graph analytics</article-title>
          !
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>7</volume>
          (
          <issue>13</issue>
          ):
          <volume>1669</volume>
          {
          <fpage>1672</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Kamvar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Haveliwala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Golub</surname>
          </string-name>
          .
          <article-title>Extrapolation methods for accelerating pagerank computations</article-title>
          .
          <source>In Proceedings of the 12th Int. Conf. on World Wide Web</source>
          , pages
          <volume>261</volume>
          {
          <fpage>270</fpage>
          . ACM,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Ordonez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cabrera</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Gurram</surname>
          </string-name>
          .
          <article-title>Comparing columnar, row and array DBMSs to process recursive queries on graphs</article-title>
          .
          <source>Information Systems</source>
          , pages {,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>N.</given-names>
            <surname>Yakovets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          .
          <article-title>Evaluation of SPARQL property paths via recursive SQL</article-title>
          .
          <source>In Proc. AMW</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>