Unified Algorithm to Solve several Graph
         Problems with Relational Queries

                      Wellington Cabrera, Carlos Ordonez

          Department of Computer Science, University of Houston, USA


      Abstract. Several important graph algorithms can be solved as an iter-
      ation of vector-matrix multiplication over different semirings. On this ba-
      sis, we show that the Bellman-Ford (single source shortest paths), reacha-
      bility, PageRank, and topological sort algorithms can be expressed as re-
      lational queries, to solve analytic graph problems in relational databases.
      As a main contribution, we present a general algorithm that unifies all
      graph algorithms aforementioned.


1   Introduction

Relational databases remain the most common technology to store transactional
and analytical databases, due to optimized I/O, robustness and security con-
trol. Although the common understanding is that relational database queries
are not sufficient to express important graphs algorithms, some recent work has
addressed the problem of solving graph algorithms with SQL queries. In our
previous work [5], we proposed optimized recursive queries to compute matrix
powers, to solve two important graph problems in the DBMS: Transitive clo-
sure and all pairs shortest path. In [6], the authors use recursive SQL to find
constrained paths in RDF data, stored in a relational DBMS. Jindal et all [3]
present some graph algorithms using a columnar database, showing that DBMS
are competitive with specialized graph systems, namely Giraph and GraphLab.
In this work we demonstrate that several graph algorithms can be expressed as a
succession of efficient SPJA queries that are defined over the foundation of graph
algorithms with vector-matrix operations. We believe that efficient graph algo-
rithms for relational databases are a contribution that will support in-database
graphs analytics in large data sets stored in databases.


2   Definitions

Graph Dataset Let G = (V, E) be a directed graph, where V is a set of
vertices and E is a set of edges, considered as an ordered pairs of vertices. Let
n = |V | vertices and m = |E| edges. An edge (i, j) in E links two vertices
in V and has a direction. We use E to name both the set of edges of G as
well as the adjacency matrix, depending on the context. In general, real-life
graphs are sparse. Therefore, it is reasonable to store the adjacency matrix E
in a sparse representation, which saves both computing, memory and storage
resources. Several methods to represent sparse matrices are well known, and
the interested reader may check [1]. In our work, the adjacency matrix of a
graph G is stored in a table E(i, j, v), with primary key (i, j). The sparse matrix
representation avoids storing zeroes. Thus, the space required for table E is
O(m), much smaller than the space necessary in the dense representation, O(n2 ).
    We consider a matrix sparse when the number of non-zero cells is O(n).
Likewise, we consider a graph as sparse when the number of edges is O(n).
Certain classes of graph are clearly sparse. For instance, both trees and spanning
trees are graphs where m = n − 1. Many graph problems are solved computing
powers of E. For large graphs computing such powers is challenging; even if E
is a sparse graph, it is possible that after k multiplications by itself, the number
of non-zero entries increases to O(n2 ). Thus, E k is not necessarily sparse.

Matrix Multiplication and Semirings Semirings are algebraic structures
consisting of a set R, an additive operator ⊕ with identity element 0, a product
operator ⊗ with identity element 1, and commutative, associative and distribu-
tive properties holding for the two operators in the usual manner, succinctly
represented as (R, ⊕, ⊗, 0, 1). For instance, the regular matrix multiplication is
defined under (R, +, ×, 0, 1). A general definition of matrix multiplication ex-
pands it to any semiring. On the min-plus semiring, also known as ”tropical
semiring”, min is the additive operator ⊕, and + is the product operator ⊗. The
min-plus semiring is used to solve shortest path problems, as in [2]. The boolean
semiring, with ∨ (logical OR) as ⊕ operator and ∨ (logical OR) as ⊗ operator,
is used frequently in linear algebra to represent some graph algorithms.

3    Graphs Algorithms computed with Relational Queries
We use as starting point known linear algebra approaches for Bellman-Ford,
reachability and PageRank, as well as a novel way to express Topological Sort
with matrix operations. On this foundation, we will show how all these algo-
rithms can be expressed as relational queries. As is already known, Bellman-
Ford and reachability can be solved by powers of the adjacency matrix and a
vector-matrix multiplication. Topological sort also can be solved in the same
manner, as we will show shortly. Likewise, PageRank can be solved by powers
of the transition matrix. These four algorithms can be solved in a general way
by Equation 1:
                          Sk = S0 · E k = S0 · E · E · ...                  (1)
where S0 stores the initial state and Sk stores the final result. The last iteration
is k. Instead of solving E k , it is more efficient to compute Sk with iterative
vector-matrix products: Sd ← Sd−1 · E for d = 1 . . . k. Algorithms are below.

Bellman-Ford: Shortest paths from a source s. Initialization: S0 [i] = 0 if i =
s, ∞ otherwise. After k successive vector matrix multiplications (under the min
+ semiring), the vector Sk contains the minimun distance from s to every vertex.
Reachability: Initialization: S0 [i] = 0 if i = s, 0 otherwise. Vector matrix
multiplications are computed under the natural numbers semiring. At iteration
d, Sd [i] contains 1 if there exists a path of length d from the s to the vertex i.

PageRank: It is well known that PageRank can be computed as powers of a
modified transition matrix [4]. Since PageRank is conceived as a Markov pro-
cess, it can be computed as an iterative process that stops when the Markov
chain stabilizes. Let S0 [i] = 1/n. The transition matrix T is defined as Ti,j =
Ej,i /outdeg(j) when Ej,i = 1; otherwise Ti,j = 0. Let A be a n×n matrix, whose
                                                                     0
cells contain always 1/n, and 0 < p < 1 the damping factor; let T = T + D ,
where D is a matrix such that Di,j = 1/n if the column j is a 0 column. The
                                    00             00            0
power method can be applied on T defined as: T = (1 − p)T + pA, computing
Sd ← Sd−1 · T ” iteratively, from d = 1 until stabilization of the Markov chain.

Topological Sort: Sd holds the relative ordering for the vertices, and Sd [j] = 0
when the relative order of j is not yet determined. Initialization: S0 [j] = 1
if outdeg(j) = 0, otherwise S0 [j] = 1. The algorithm can be outlined in this
manner: For each vertex j ∈ Sd−1 , find every father vertex π such that ∀i
successor of π, i ∈ S, and let Sd [π] = 1. Repeat the process until ∀i ∈ V ,
S[i] 6= 0. The topological order is given by S0 + S1 + ...Sk . This algorithm can
be expressed as in Equation 1, where Sk is solved with iterative vector-matrix
multiplication. In this case, the ⊕ operation is the logical AND: ∧, and the
⊗ is the logical implication ⇒. As result of the vector-matrix multiplication,
Sd [j] = 1 if for all j 0 successors of j , Sd [j 0 ] = 1. Although the operators ∧ and
⇒ cannot comprise a semiring (the commutative property for ⇒ does not hold),
it is possible to express Sd ·E with SPJA queries, like in the previous algorithms.


                    Table 1. Comparison of four graphs algorithms

        ...        Bellman-Ford         Reachability       PageRank          Topological Sort
  Computed as      Sd ← Sd−1 · E      Sd ← Sd−1 · E      Sd ← Sd−1 · T        Sd ← Sd−1 · E
Semiring op.(⊕, ⊗)     min, +               +, ×              +, ×                  ∧, ⇒
    Value S[i]     distance s to i 1 iif ∃ a path s to i  probability               order
  S0 defined as       S0 [s] = 0          S0 [s] = 1      S0 [i] = 1/n   S0 [i]=1 iff outdeg(i)=1
       |S0 |               1                   1                 n     | {i ∈ V | outdeg(i) = 0} |
     Output               Sk          S0 + S1 + ..Sk            Sk             S0 + S1 + ..Sk
 Time complexity    O(kn log n) O(kn) or O(kn log n) O(kn log n)                O(kn log n)
      Scope        from source s       from source s        ∀i ∈ V                 ∀i ∈ V


4    Unified Algorithm
Considering that S and E are stored in relational tables S(j, v) and E(i, j, v),
the vector-matrix product can be clearly computed with a relational query as:
πS.i,E.j:sum(S.v∗E.v) (S ./S.j=E.i E). Table 1 shows a comparison of the properties
of the four algorithms of our concern. All of them are computed by a combined
    Algorithm 1: Unified Algorithm
     Input: Table E, S0 , R0 , f (), g(), ⊗, , unionF lag . Optional input: s (source)
     Output: Table Rd(j, v)
     d ← 0; ∆ ← 1;
     while ∆ >  do
        d←d+1 ;
        Sd ← πj:g()(Sd−1 .v⊗E.v) (Sd−1 ./j=i E)
        if unionFlag then
            Rd ← πj:sum(v) (Sd ∪ Rd−1 )
        else
            Rd ← Sd
        end
        ∆ ← f (Rd−1 , Sd−1 , Rd , Sd )
     end


join/aggregation between S and E (or T ” ) as the main operation of the iter-
ation. This SPJA query computes a vector-matrix multiplication on different
semirings, depending on the specific algorithm, with time complexity O(nlogn).
The meaning of the value in S[i] is different in every case. While Bellman-Ford
and reachability return results concerning an unique source vertex, topological
sort and PageRank return results with respect to the entire graph. The execution
time of these algorithms will depend on n and the number of iterations k. For
instance, reachability may finish in less iterations than Bellman-Ford, for the
same graph. It is possible to conceptualize a Unified Algorithm to solve these
four graphs problems, as presented in Algorithm 5. The main operation remains
the matrix multiplication, with parameters ⊕, ⊗. Other input parameters are
the initial vector S0 . and unionFlag, wich controls if the output is Sd or the
cumulative sum of S0 + S1 ... + Sd . F


5     Conclusions

We demonstrate that Bellman-Ford, reachability, PageRank and Topological
Sort can be expressed as an iteration of relational queries, to solve graph prob-
lems in data sets stored in databases. Based on the conceptual foundation of
vector-matrix products under different semirings, these algorithms run as SPJA
queries; they are secondary memory algorithms, not limited by the system RAM.
The efficiency of our algorithms is supported by: sparse storage, which avoids
wasting both space and computing time; conditions for early termination, which
avoids useless iterations; and light weight relational queries. Moreover, we com-
pared these algorithms and proposed a Unified Algorithm to solve them. Further
research will identify and integrate into the Unified Algorithm more algorithms
that follow the vector-matrix multiplication pattern. Besides, we will study these
algorithms in large and complex graphs, especially in parallel clusters.
References
1. Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst. Templates for the
   solution of algebraic eigenvalue problems: a practical guide, volume 11. Siam, 2000.
2. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algo-
   rithms, Third Edition. The MIT Press, 3rd edition, 2009.
3. A. Jindal, P. Rawlani, E. Wu, S. Madden, A. Deshpande, and M. Stonebraker.
   Vertexica: your relational friend for graph analytics! Proceedings of the VLDB En-
   dowment, 7(13):1669–1672, 2014.
4. S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Extrapolation
   methods for accelerating pagerank computations. In Proceedings of the 12th Int.
   Conf. on World Wide Web, pages 261–270. ACM, 2003.
5. C. Ordonez, W. Cabrera, and A. Gurram. Comparing columnar, row and array
   DBMSs to process recursive queries on graphs. Information Systems, pages –, 2016.
6. N. Yakovets, P. Godfrey, and J. Gryz. Evaluation of SPARQL property paths via
   recursive SQL. In Proc. AMW, 2013.