HUGS – A Lightweight Graph Partitioning Approach

Alexander Krause, Hannes Voigt, Wolfgang Lehner
Technische Universität Dresden, Database Systems Group, Dresden, Germany
{firstname.lastname}@tu-dresden.de

ABSTRACT
The growing interest in graph data has led to increasingly more research in the field of graph data management and graph analytics. Nowadays, even large graphs of up to billions of vertices and edges fit into the main memory of big modern multi-socket machines, making them a first-grade platform for graph management and graph analytics. High-performance data management solutions have to be aware of the NUMA properties of such big machines. A data-oriented architecture (DORA) is a particular solution to that. However, it requires partitioning the data in a way such that inter-partition communication can be avoided. Graph partitioning is a long-studied problem, and state-of-the-art solutions such as multilevel k-way partitioning and recursive bisection achieve good results in feasible time. Integrating such solutions is a rather difficult task, though. In this paper, we present a more lightweight approach called Hugs. The key idea of Hugs is to reuse the BFS routine that is present in a graph data management system anyway, since it is the foundation of many analytical graph algorithms. Hugs is not meant to produce a good general-purpose graph partitioning, but good runtimes of BFS graph traversals, such as reachability queries, on DORA systems. In our experiments, Hugs proved capable of finding good graph partitionings faster than state-of-the-art approaches. The partitionings found by Hugs also showed shorter runtimes for reachability queries.

Keywords
Graph, Graph Partitioning, Reachability, BFS, DORA

28th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 24.05.2016 – 27.05.2016, Nörten-Hardenberg, Germany. Copyright is held by the author/owner(s).
1. INTRODUCTION
The last decade has seen a resurgence of interest in graph data management [2]. With the network data model in the 1970s [28] and object-oriented database systems in the early 1990s, graph-based data models and graph query languages already received considerable attention in research [3]. However, the traction of today's graph data management efforts is considerably higher, with many major IT companies and DBMS vendors on the bandwagon [7, 24]. Among others, one major driver behind the graph concept's revival is a shift in the interest of analytics from mere reporting towards data-intensive science and discovery [11]. Graph data can easily range in the size of billions of vertices and edges. Prominent examples of large graphs are the Facebook friendship graph, the Twitter follower graph, citation networks, web link networks, road networks, supply chains, etc. (cf. [21]). The analysis even of the biggest graphs is often done with recursive algorithms [26]. Breadth First Search (BFS) is one of the most fundamental building blocks for many popular graph analysis algorithms, such as algorithms to determine reachability, connected components, and betweenness centrality [4, 13, 19].

The approach on how to work with a given graph is always determined by its size. The storage and mining of such potentially huge data sets requires computational resources ranging from small workstations up to whole compute clusters. With today's continuously decreasing main memory cost and increasingly powerful hardware, even standalone server workstations with multiple CPUs and terabytes of main memory can process graphs of the billion scale. Such multiprocessor machines consist of multiple sockets, whereby every socket holds multiple CPUs and a share of the whole system's main memory. In these large shared-memory machines, special attention has to be paid to the organization of memory allocation and access [1, 10, 18]. Each socket manages its share of physical memory and provides the other sockets access to it. Although each core has access to each byte of memory, the access is non-uniform (Non-Uniform Memory Access, NUMA): bandwidth is lower and latency is higher for remote memory accesses to other sockets than for accesses to a socket's own memory. Data management systems that are oblivious of these NUMA effects take huge performance penalties for remote memory accesses at large scale [18]. The Data-Oriented Architecture (DORA) [23] is an architecture for so-called NUMA-aware DBMS. Such DBMS try to co-locate data and processing as much as possible to avoid expensive remote memory access [20]. In particular, partitions of the data are bound to an individual socket and processed exclusively by threads running on this socket. Thereby, remote memory access is limited to the inter-partition communication of operators such as a traversal over a graph. Obviously, the partitioning has a huge impact on how much inter-partition communication takes place during the processing of a specific operator. For graph processing on multiprocessor systems, partitioning the graph is therefore inevitable.

Partitioning graphs is a well-studied field [5]. The typical problem definition asks for the partitioning of a graph into k partitions so that the number of edges cut at partition borders is minimal. In this absolute form the problem is NP-hard. Over the years, many heuristic algorithms have been developed that try to minimize edge cuts. Prominent examples are recursive bisection [14] and multilevel k-way partitioning [16]. By exploiting the graph's topology, these general-purpose algorithms find good and useful partitionings for many real-world scenarios in feasible time. However, the resulting partitions are not optimized for BFS-based analyses on a multiprocessor system.

In this paper we propose a novel lightweight graph partitioning approach called Hugs. Hugs determines a graph partitioning with the help of BFS traversals starting from high-degree vertices. As we reuse the BFS traversal operations, which are available in graph data management systems anyway, Hugs only needs minor additions to work with common graph systems. In contrast to the general-purpose partitions produced by current graph partitioning approaches, Hugs exclusively aims at a partitioning scheme that speeds up BFS-based reachability queries in a NUMA setting. Hugs offers two advantages: first, it provides a suitable partitioning for BFS-based analyses, and second, its lightweight design allows an easy integration into standard graph systems. In our experimental evaluation, Hugs exhibits partitioning speed and partitioning quality for reachability queries comparable to the state-of-the-art algorithms. Hence, Hugs offers a significantly easier to implement, lightweight alternative to state-of-the-art graph partitioning algorithms for graph data management systems.

In the remainder of the paper, we first describe the DORA-based database infrastructure we are targeting (Section 2) and detail the graph partitioning problem as well as state-of-the-art graph partitioning algorithms (Section 3). We continue with presenting our lightweight partitioning approach Hugs (Section 4) and the experimental evaluation (Section 5). Finally, we discuss related work (Section 6) and conclude the paper (Section 7).
2. PROBLEM STATEMENT
As described in the previous section, we want to optimize BFS-based graph analytics for NUMA-affected multiprocessor systems. Since DORA is well tailored to the NUMA effect, it is predestined for exploiting locality in graph analysis on a NUMA system.

In most systems, execution threads or transactions are the central point of view; this is also called a thread-to-transaction assignment. Transactions need to be globally synchronized, and locks and latches need to be implemented in order to protect the system from anomalies such as concurrent writes to the same data record. In a DORA system, where a thread-to-data assignment holds, no such locks are needed [23]. Inside a database system based on DORA, each processing unit of the NUMA system represents a worker [20]. During computations, a worker should never switch its affinity to another physical processing unit on another socket, since this would introduce additional messaging overhead. Every worker is assigned one or multiple data partitions of the whole data set. This approach holds regardless of whether the system stores graph or relational data. One advantage of this architecture lies in the implied parallelism of the system. When a transaction needs to access data from multiple partitions, e.g. for a join, every worker can independently prepare sub-results for its available data partitions. Once these local computations have been concluded, requests to other units can be made to fetch possible join partners. As every worker works solely on its own partitions, no further protection of the data partitions needs to be considered.

Cross-partition operations require communication between the actual workers. This happens, for example, if a traversal operation accesses a part of the graph that is stored in a partition other than the one it is currently running in. For this purpose, the system needs to provide an asynchronous, high-throughput message passing layer that allows handing over a task to another worker. For instance, when a traversal query reaches a border vertex inside a partition of one worker and the neighbor of this vertex resides in a partition of another worker, a message with the current search status and the target vertex is sent to the second worker, asking it to resume the search from this point on. This process is illustrated in Figure 1.

[Figure 1: Communication between workers via the message passing interface]

As the message passing layer is aware of the partition assignments, it directly sends messages to the required workers [20]. Every message constitutes a remote memory access; this holds on any NUMA system. Assuming a balanced utilization of all sockets, a low number of messages typically reduces the total runtime of a given workload. Obviously, the partitioning of the graph has a significant influence on how many messages have to be sent for a given traversal.
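To make the hand-off concrete, the following minimal Python sketch mimics the described interaction: each worker owns one partition, runs a local BFS, and forwards the search state to the owning worker whenever it crosses an inter-partition edge. The names Worker, resume_search, owner_of, and inboxes are our own illustrative choices, not the API of an actual DORA engine such as ERIS [20].

    from collections import deque

    class Worker:
        """One worker per socket in a DORA-style system; owns exactly one graph partition."""
        def __init__(self, wid, adjacency, owner_of, inboxes):
            self.wid = wid                  # this worker's id
            self.adj = adjacency            # adjacency list {vertex: [neighbors]}
            self.owner_of = owner_of        # vertex -> worker id (the partition assignment)
            self.inboxes = inboxes          # worker id -> deque of (vertex, target) messages

        def resume_search(self, start, target, visited):
            """Run a local BFS from 'start'; hand off the search at every inter-partition edge."""
            frontier = deque([start])
            while frontier:
                v = frontier.popleft()
                if v in visited:
                    continue
                visited.add(v)
                if v == target:
                    return True
                for u in self.adj.get(v, []):
                    if self.owner_of[u] == self.wid:
                        frontier.append(u)  # neighbor is local, keep traversing
                    else:
                        # border vertex: send the search state to the owning worker
                        self.inboxes[self.owner_of[u]].append((u, target))
            return False

In a real system the inboxes would be drained asynchronously by the receiving workers; here they merely collect the pending hand-offs.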
When performing a BFS, the search starts from one specific vertex. Then, all its subsequent neighbors are visited following the edge direction. Eventually, the BFS will reach a partition border, which leads to communication overhead. There are two scenarios to consider: (1) staying in the current partition as long as possible to avoid any communication at all, and (2) reaching partition boundaries as fast as possible in order to exploit the system-inherent parallelism to the maximum. The effectiveness of either strategy depends on the current system load. If other workers are highly utilized, completely searching through the current partition before notifying others may produce better runtime behavior than sending messages to other workers as soon as a border vertex has been found. However, if the system is in an idle state, the exact opposite is needed: residing in the current partition as long as possible would completely underutilize the system resources. In that case, sending as many messages as possible to other workers can only increase the query performance.

3. CLASSICAL GRAPH PARTITIONING
In this work we consider a graph G = (V, E), where v ∈ V is a vertex in G and (vi, vj) ∈ E with E ⊆ (V × V) is an edge from vi to vj. We only consider lossless partitionings into disjoint partitions P1, P2, ..., Pn ⊆ V with Pi ∩ Pj = ∅ for all i ≠ j and i, j ∈ [1, n], as well as ∪i Pi = V.
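As a small illustration of this definition, the following Python sketch checks whether a candidate assignment is a lossless, disjoint partitioning in the above sense; the function name and data layout are ours, chosen only for the example.

    def is_lossless_partitioning(V, partitions):
        """True iff the given partitions are pairwise disjoint and together cover all of V."""
        seen = set()
        for P in partitions:
            if seen & P:              # a vertex appears in two partitions -> not disjoint
                return False
            seen |= P
        return seen == set(V)         # the union of all partitions must equal V

    # e.g. is_lossless_partitioning({1, 2, 3, 4}, [{1, 2}, {3, 4}]) returns True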
The Graph Partitioning Problem (GPP) is well studied. Dividing a graph G into k partitions of balanced size is known to be NP-complete [12]. Practically feasible solutions can therefore only solve the problem heuristically. Many approaches have been developed for tackling the GPP. One of them is the so far most successful multilevel partitioning [5]. Multilevel partitioning itself is not an actual algorithm but rather a heuristic strategy. It consists of three stages in which multiple methods can be applied: (1) coarsening the input graph, (2) finding a partitioning for the coarsened graph, and (3) uncoarsening the partitioning back to the granularity of the input graph (cf. Figure 2). Basically, the idea is to do the actual partitioning only on graph sizes for which the complexity of the partitioning problem is bearable.

The coarsening is done by combining multiple vertices into one abstract vertex. Edges are coarsened likewise, maintaining the graph's topology as well as possible. Typically, the coarsening is repeated until the graph is small enough (a few hundred vertices). On this very small graph, even expensive partitioning algorithms can be applied without much runtime burden. Finally, the uncoarsening iteratively unpacks the abstract vertices. After each uncoarsening step, a refinement procedure optimizes the partitioning of the now finer-grained graph. Since the refinement steps start from an already well partitioned base, the added overhead remains small.

[Figure 2: Multilevel k-way partitioning scheme, taken from [16]: the coarsening phase shrinks the graph over levels G0 to G4, an initial partitioning is computed on the smallest graph, and the uncoarsening phase projects and refines it back to G0]
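To illustrate the coarsening stage, here is a deliberately simplified Python sketch that contracts a graph along a greedy matching. It captures the idea of merging matched vertex pairs into abstract vertices; it is not the heavy-edge matching scheme used by METIS [17] and omits the vertex and edge weights a real implementation carries along for balance.

    def coarsen_once(vertices, edges):
        """One coarsening step: greedily match vertices along edges and contract each pair."""
        matched = {}                      # original vertex -> coarse vertex id
        coarse_id = 0
        for (u, v) in edges:              # greedy matching: pair up still-unmatched endpoints
            if u != v and u not in matched and v not in matched:
                matched[u] = matched[v] = coarse_id
                coarse_id += 1
        for v in vertices:                # unmatched vertices are simply copied over
            if v not in matched:
                matched[v] = coarse_id
                coarse_id += 1
        coarse_edges = {(matched[u], matched[v]) for (u, v) in edges
                        if matched[u] != matched[v]}   # edges inside a merged pair disappear
        return list(range(coarse_id)), coarse_edges, matched

Repeating this step shrinks the graph until an expensive initial partitioning becomes affordable, after which the mapping returned here can be used to project the partitioning back during uncoarsening.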
There are two approaches for multilevel partitioning, namely k-way partitioning [16] and recursive bisection [14]. The main difference between the two methods resides in the second phase: multilevel k-way partitioning aims to partition the graph into k partitions directly, while recursive bisection recursively divides the graph and the resulting subgraphs into two partitions until the final number of partitions is reached. Both require graphs to be undirected, since these approaches aim to reduce the edges cut between partitions.

For BFS-based analytics, these partitionings will not be optimal. As undirected graphs can always be traversed in any direction along every edge, none of the vertices can become a dead end as long as it is part of more than one edge. For directed graphs, however, any vertex could represent a sink, i.e. a vertex with only incoming edges from other vertices. This key characteristic has to be considered when partitioning directed graphs, especially for BFS-based queries.

4. HUGS – HUB-CENTERED GRAPH PARTITIONING
We claim that partitioning a graph with a BFS-based approach will increase the performance of BFS-based queries. The idea behind our novel graph partitioning method is that many paths between two vertices make use of high-degree vertices. Additionally, structural patterns like triangles or chains are likely to pass through these nodes. Therefore, centering the partitioning around hubs of the graph should create partitions which provide sufficient locality for certain queries. The resulting partition will most likely resemble the structure which would be created when performing a BFS on the graph. Having the partition structured like the search tree itself should lead to a significant synergy between both.

Hugs is a lightweight, BFS-based graph partitioning approach. As graph systems almost always implement a version of BFS, this implementation can be reused with slight modifications. In contrast to the multilevel partitioning methods, Hugs considers the whole graph without coarsening it. Its heuristic is based on centering the partitions around so-called hubs in the graph. A hub is a vertex with a high degree, i.e. it has many incoming and outgoing edges. The intuition behind this heuristic is that hubs, because of their many incoming and outgoing edges, are most likely part of many paths in the graph – and particularly of paths in the neighborhood of hubs. Essentially, Hugs runs BFSs starting from these hubs to determine the partitions. The order in which the vertices are relaxed during the BFS determines the partitioning time as well as the partitioning quality.

Algorithm 1 shows Hugs' partitioning procedure. A set of hubs H is maintained besides the base data. H lists, in descending order of degree, the vertices h with a top-(k · n) degree deg(h), where k ∈ N+ and k · n ≪ |V|. We define the degree deg(v) of a vertex v as deg(v) = |K(v)|, where K(v) = {w | (v, w) ∈ E ∨ (w, v) ∈ E} is the neighborhood of v.
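These definitions translate directly into code. The following small Python sketch (the helper names are ours) computes K(v), deg(v) = |K(v)|, and the descending hub list H for given k and n.

    from collections import defaultdict

    def build_neighborhoods(E):
        """K(v) for every vertex: all w with (v, w) or (w, v) in E, i.e. edge direction ignored."""
        K = defaultdict(set)
        for (v, w) in E:
            K[v].add(w)
            K[w].add(v)
        return K

    def top_hubs(E, k, n):
        """H: the k*n highest-degree vertices (deg(v) = |K(v)|), listed in descending order."""
        K = build_neighborhoods(E)
        return sorted(K, key=lambda v: len(K[v]), reverse=True)[:k * n]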
Algorithm 1 HUGS
 1: H ← a set of hubs
 2: Pi ← a partition
 3: N ← a list of partition candidates
 4: for all Pi do
 5:   add k of the hubs in H as root hubs to Pi
 6:   add the neighbors of the root hubs to Pi
 7:   while Pi not full do
 8:     for all v ∈ Pi do
 9:       for all u ∈ K(v) ∧ u not visited do
10:         set u visited
11:         degPi(u) ← |K(u) ∩ Pi|
12:         N ← N ∪ {u}
13:       end for
14:     end for
15:     order N by degPi(u) / deg(u)
16:     add top-t of N to Pi and remove them from N
17:     set N \ Pi unvisited
18:   end while
19: end for
20: add unreachable vertices to the smallest partition

Initially, Hugs adds k of the top-n hubs and their direct neighbors to each partition (lines 5–6). We refer to these k hubs as the root hubs of the partition. In each BFS step, Hugs does not add all newly explored vertices to the current partition. Instead, it determines for each newly explored vertex u the ratio between its neighbors in the current partition, degPi(u), and all its neighbors, deg(u) (lines 8–15). This ratio is a heuristic local measure of how close the vertex is to the current partition. Newly explored vertices with a top-t ratio are added to the partition (line 16).

There are two parameters to steer the performance of Hugs: the number of root hubs k and the maximum growth rate t. Starting with only one root hub creates a partition which is coherent, but Hugs needs more BFS steps to fill the partition. Multiple root hubs result in a partition consisting of several scattered sub-areas of the graph, which usually leads to a faster partitioning time but yields less quality. A higher growth rate means that more of the explored vertices are added to the partition, which drastically reduces the partitioning time but also yields less quality, since many low-ranked vertices are added to the partition as well.

Figure 3 illustrates the whole process for n = 2 partitions and serves as an example. First, the two vertices with the highest degree in the graph are selected, as shown in Figure 3(a). Starting from the vertex with the higher degree, all neighbors are scanned and added to its partition. If the maximum number of vertices for this specific partition has been reached, the BFS terminates. Otherwise the search continues with the neighborhood sets of the already selected vertices. The same procedure is applied analogously to the second hub.

[Figure 3: Partitions resulting from Hugs – (a) top-2 hubs, one per partition; (b) final result]

Hugs provides two advantages. The first advantage is the optimized partitioning: creating synergies between the data partitioning and the search routine increases query performance. Second, by reusing existing implementations of BFS, Hugs is lightweight and easy to implement. However, a drawback of our method can result from the graph's topology: if the highest-degree vertices are all located in each other's neighborhood sets, the algorithm may produce imbalanced results.
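For readers who prefer code over pseudocode, the following Python sketch renders Algorithm 1 under simplifying assumptions of ours: partitions are plain sets, "Pi not full" is approximated by a fixed capacity of |V|/n vertices, and the visited bookkeeping of lines 10 and 17 is folded into the candidate set N. It is an illustration, not the authors' prototype.

    from collections import defaultdict

    def hugs_partition(V, E, n, k, t):
        """Hub-centered partitioning following Algorithm 1 (simplified sketch)."""
        K = defaultdict(set)                        # K(v) as defined above
        for (v, w) in E:
            K[v].add(w)
            K[w].add(v)
        deg = {v: len(K[v]) for v in V}
        hubs = sorted(V, key=lambda v: deg[v], reverse=True)[:k * n]
        capacity = max(1, len(V) // n)              # stand-in for "Pi not full" (line 7)
        assigned, partitions = set(), []
        for i in range(n):
            roots = [h for h in hubs[i * k:(i + 1) * k] if h not in assigned]
            Pi = set(roots)                         # lines 5-6: root hubs and their neighbors
            for r in roots:
                Pi |= {u for u in K[r] if u not in assigned}
            assigned |= Pi
            while len(Pi) < capacity:
                # lines 8-14: collect unassigned candidates adjacent to the partition
                N = {u for v in Pi for u in K[v] if u not in assigned}
                if not N:
                    break
                # lines 15-16: rank by degPi(u)/deg(u) and keep only the top-t candidates
                ranked = sorted(N, key=lambda u: len(K[u] & Pi) / deg[u], reverse=True)
                chosen = set(ranked[:t])
                Pi |= chosen
                assigned |= chosen
            partitions.append(Pi)
        min(partitions, key=len).update(set(V) - assigned)   # line 20: unreachable vertices
        return partitions

Tuning k and t in this sketch trades partition coherence against partitioning time, exactly as discussed above.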
5. EVALUATION
To show the benefit of our novel and lightweight graph partitioning approach, we conducted a series of experiments with a prototypical implementation of Hugs. As data we used two directed graphs taken from the online graph data collection of SNAP [21]. The first graph is the Wikivote graph. It represents the voting behaviour of users who had to elect new moderators. It is a rather small graph with 7,115 vertices and 103,689 edges. The second graph is the Stanford web graph, which represents the structure of the Stanford University website with 281,903 vertices and 2,312,497 edges. Every edge represents a hyperlink between two webpages.

For our test setting, we partitioned every graph and measured different statistics. For the multilevel k-way partitioning and recursive bisection algorithms, we used the METIS library [17]. Additionally, we used a random partitioning as a baseline. For every partitioning approach we measured the runtime of the partitioning algorithm as well as the runtime of ten random reachability queries. All tests have been performed on a workstation with an AMD Opteron 6274 CPU and 64 GB of main memory. Because of its significantly longer query runtimes, the performance of the random partitioning was considerably worse than that of all other approaches; we therefore omitted its runtimes from the corresponding figures. In order to maintain comparability and to avoid hidden latencies, we performed the testing sequentially and measured the query time spent per partition. Since Hugs outperforms its competitors in this sequential setting, we expect a DORA system to also show better performance for Hugs than for multilevel k-way partitioning and recursive bisection.

Figures 4 and 5 show the partitioning times of Hugs compared to multilevel k-way partitioning and recursive bisection. For Wikivote, Hugs outperforms both approaches due to its reduced effort in creating topologically balanced partitions. For Stanford, Hugs struggles with a smaller number of partitions: a full BFS with continuous ranking of all neighbors of the partition leads to longer partitioning times. With a growing number of partitions, Hugs' partitioning time drops a bit while the k-way partitioning and recursive bisection algorithms take considerably longer. For DORA systems, which aim at large multi-socket server machines, partitionings into eight and more partitions are the relevant scenarios. Here, Hugs consistently outperforms its competitors.

[Figure 4: Partitioning times in milliseconds over the number of partitions (2–32) for the Wikivote graph]
[Figure 5: Partitioning times in milliseconds over the number of partitions (2–128) for the Stanford graph]

For the query runtime tests, we selected ten pairs of random vertices in the graph. These pairs were fixed for all partitioning approaches as well as for every number of partitions. The reachability traversal is implemented as a BFS as well. Starting from the source vertex of the reachability request, the search proceeds in forward direction. As explained in Section 2, our data is stored on one specific worker; therefore, we first try to find the target vertex in the root partition. When the BFS reaches an inter-partition edge, the worker holding the partition that contains the targeted vertex is notified and immediately starts its own search procedure. When multiple vertices have an inter-partition edge to the same target partition, these targeted vertices are queued with the corresponding worker. If those vertices have already been visited by the current or an already terminated search, no additional search is scheduled. The procedure terminates when one of the workers finds a path from a border vertex to the target vertex. These experiments represent the first of the two scenarios explained at the end of Section 2.

The runtimes of the reachability queries are displayed in Figures 6 and 7. The small Wikivote graph shows that a search based on Hugs provides consistently faster answers. The topological partitionings of k-way partitioning and recursive bisection tend to introduce more overhead for a BFS-based search. Since our partitioning and search scheme are both BFS-based, they can synergize and yield good performance values compared to the two other approaches.

[Figure 6: Reachability runtimes in seconds over the number of partitions (2–32) for the Wikivote graph]
[Figure 7: Reachability runtimes in seconds over the number of partitions (2–128) for the Stanford graph]
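The measurement procedure can be summarized in a few lines of Python. The sketch below is our own illustration of the setup described above, not the actual benchmark code: the partitioners and the BFS-based reachability check are passed in as callables, and a fixed random seed stands in for keeping the ten query pairs identical across all approaches and partition counts.

    import random
    import time

    def run_benchmark(V, E, partitioners, reachability_query, partition_counts, num_queries=10):
        """Time each partitioner and a fixed set of reachability queries on its result."""
        rng = random.Random(42)                     # fixed seed -> same query pairs for every run
        vertices = list(V)
        pairs = [(rng.choice(vertices), rng.choice(vertices)) for _ in range(num_queries)]
        results = {}
        for name, partition_with in partitioners.items():   # e.g. {"hugs": ..., "kway": ..., "bisection": ...}
            for n in partition_counts:                       # e.g. [2, 4, 8, 16, 32]
                start = time.perf_counter()
                parts = partition_with(V, E, n)
                partition_time = time.perf_counter() - start
                start = time.perf_counter()
                for s, t in pairs:
                    reachability_query(parts, E, s, t)       # BFS-based reachability check
                query_time = time.perf_counter() - start
                results[(name, n)] = (partition_time, query_time)
        return results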
6. RELATED WORK
Many works aim at improving graph partitioning algorithms themselves. Karypis and Kumar [15] improve the multilevel partitioning approach by parallelizing multiple steps. At first, the bisection of the subgraphs is performed by two processors. The resulting partitions are bisected as well, and this workload is again split up among multiple processors. The authors refer to these techniques as parallelism in the recursive and in the bisection step. Hugs, in contrast, is designed as a single-threaded, BFS-like partitioning. Slota, Rajamanickam and Madduri show that the BFS itself can be parallelized [27].

Ding et al. discuss a min-max cut problem for data clustering [6]. The authors claim that they achieve a balanced partitioning by maximizing the similarity between vertices in the same partition while minimizing the similarity or association to vertices of other partitions. Other approaches like normalized cut or ratio cut are said to be less efficient. Generally speaking, this approach seeks to cluster entities with high similarity and thus partitions the graph on a logical level. In contrast, Hugs partitions the graph on a topological level. We exploit the fact that high-degree vertices are most likely to be visited when searching for connections between two vertices.

A different field of application is to model problems as graphs and apply partitioning algorithms to them in order to simplify computations on the data. Fern and Brodley use multilevel graph partitioning to solve the cluster ensemble problem [8]. They map the data points to vertices and partition them according to their similarity, which is used as an edge weight.

Gilbert and Zmijewski show that using the Kernighan-Lin algorithm [22] can improve the performance of message-passing multiprocessor systems like the hypercube [9]. Similar to our work, the ultimate goal is to reduce communication overhead between the workers. As our method is based on BFS only, it will most likely not produce optimal partitionings with a minimal edge cut. Yet we produce highly localized partitions, which reduce the communication overhead of BFS-based reachability queries.

Schloegel, Karypis and Kumar [25] as well as [29] use graph partitioning for exploiting parallelism on meshes, which are often the basis of scientific calculations. Meshes can consist of massive amounts of data and therefore do not fit into a single machine's main memory. Since scientific calculations often involve incremental updates of neighboring nodes, we assume that Hugs could be applied to this field as well.

7. CONCLUSIONS AND FUTURE WORK
We presented Hugs, a lightweight, BFS-based graph partitioning algorithm. Hugs is meant for partitioning graph data for its management in DORA systems. We showed that Hugs yields better performance for BFS-based reachability queries than its state-of-the-art graph partitioning competitors. At the same time, Hugs also determines the partitioning quicker than these standard approaches, particularly for the relevant scenarios of 8 and more partitions. Implementing Hugs in a graph data management system is simple, since it can build on existing BFS code, which is one of the most fundamental graph routines and can be assumed to be implemented by every typical graph data management system in a highly optimized fashion.
Our investigations reported here show Hugs to be a promising approach for DORA systems. As our prototype mostly favors the first partition, we are investigating other filling techniques, e.g. an alternating construction of the partitions, whereby the corresponding runtime penalties are yet to be resolved. We are further investigating other heuristics for the partition growth phase. Since Hugs exposes two parameters for adjustment, it is customizable. However, the algorithm's performance depends on the underlying graph's structure. Therefore, we need to find good heuristics to automatically determine upfront which configuration works best for a given graph. Further down the road, we will look into an incremental version of Hugs for maintaining the partitioning upon updates to the graph.

8. ACKNOWLEDGMENTS
This work is partly funded by the German Research Foundation (DFG) within the Cluster of Excellence "Center for Advancing Electronics" (cfaed) and the Collaborative Research Center 912 Highly Adaptive Energy-Efficient Computing (HAEC).

9. REFERENCES
[1] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB 1999, pages 266–277, 1999.
[2] R. Angles. A comparison of current graph database models. In ICDE 2012, pages 171–177, 2012.
[3] R. Angles and C. Gutiérrez. Survey of graph database models. ACM Comput. Surv., 40(1), 2008.
[4] A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. Computer Networks, 33(1-6):309–320, 2000.
[5] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. Recent advances in graph partitioning. CoRR, abs/1311.3144, 2013.
[6] C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In ICDM 2001, pages 107–114, 2001.
[7] F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. SAP HANA database: data management for modern business applications. SIGMOD Record, 40(4):45–51, 2011.
[8] X. Z. Fern and C. E. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In ICML 2004, 2004.
[9] J. R. Gilbert and E. Zmijewski. A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming, 16(6):427–449, 1987.
[10] N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: Limitations and opportunities. In CIDR 2007, pages 79–87, 2007.
[11] T. Hey, S. Tansley, and K. M. Tolle. The Fourth Paradigm: Data-Intensive Scientific Discovery. 2009.
[12] L. Hyafil and R. L. Rivest. Graph partitioning and constructing optimal decision trees are polynomial complete problems. 1973.
[13] R. Jin, N. Ruan, S. Dey, and J. X. Yu. SCARAB: Scaling reachability computation on large graphs. In SIGMOD 2012, pages 169–180, 2012.
[14] G. Karypis and V. Kumar. Multilevel graph partitioning schemes. In ICPP 1995, pages 113–122, 1995.
[15] G. Karypis and V. Kumar. Parallel multilevel graph partitioning. In IPPS 1996, pages 314–319, 1996.
[16] G. Karypis and V. Kumar. A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm. In PPSC 1997, 1997.
[17] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Scientific Computing, 20(1):359–392, 1998.
[18] T. Kiefer, B. Schlegel, and W. Lehner. Experimental evaluation of NUMA effects on database management systems. In BTW 2013, pages 185–204, 2013.
[19] S. Kintali. Betweenness centrality: Algorithms and lower bounds. CoRR, abs/0809.1906, 2008.
[20] T. Kissinger, T. Kiefer, B. Schlegel, D. Habich, D. Molka, and W. Lehner. ERIS: A NUMA-aware in-memory storage engine for analytical workloads. In ADMS 2014, pages 74–85, 2014.
[21] J. Leskovec. SNAP – Stanford Network Analysis Platform. http://snap.stanford.edu/snap/ [Online, last accessed 2016-02-30].
[22] S. Lin and B. W. Kernighan. An effective heuristic algorithm for the traveling-salesman problem. Operations Research, pages 498–516, 1973.
[23] I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-oriented transaction execution. PVLDB, 3(1):928–939, 2010.
[24] C. Rother, V. Kolmogorov, and A. Blake. "GrabCut": Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph., 23(3):309–314, 2004.
[25] K. Schloegel, G. Karypis, and V. Kumar. Graph partitioning for high performance scientific simulations. 2000.
[26] J. Seo, J. Park, J. Shin, and M. S. Lam. Distributed SociaLite: A Datalog-based language for large-scale graph analysis. PVLDB, 6(14):1906–1917, 2013.
[27] G. M. Slota, S. Rajamanickam, and K. Madduri. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In IPDPS 2014, pages 550–559, 2014.
[28] R. W. Taylor and R. L. Frank. CODASYL data-base management systems. ACM Comput. Surv., 8(1):67–103, 1976.
[29] C. Walshaw, M. Cross, and M. G. Everett. Parallel dynamic graph partitioning for adaptive unstructured meshes. J. Parallel Distrib. Comput., 47(2):102–108, 1997.