<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HUGS - A Lightweight Graph Partitioning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Krause</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hannes Voigt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Lehner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität Dresden Database Systems Group Dresden</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>68</fpage>
      <lpage>73</lpage>
      <abstract>
        <p>The growing interest in graph data lead to increasingly more research in the field of graph data management and graph analytics. Nowadays, even large graphs of upto a size of billions of vertices and edges fit into main memory of big modern multisocket machines, making them a first-grade platform for graph management and graph analytics. High performance data management solutions have to be aware of the NUMA properties of such big machines. A dataoriented architecture (DORA) is a particular solution to that. However, it requires partitioning the data in a way such that inter-partition communication can be avoided. Graph partitioning is a long studied problem and stateof-the-art solutions, such as multilevel k-way partitioning and recursive bisection achieve good results in feasible time. Integrating such solution is a rather difficult task, though. In this paper, we present a more lightweight approach called Hugs. The key idea of Hugs is to reuse the BFS routine present in a graph data management system anyway, since it is the foundation of many analytical graph algorithms. Hugs is not meant to produce a good general-purpose graph partitioning but good runtimes of BFS graph traversals such as reachability queries on DORA systems. In our experiments Hugs showed capable of finding good graph partitionings faster than state-of-the-art approaches. The partitioning found by Hugs also showed shorter runtimes for reachability queries.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The last decade has seen resurgence of interest in graph
data management [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. With the network data model in the
1970s [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] and object-oriented database systems in the early
1990s graph-based data models and graph query languages
got considerable attention in research already [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However,
the traction of today’s graph data management efforts is
unequally higher with many major IT companies and DBMS
vendors on the band wagon [
        <xref ref-type="bibr" rid="ref24 ref7">7, 24</xref>
        ]. Among others, one major
driver behind the graph concept’s revival is a shift in the
interest of analytics from merely reporting towards
dataintensive science and discovery [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Graph data can easily
range in the size of billions of vertices and edges.
Prominent examples of large graphs are the Facebook friendship
graph, the Twitter follower graph, citation networks, web
link networks, road networks, supply chains, etc. (cf. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]).
The analysis even of the biggest graphs is often done with
recursive algorithms [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. The Breadth First Search (BFS)
is one of the most fundamental building blocks for many
popular graph analysis algorithms, such as algorithms to
determine reachability, connected components, and
betweenness centrality [
        <xref ref-type="bibr" rid="ref13 ref19 ref4">4, 13, 19</xref>
        ].
      </p>
      <p>
        The approach on how to work with a given graph is always
determined by its size. The storage and mining of such
potentially huge data sets requires computational resources
ranging from small workstations up to whole compute clusters.
With today’s continuously decreasing main memory cost and
increasingly stronger hardware, even standalone server
workstations with multiple CPUs and terabyte of main memory
can process graphs of the billion scale. Such multiprocessor
machines consist of multiple sockets, whereby every socket
holds multiple CPUs and a share of the whole systems main
memory. In these large shared memory machines special
attention has to be payed to the organization of memory
allocation and access [
        <xref ref-type="bibr" rid="ref1 ref10 ref18">1, 10, 18</xref>
        ]. Each socket manages its share
of physical memory and provides the other sockets access.
Although each core has access to each byte of memory, it is
a Non Uniform Memory Access (NUMA). Bandwidth and
latency is lower for remote memory access to other sockets
than the access to a socket’s own memory. Data
management systems oblivious of these NUMA effects take huge
performance penalties on a large scale for remote memory
accesses [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The Data-Oriented Architecture (DORA) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
is an architecture for so called NUMA-aware DBMS. Such
DBMS try to co-locate data and process as much as possible
locally to avoid expensive remote memory access [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In
particular, partitions of the data, are bound to an individual
socket, and processed only by threads running on this socket
exclusively. Thereby remote memory access is limited to
inter-partition communication of operators such as a
traversal over a graph. Obviously, the partitioning has a huge
impact on how much inter-partition communication takes
place during processing of a specific operator. In terms of
graph processing on multiprocessor systems, the partitioning
of the graph is inevitable.
Partitioning graphs is a well studied field [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The typical
problem definition asks for the partitioning of a graph into
k partitions so that the number of edges cut at partition
borders is minimal. In this absolute form the problem is
NP-hard. Over the years many heuristic algorithms have
been developed that try to minimize edge cuts. Prominent
examples are recursive bisection [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and multilevel k-way
partitioning [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. By exploiting the graphs topology, these
general purpose algorithms find good and useful partitionings
in many real world scenarios in feasible time. However, the
resulting partitions are not optimized for BFS-based analyses
on a multiprocessor system.
      </p>
      <p>In this paper we propose a novel lightweight graph
partitioning approach called Hugs. Hugs determines graph
partitioning with the help of BFS traversals starting from
high-degree vertices. As we reuse the BFS traversal
operations, which are available in graph data management system
anyway, Hugs only needs minor additions to work with
common graph systems. In contrast to the general purpose
partitions of current graph partitioning approaches, Hugs
exclusively aims at a partitioning scheme good for the
speedup of BFS-based reachability queries in a NUMA setting.
Hugs offers two advantages. First, it provides a suitable
partitioning for BFS-based analyses and second, its lightweight
design allows an easy integration into standard graph systems.
In our experimental evaluations Hugs exhibits partitioning
speed and partitioning quality for reachability queries
comparable to the state-of-the-art algorithms. Hence, Hugs offers
a significantly easier to implement, lightweight alternative
to state-of-the-art graph partitioning algorithms for graph
data management system.</p>
      <p>In the remainder of the paper we first describe the
DORAbased database infrastructure which we are targeting at
(Section 2) and detail on the graph partitioning problem as well
as state-of-the-art graph partitioning algorithms (Section 3).
We continue with presenting our lightweight partitioning
approach Hugs (Section 4) and the experimental evaluation
(Section 5). Finally, we discuss related work (Section 6) and
conclude the paper (Section 7).
2.</p>
    </sec>
    <sec id="sec-2">
      <title>PROBLEM STATEMENT</title>
      <p>As described in the previous section, we want to optimize
BFS-based graph analytics for NUMA affected
multiprocessor systems. Since DORA is well tailored for the NUMA
effect, it is predestined for exploiting locality in graph
analysis on a NUMA system.</p>
      <p>
        For most systems, execution threads or transactions are the
central point of view, one does also say thread-to-transaction
assignment. Transactions need to be globally synchronized
and locks and latches need to be implemented, in order to
prevent the system from anomalies, such as concurrent
writings to the same data record. In a DORA system, where
the thread-to-data assignment holds, no specific locks are
needed [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Inside a database system based on DORA, each
processing unit of the NUMA system represents a worker [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>During computations, the worker should never switch its
affinity to another physical processing unit on another socket,
since this would introduce additional messaging overhead.</p>
      <p>Every worker is assigned with one or multiple data partitions
of the whole data set. This approach holds, regardless if the
system stores graph or relational data. One advantage of
this architecture lies in the implied parallelism of the system.</p>
      <p>When a transaction needs to access data from multiple
parti</p>
      <p>Message Passing Interface
A</p>
      <p>B
tions, e.g. for a join, every worker can independently prepare
sub-results for its available data partitions. Once these local
computations have been concluded, requests to other units
can be made for fetching possible join partners. As every
worker is solely working on its own partitions, no further
protection for the data partitions needs to be considered.</p>
      <p>Cross-partition operations require communication between
the actual workers. This effect occurs, if e.g. a traversal
operation accesses a part of the graph, which is stored in
another partition than it is currently running in. For this
purpose, the system needs to provide an asynchronous,
highthroughput message passing layer that allows to hand over
a task to another worker. For instance, when a traversal
query reaches a border vertex inside a partition of one worker
and the neighbor of this vertex resides in one partition of
another worker, a message with the current search status and
the target vertex will be sent to the second worker, asking
to resume the search from this point on. This process is
illustrated in Figure 1.</p>
      <p>
        As the message passing layer is aware of the partition
assignments, it will directly send messages to the required
workers [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Every message constitutes a remote memory
access, which extends to any NUMA system. Assuming a
balanced utilization of all sockets, a low number of
messages typically reduces the total runtime of a given workload.
      </p>
      <p>Obviously, the partitioning of the graph has significant
influence on how many message will have to be sent for a given
traversal.</p>
      <p>When performing a BFS, the search starts from one
specific vertex. Then, all its subsequent neighbors will be visited
following the edge direction. Undeniable, the BFS will
eventually reach a partitions border, which leads to communication
overhead. There are two scenarios to consider. (1) Staying
as long in the current partition as possible to avoid any
communication at all and (2) reaching partition boundaries
as fast as possible in order to exploit the system inherent
parallelism to the maximum. The effectiveness of either
strategy is dependent on the current system load. If other
workers are highly utilized, completely searching through
the current partition before notifying others may produce
a better runtime behavior than sending messages to other
workers as soon as a border vertex has been found. However,
if the system is in an idle state, the total opposite is needed.</p>
      <p>Residing in the current partition as long as possible would
completely underutilize the system ressources. Therefore,
e
s
a
h
P
g
n
i
n
e
s
r
a
o
C</p>
      <p>GO</p>
      <p>
        In the rest of this section we briefly describe the various phases of the multilevel algorithm. The reader should referdirected graphs however, any vertex could represent a sink,
to [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] forsfuernthderidnetgailsa.s much messages as possible to other workers can i.e. when there are only ingoing edges from other vertices.
This key characteristic has to be considered when partitioning
CoarseniongnPlyhaisnecrDeaursineg tthheceoaqrsueneinrgyphpasee,rafsoerqumenacenocf esm. aller graphs Gi D .Vi; Ei/, is constructed fromdirected graphs, especially for BFS-based queries.
the original graph G0 D .V0; E0/ such that jVij &gt; jViC1j. Graph GiC1 is constructed from Gi by finding a maximal
matching M3i. EiCof LGi AandScSollaIpCsingAtoLgethGerthRevAertiPces Hthat ParAeinRcidTentIoTneaIcOhedNgeIof NtheGmatching. In this
process no more than two vertices are collapsed together because a matching of a graph is a set of edges, no two of4. HUGS – HUB-CENTERED GRAPH
PARwhich are inciIdnenttohnitshewsaomrekvewrteex. aVrereticcesotnhastiadreenroitnignciadengt roanpanhy eGdge=of t(hVem,Eatc)h,ingwahreesrimeply copied
over to GiCv1.∈ V is a vertex in G and (vi, vj ) ∈ E with E ⊆ (V × V ) TITIONING
      </p>
      <p>When viesrticaensv;eud2gVei afrreocmollapvseid ttooforvmj v.ertWex ew 2arVeiC1o,tnhelyweicgohtnosfivdereterxiwngis sleotesqsulaelstosthe sum of We claim, that partitioning a graph with a BFS-based
tuhemwineuisghthtpseoaefdrvgteeirtt.ivci;oesun/v.ianFnogdrsue,aicanhndtpoathiredoefidsgejdeosgeiinsnc.itxd;epvnt/aoarnntdwi.txiis;osune/t s(eiq.ePu.,a1lxt,oisPtha2ed,jua.nc.ei.on,nt Ptoofnbthoe⊆thedvVgaensdiwnuc)iitdaehnsitnognleveadngdeapproach will increase the performance of BFS-based queries.</p>
      <p>= n] as S
.x; w/ is cPreiate∩d Pwhjose w∅eigfhotris aseltleqiu6a=l tojthaensudmio,fjthe∈w[e1ig,hts of theswetewlol eadsges. TihPusi, d=urinVg .successiveThe idea behind our novel graph partitioning method is, that
coarsening levTelsh,tehe GwerigahtpohfboPthavrerttiicteisoanndiendgges Pinrcroeabselse. mThe(pGrocPessPo)fcoisarswenienlglissiltluustdraiteeddin. Figure 2m.any paths between two vertices will make use of high degree
Each verteDxainvdieddigneign Faigugrer2a(ap) hhas Gauniintwteoighkt. Fpigaurret2it(bi)osnhoswsothfe bcoaarlsaennedcegrdaphstihzaetr,esuislts from thevertices. Additionally, geometrical patterns like triangles or
contractionkonfoshwadnedtvoertibcees iNnFPig-ucreo2m(a)p.lNeutmebe[r1s2on].thePvrearticctesicanadlleydgefseiansFiibgulree 2s(ob)lushtoiwotnhesir resultingchains are likely to pass through these nodes. Therefore,
weights. can only solve the problem heuristically. Many approaches centering the partitioning around hubs of the graph should</p>
      <p>
        Maximal matchings can be computed in different ways [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ]. The method used to compute the matching greatly
affects both the quality of the partition, and the time required during the uncoarsening phase. The matching schemecreate partitions, which provide sufficient locality for certain
have been developed for tackling the GPP. One of them is
that we usethisecalyleedthemavyo-sedtgesmuactccheisnsgf(uHlEMm),uanltdicloemvpeultespaamratticthiinognMi ni,gsuc[h5t]h.at Mtheuwletiiglhet vofetlhe edges inqueries. The resulting partition will most likely represent the
partitioning itself is no actual algorithm, rather a heuristical structure which would be created when performing a BFS
strategy. It consists of three sta3ges in which multiple methods on the graph. Having the partition equally structured as the
can be applied. The three stages are (1) coarsening the input search tree itself should lead to a significant synergy between
graph, (2) finding a partitioning for the coarsened graph and both.
(3) uncoarsening the partitioning back to the granularity of Hugs is a lightweight BFS-based graph partitioning
apthe input graph (cf. Figure 2). Basically, the idea is to do proach. As many graph systems almost always implement
the actual partitioning only on graph sizes where complexity a version of BFS, this implementation can be reused with
of the partitioning problem is bearable. The coarsening is slight modifications. In contrast to the multilevel partitioning
done by combining multiple vertices into one abstract ver- methods, Hugs considers the whole graph without coarsening
tex. Edges are coarsened likewise by maintaining the graphs it. Its heuristic is based on centering the partitions around
topology as well as possible. Typically, the coarsening is so called hubs in the graph. A hub is a vertex with a high
repeated until the graph is small enough (a few hundreds of degree, i.e. it has many incoming and outgoing edges. The
vertices). On this very small graph even expensive partition- intuition behind this heuristic is that hubs, because of the
ing algorithms can be applied without much runtime burden. many incoming and outgoing edges, are most likely part of
Finally, the uncoarsening iteratively unpacks the abstract many paths in the graph – and particularily of paths in the
vertex. After each uncoarsening a refinement procedure opti- neighborhood of hubs. Essentially, Hugs runs BFSs starting
mizes the partitioning of the now finer grained graph. Since from these hubs to determine the partitions. The order in
the refinement steps start from an already well partitioned which the vertices are relaxed during the BFS determines
base, the added overhead remains small. the partitioning time as well as the partitioning quality.
      </p>
      <p>
        There are two approaches for multilevel partitioning, Algorithm 1 shows Hugs’ partitioning procedure. A set
namely k-way [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and recursive bisection [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The main of hubs H is maintainted besides the base data. H
dedifference for both methods resides in the second phase. Mul- scendingly lists the vertices h with a top-(k · n) degree
tilevel k-way partitioning aims to partition the graph into deg(h), where k ∈ N+ ∧ k · n |V |. We define the
k partitions directly. While recursive bisection recursively degree deg(v) of a vertex v as deg(v) = |K(v)| where
divides the graph and the resulting subgraphs into two par- K(v) = {w | (v, w) ∈ E ∨ (w, v) ∈ E} is the neighborhood
titions, until the final number of partitions is reached. Both of v. Initially, Hugs adds k of the top-n hubs and their
U
n
c
o
a
r
s
e
n
i
n
g
P
h
a
s
e
Algorithm 1 HUGS
direct neighbors to each partition (line 5–6). We refer to
these k top-n hubs as the root hubs of the partition. In each
BFS step, Hugs does not add all newly explored vertices to
the current partition. Instead, it determines for each newly
explored vertex u the ratio of its neighbors in the current
partition degPi (u) and all its neighbors deg(u) (line 8–15).
      </p>
      <p>The ratio is a heuristical local measure how close the vertex
is to the current partition. Newly explored vertices with a
top-t ratio are added to the partition (line 16).</p>
      <p>There are two parameters to steer the performance of
Hugs: the number of root hubs k and the maximum growth
rate t. Starting with only one root hub creates a partition
which is coherent but Hugs needs more BFS steps to fill the
partition. Multiple root hubs result in a partition consisting
of a scattered number of sub areas in the graph which usually
results in faster partitioning time but yields less quality. A
higher growth rate means that more of the explorer vertices
are added to the partition, which drastically reduces the
partitioning time but yields less quality, since many low
ranked vertices are added to the partition as well.</p>
      <p>Figure 3 illustrates the whole process for n = 2 partitions
and serves as an example. First, the two vertices with the
highest degree in the graph are selected, as shown in Figure
3(a). Starting from the vertex with the higher degree, all
neighbors will be scanned and added to its partition. If the
maximum number of vertices for this specific partition has
been reached, the BFS will terminate. Otherwise the search
continues with the neighborhood sets of the already selected
vertices. The same procedure is analoguesly applied to the
second hub.</p>
      <p>Hugs provides two advantages. The first advantage applies
to the optimized partitioning. Creating synergies between
the data partitioning and the search routine increases query
performance. Second, through reusing existing
implementations of BFS, Hugs is lightweight and easy to implement.</p>
      <p>However, a drawback of our method can result from the
graphs topology. If the highest degree vertices are all located
in each others neighborhood set, the algorithm may produce
imbalanced results.
5.</p>
    </sec>
    <sec id="sec-3">
      <title>EVALUATION</title>
      <p>
        To show the benefit of our novel and lightweight graph
partitioning approach, we conducted a series of experiments
for testing a prototypical implementation of Hugs. As data
we used two directed graphs taken from the online graph
data collection of SNAP [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The first graph is the Wikivote
graph. It represents the voting behaviour of users, who had
to elect new moderators. It is a rather small graph with 7115
vertices and 103689 edges. The second graph is the Stanford
web graph, which represents the structure of the Stanford
University website with 281903 vertices and 2312497 edges.
Every edge represents a hyperlink between two webpages.
      </p>
      <p>
        As for our test setting, we partitioned every graph and
measured different statistics. For the multilevel k-Way
partitioning and recursive bisection algorithms, we used the
METIS library [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Additionally we also used a random
partitioning as a baseline. For every partitioning appraoch we
measured the runtime of the partitioning algorithm as well
as the runtime of the ten random reachability queries. All
tests have been performed using a workstation with an AMD
Opteron 6274 CPU and 64 GB of main memory. Because
of its significantly longer query runtime, the performance
of the random partitioning was considerably worse than all
other approaches. Therefore we omited its runtimes from the
corresponding figures. In order to maintain comparability
and to avoid hidden latencies, we performed the testing
sequentially and measured the query time spent per partition.
Since Hugs outperforms its competitors in a sequential
environment, it is obvious that a DORA system would also show
better performance values for Hugs compared to multilevel
k-Way and recursive bisection.
      </p>
      <p>Figure 4 and 5 show the partitioning times of Hugs
compared to multilevel k-Way and recursive bisection. For
Wikivote, Hugs outperforms both approaches due to its reduced
effort in creating topologically balanced partitions. For
Stanford, Hugs struggles with a smaller number of partitions.
A full BFS with continuous ranking of all neighbors of the
partition leads to longer partitioning times. With a growing
number of partitions, Hugs’ partitioning time drops a bit
while the k-Way partitioning and recursive bisection
algorithms take considerably longer. For DORA systems, which
aim at large multiple socket server machines, partitionings
into eight and more partitions are the relevant scenarios.
Here, Hugs consistently outperforms its competitors.</p>
      <p>For query runtime tests, we selected ten pairs of random
vertices in the graph. These pairs were fixed for all
partitioning approaches as well as for every number of partitions.
The reachability traversal is implemented as BFS as well.
Starting from the source node of the reachability request, the
search procedure starts forward directed. As explained in
Section 2, our data is stored on one specific worker.
Therefore, we try to find the target vertex in the root partition.
When the BFS reaches an inter-partition edge, the worker
with the partition containing the targeted vertex will be
notified and immediately starts his own search procedure.
When multiple vertices have an inter-partition edge to the
same target partition, these targeted vertices will be queued
with the corresponding worker. If those vertices are already
visited from the current or already terminated search, no
additional search will be scheduled. The procedure terminates,
when one of the workers finds a path from a border vertex
to the target vertex. This experiments represent the first of
the two scenarios explained at the end of Section 2.
140
120
The runtimes of the reachability queries are displayed in
Figures 6 and 7. The small Wikivote graph shows, that a
search based on Hugs provides consistently faster answers.
The topological partitionings of k-Way partitioning and
recursive bisection tend to introduce more overhead for a
BFSbased search. Since the partitioning and search scheme are
both BFS-based, they can synergize and yield good
performance values, compared to the two other approaches.</p>
    </sec>
    <sec id="sec-4">
      <title>RELATED WORK</title>
      <p>
        Many works aim at improving graph partitioning
algorithms themselves. Karypis and Kumar [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] are improving
the multilevel partitioning approach by parallelizing multiple
steps. At first, the bisection of the subgraphs is performed by
two processors. The resulting partitions are to be bisected as
well. This workload is also split up for multiple processors.
The authors refer to these techniques as parallelism in the
recursive and in the bisection step. Hugs is designed as a single
threaded BFS-like partitioning. Slota, Rajamanickam and
Madduri show, that the BFS itself can be parallelized [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
      <p>
        Ding et al. are discussing a min-max cut problem for
data clustering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The authors claim that they achieve a
balanced partitioning with maximizing the similarity between
vertices in the same partition. In contrast the similarity or
association to vertices of other partitions is to be minimized.
Other approaches like normalized cut or ratio cut are said
to be less efficient. Generally speaking, this approach seeks
to cluster identities, with high similarity and thus partitions
the graph on a logical level. In contrast, Hugs partitions the
graph on a topological level. We exploit the fact, that high
degree vertices are most likely to be visited, when searching
connections between two vertices.
      </p>
      <p>
        A different field of applications is to model problems as
graphs and apply partitioning algorithms on it to simplify
computations on this data. Fern and Brodley use multilevel
graph partitioning to solve the cluster ensemble problem [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
They map the datapoints to vertices and partition them
according to their similarity, which is used as an edge weight.
      </p>
      <p>
        Gilbert and Zmijewski show, that using the Kernighan
Lin Algorithm [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] can improve the performance of
message passing multiprocessor systems, like the hypercube [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Similar to our work, the ultimate goal is to reduce
communication overhead between the workers. As our method
is based on BFS only, it will most likely not produce
optimal partitionings with a minimal edge cut. Yet we produce
highly localized partitions, which reduce the communication
overhead of BFS-based reachability queries.
      </p>
      <p>
        Schloegel, Karypis and Kumar [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] as well as [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] use graph
partitioning for exploiting parallelism on meshes, which are
often the basis of scientific calculations. Meshes can consist
of massive amount of data and therefore don’t fit into a
single machines main memory. Since scientific calculations
often refer to incremental updates of neighboring nodes, we
assume that Hugs could be applied to this field as well.
      </p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>We presented Hugs, a lightweight, BFS-based graph
partitioning algorithm. Hugs is meant for partitioning graph data
for its management in DORA systems. We showed that Hugs
yields better performance for BFS-based reachability queries
than its state-of-the-art graph partitioning competitors. At
the same time Hugs also determines the partitioning quicker
than these standard approaches particularly for relevant
scenarios of 8 and more partitions. Implementing Hugs in a
graph data management system is simple since its can build
on existing BFS code, which is one of the most fundamental
graph routines and can be assumed to be implemented by
every typical graph data management system in a highly
optimized fashion.</p>
      <p>Our investigations reported here shows Hugs to be a
promising approach for DORA systems. As our prototype
mostly favors the first partition, we are investigating other
filling techniques, e.g. an alternating calculation of each
partition, whereby the corresponding runtime penalties are yet
to be resolved. We are further investigating other heuristics
to determine for the partition growth phase. Since Hugs
exploits two parameters for adjustments, it is customizable.
However, the algorithms performance is dependent of the
underlying graphs structure. Therefore, we need to find
good heuristics to automatically determine upfront, which
configuration works best for a given graph. Further down
the road, we will look into an incremental version of Hugs
for maintaining the partitioning upon updates to the graph.
8.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work is partly funded by the German Research
Foundation (DFG) within the Cluster of Excellence ”Center for
Advancing Electronics” (cfaed) in the Collaborative Research
Center 912 Highly Adaptive Energy-Efficient Computing
(HAEC).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ailamaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>DeWitt</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. D. Hill</surname>
            , and
            <given-names>D. A.</given-names>
          </string-name>
          <string-name>
            <surname>Wood</surname>
          </string-name>
          .
          <article-title>Dbmss on a modern processor: Where does time go?</article-title>
          <source>In VLDB 1999</source>
          , pages
          <fpage>266</fpage>
          -
          <lpage>277</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          .
          <article-title>A comparison of current graph database models</article-title>
          .
          <source>In ICDE 2012</source>
          , pages
          <fpage>171</fpage>
          -
          <lpage>177</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          and
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Guti´errez. Survey of graph database models</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>40</volume>
          (
          <issue>1</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. Z.</given-names>
            <surname>Broder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Maghoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajagopalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tomkins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Wiener</surname>
          </string-name>
          .
          <article-title>Graph structure in the web</article-title>
          .
          <source>Computer Networks</source>
          ,
          <volume>33</volume>
          (
          <issue>1-6</issue>
          ):
          <fpage>309</fpage>
          -
          <lpage>320</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bulu</surname>
          </string-name>
          <article-title>¸c, H</article-title>
          . Meyerhenke,
          <string-name>
            <surname>I. Safro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sanders</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schulz</surname>
          </string-name>
          .
          <article-title>Recent advances in graph partitioning</article-title>
          .
          <source>CoRR, abs/1311.3144</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C. H. Q.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Simon</surname>
          </string-name>
          .
          <article-title>A min-max cut algorithm for graph partitioning and data clustering</article-title>
          .
          <source>In IEEE 2001</source>
          , pages
          <fpage>107</fpage>
          -
          <lpage>114</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fa</surname>
          </string-name>
          <article-title>¨rber, S. K. Cha</article-title>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Primsch</surname>
          </string-name>
          , C. Bornho¨vd, S. Sigg, and
          <string-name>
            <surname>W. Lehner. SAP</surname>
          </string-name>
          <article-title>HANA database: data management for modern business applications</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>40</volume>
          (
          <issue>4</issue>
          ):
          <fpage>45</fpage>
          -
          <lpage>51</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X. Z.</given-names>
            <surname>Fern</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Brodley</surname>
          </string-name>
          .
          <article-title>Solving cluster ensemble problems by bipartite graph partitioning</article-title>
          .
          <source>In (ICML</source>
          <year>2004</year>
          ),
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Zmijewski</surname>
          </string-name>
          .
          <article-title>A parallel graph partitioning algorithm for a message-passing multiprocessor</article-title>
          .
          <source>International Journal of Parallel Programming</source>
          ,
          <volume>16</volume>
          (
          <issue>6</issue>
          ):
          <fpage>427</fpage>
          -
          <lpage>449</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hardavellas</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Pandis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mancheril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ailamaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Falsafi</surname>
          </string-name>
          .
          <article-title>Database servers on chip multiprocessors: Limitations and opportunities</article-title>
          .
          <source>In CIDR 2007</source>
          , pages
          <fpage>79</fpage>
          -
          <lpage>87</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tansley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Tolle</surname>
          </string-name>
          .
          <source>The Fourth Paradigm: Data-Intensive Scientific Discovery</source>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hyafil</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Rivest</surname>
          </string-name>
          .
          <article-title>Graph partitioning and constructing optimal decision trees are polynomial complete problems</article-title>
          .
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>SCARAB: scaling reachability computation on large graphs</article-title>
          .
          <source>In SIGMOD 2012</source>
          , pages
          <fpage>169</fpage>
          -
          <lpage>180</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Multilevel graph partitioning schemes</article-title>
          .
          <source>In ICPP 1995</source>
          , pages
          <fpage>113</fpage>
          -
          <lpage>122</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Parallel multilevel graph partitioning</article-title>
          .
          <source>In IPPS 1996</source>
          , pages
          <fpage>314</fpage>
          -
          <lpage>319</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm</article-title>
          .
          <source>In PPSC</source>
          <year>1997</year>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>A fast and high quality multilevel scheme for partitioning irregular graphs</article-title>
          .
          <source>SIAM J. Scientific Computing</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>359</fpage>
          -
          <lpage>392</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kiefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <article-title>Experimental evaluation of NUMA effects on database management systems</article-title>
          .
          <source>In BTW 2013</source>
          , pages
          <fpage>185</fpage>
          -
          <lpage>204</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kintali</surname>
          </string-name>
          .
          <article-title>Betweenness centrality : Algorithms and lower bounds</article-title>
          .
          <source>CoRR, abs/0809</source>
          .
          <year>1906</year>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kissinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kiefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Habich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Molka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lehner</surname>
          </string-name>
          . ERIS:
          <article-title>A numa-aware in-memory storage engine for analytical workload</article-title>
          .
          <source>In ADMS 2014</source>
          , pages
          <fpage>74</fpage>
          -
          <lpage>85</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <article-title>Snap - stanford network analysis platform</article-title>
          . http://snap.stanford.edu/snap/ [Online, last acessed 2016-
          <volume>02</volume>
          -30].
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. W.</given-names>
            <surname>Kernighan</surname>
          </string-name>
          .
          <article-title>An effective heuristic algorithm for the traveling-salesman problem</article-title>
          .
          <source>Operations research</source>
          , pages
          <fpage>498</fpage>
          -
          <lpage>516</lpage>
          ,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>I. Pandis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , N. Hardavellas,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Ailamaki</surname>
          </string-name>
          .
          <article-title>Data-oriented transaction execution</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>928</fpage>
          -
          <lpage>939</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rother</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kolmogorov</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Blake</surname>
          </string-name>
          .
          <article-title>”grabcut”: interactive foreground extraction using iterated graph cuts</article-title>
          .
          <source>ACM Trans. Graph</source>
          .,
          <volume>23</volume>
          (
          <issue>3</issue>
          ):
          <fpage>309</fpage>
          -
          <lpage>314</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Schloegel</surname>
          </string-name>
          , G. Karypis, and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Graph partitioning for high performance scientific simulations</article-title>
          .
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lam</surname>
          </string-name>
          .
          <article-title>Distributed socialite: A datalog-based language for large-scale graph analysis</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>6</volume>
          (
          <issue>14</issue>
          ):
          <fpage>1906</fpage>
          -
          <lpage>1917</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Slota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajamanickam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Madduri</surname>
          </string-name>
          .
          <article-title>BFS and coloring-based parallel algorithms for strongly connected components and related problems</article-title>
          .
          <source>In IEEE 2014</source>
          , pages
          <fpage>550</fpage>
          -
          <lpage>559</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Taylor</surname>
          </string-name>
          and R. L. Frank.
          <article-title>CODASYL data-base management systems</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>67</fpage>
          -
          <lpage>103</lpage>
          ,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C.</given-names>
            <surname>Walshaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cross</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Everett.</surname>
          </string-name>
          <article-title>Parallel dynamic graph partitioning for adaptive unstructured meshes</article-title>
          .
          <source>J. Parallel Distrib. Comput.</source>
          ,
          <volume>47</volume>
          (
          <issue>2</issue>
          ):
          <fpage>102</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>