HUGS – A Lightweight Graph Partitioning Approach

Alexander Krause, Hannes Voigt, Wolfgang Lehner
Technische Universität Dresden, Database Systems Group, Dresden, Germany
{firstname.lastname}@tu-dresden.de

ABSTRACT
The growing interest in graph data has led to increasingly more research in the field of graph data management and graph analytics. Nowadays, even large graphs of up to billions of vertices and edges fit into the main memory of big modern multi-socket machines, making them a first-grade platform for graph management and graph analytics. High-performance data management solutions have to be aware of the NUMA properties of such big machines. A data-oriented architecture (DORA) is a particular solution to that. However, it requires partitioning the data in a way such that inter-partition communication can be avoided. Graph partitioning is a long-studied problem, and state-of-the-art solutions such as multilevel k-way partitioning and recursive bisection achieve good results in feasible time. Integrating such solutions is a rather difficult task, though. In this paper, we present a more lightweight approach called Hugs. The key idea of Hugs is to reuse the BFS routine that is present in a graph data management system anyway, since it is the foundation of many analytical graph algorithms. Hugs is not meant to produce a good general-purpose graph partitioning, but good runtimes of BFS graph traversals, such as reachability queries, on DORA systems. In our experiments, Hugs proved capable of finding good graph partitionings faster than state-of-the-art approaches. The partitionings found by Hugs also showed shorter runtimes for reachability queries.

Keywords
Graph, Graph Partitioning, Reachability, BFS, DORA

28th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 24.05.2016 – 27.05.2016, Nörten-Hardenberg, Germany. Copyright is held by the author/owner(s).
1. INTRODUCTION
The last decade has seen a resurgence of interest in graph data management [2]. With the network data model in the 1970s [28] and object-oriented database systems in the early 1990s, graph-based data models and graph query languages already received considerable attention in research [3]. However, the traction of today's graph data management efforts is considerably higher, with many major IT companies and DBMS vendors on the bandwagon [7, 24]. Among others, one major driver behind the graph concept's revival is a shift in the interest of analytics from mere reporting towards data-intensive science and discovery [11]. Graph data can easily range in the size of billions of vertices and edges. Prominent examples of large graphs are the Facebook friendship graph, the Twitter follower graph, citation networks, web link networks, road networks, supply chains, etc. (cf. [21]). The analysis even of the biggest graphs is often done with recursive algorithms [26]. Breadth First Search (BFS) is one of the most fundamental building blocks for many popular graph analysis algorithms, such as algorithms to determine reachability, connected components, and betweenness centrality [4, 13, 19].

The approach on how to work with a given graph is always determined by its size. The storage and mining of such potentially huge data sets requires computational resources ranging from small workstations up to whole compute clusters. With today's continuously decreasing main memory cost and increasingly powerful hardware, even standalone server workstations with multiple CPUs and terabytes of main memory can process graphs of the billion scale. Such multiprocessor machines consist of multiple sockets, whereby every socket holds multiple CPUs and a share of the whole system's main memory. In these large shared-memory machines, special attention has to be paid to the organization of memory allocation and access [1, 10, 18]. Each socket manages its share of physical memory and provides the other sockets access to it. Although each core has access to each byte of memory, the access is non-uniform (Non-Uniform Memory Access, NUMA): bandwidth is lower and latency is higher for remote memory accesses to other sockets than for accesses to a socket's own memory. Data management systems that are oblivious of these NUMA effects take huge performance penalties for remote memory accesses at large scale [18]. The Data-Oriented Architecture (DORA) [23] is an architecture for so-called NUMA-aware DBMS. Such DBMS try to co-locate data and processing as much as possible to avoid expensive remote memory access [20]. In particular, partitions of the data are bound to an individual socket and processed exclusively by threads running on this socket. Thereby, remote memory access is limited to the inter-partition communication of operators such as a traversal over a graph. Obviously, the partitioning has a huge impact on how much inter-partition communication takes place during the processing of a specific operator. For graph processing on multiprocessor systems, partitioning the graph is therefore inevitable.

Partitioning graphs is a well-studied field [5]. The typical problem definition asks for the partitioning of a graph into k partitions so that the number of edges cut at partition borders is minimal. In this absolute form the problem is NP-hard. Over the years, many heuristic algorithms have been developed that try to minimize edge cuts. Prominent examples are recursive bisection [14] and multilevel k-way partitioning [16]. By exploiting the graph's topology, these general-purpose algorithms find good and useful partitionings for many real-world scenarios in feasible time. However, the resulting partitions are not optimized for BFS-based analyses on a multiprocessor system.

In this paper we propose a novel lightweight graph partitioning approach called Hugs. Hugs determines a graph partitioning with the help of BFS traversals starting from high-degree vertices. As we reuse the BFS traversal operations, which are available in graph data management systems anyway, Hugs only needs minor additions to work with common graph systems. In contrast to the general-purpose partitions produced by current graph partitioning approaches, Hugs exclusively aims at a partitioning scheme that speeds up BFS-based reachability queries in a NUMA setting. Hugs offers two advantages: first, it provides a suitable partitioning for BFS-based analyses, and second, its lightweight design allows an easy integration into standard graph systems. In our experimental evaluation, Hugs exhibits partitioning speed and partitioning quality for reachability queries comparable to the state-of-the-art algorithms. Hence, Hugs offers a significantly easier to implement, lightweight alternative to state-of-the-art graph partitioning algorithms for graph data management systems.

In the remainder of the paper, we first describe the DORA-based database infrastructure we are targeting (Section 2) and detail the graph partitioning problem as well as state-of-the-art graph partitioning algorithms (Section 3). We continue with presenting our lightweight partitioning approach Hugs (Section 4) and the experimental evaluation (Section 5). Finally, we discuss related work (Section 6) and conclude the paper (Section 7).
2. PROBLEM STATEMENT
As described in the previous section, we want to optimize BFS-based graph analytics for NUMA-affected multiprocessor systems. Since DORA is well tailored to the NUMA effect, it is predestined for exploiting locality in graph analysis on a NUMA system.

In most systems, execution threads or transactions are the central point of view; this is also called a thread-to-transaction assignment. Transactions need to be globally synchronized, and locks and latches need to be implemented in order to protect the system from anomalies such as concurrent writes to the same data record. In a DORA system, where a thread-to-data assignment holds, no such locks are needed [23]. Inside a database system based on DORA, each processing unit of the NUMA system represents a worker [20]. During computations, a worker should never switch its affinity to another physical processing unit on another socket, since this would introduce additional messaging overhead. Every worker is assigned one or multiple data partitions of the whole data set. This approach holds regardless of whether the system stores graph or relational data. One advantage of this architecture lies in the implied parallelism of the system. When a transaction needs to access data from multiple partitions, e.g. for a join, every worker can independently prepare sub-results for its available data partitions. Once these local computations have been concluded, requests to other units can be made to fetch possible join partners. As every worker works solely on its own partitions, no further protection of the data partitions needs to be considered.

Cross-partition operations require communication between the actual workers. This happens, for example, if a traversal operation accesses a part of the graph that is stored in a partition other than the one it is currently running in. For this purpose, the system needs to provide an asynchronous, high-throughput message passing layer that allows handing over a task to another worker. For instance, when a traversal query reaches a border vertex inside a partition of one worker and the neighbor of this vertex resides in a partition of another worker, a message with the current search status and the target vertex is sent to the second worker, asking it to resume the search from this point on. This process is illustrated in Figure 1.

[Figure 1: Communication between workers via the message passing interface]

As the message passing layer is aware of the partition assignments, it directly sends messages to the required workers [20]. Every message constitutes a remote memory access; this holds on any NUMA system. Assuming a balanced utilization of all sockets, a low number of messages typically reduces the total runtime of a given workload. Obviously, the partitioning of the graph has a significant influence on how many messages have to be sent for a given traversal.
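To make the hand-off concrete, the following minimal Python sketch mimics the described interaction: each worker owns one partition, runs a local BFS, and forwards the search state to the owning worker whenever it crosses an inter-partition edge. The names Worker, resume_search, owner_of, and inboxes are our own illustrative choices, not the API of an actual DORA engine such as ERIS [20].

    from collections import deque

    class Worker:
        """One worker per socket in a DORA-style system; owns exactly one graph partition."""
        def __init__(self, wid, adjacency, owner_of, inboxes):
            self.wid = wid                  # this worker's id
            self.adj = adjacency            # adjacency list {vertex: [neighbors]}
            self.owner_of = owner_of        # vertex -> worker id (the partition assignment)
            self.inboxes = inboxes          # worker id -> deque of (vertex, target) messages

        def resume_search(self, start, target, visited):
            """Run a local BFS from 'start'; hand off the search at every inter-partition edge."""
            frontier = deque([start])
            while frontier:
                v = frontier.popleft()
                if v in visited:
                    continue
                visited.add(v)
                if v == target:
                    return True
                for u in self.adj.get(v, []):
                    if self.owner_of[u] == self.wid:
                        frontier.append(u)  # neighbor is local, keep traversing
                    else:
                        # border vertex: send the search state to the owning worker
                        self.inboxes[self.owner_of[u]].append((u, target))
            return False

In a real system the inboxes would be drained asynchronously by the receiving workers; here they merely collect the pending hand-offs.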
When performing a BFS, the search starts from one specific vertex. Then, all its subsequent neighbors are visited following the edge direction. Eventually, the BFS will reach a partition border, which leads to communication overhead. There are two scenarios to consider: (1) staying in the current partition as long as possible to avoid any communication at all, and (2) reaching partition boundaries as fast as possible in order to exploit the system-inherent parallelism to the maximum. The effectiveness of either strategy depends on the current system load. If other workers are highly utilized, completely searching through the current partition before notifying others may produce better runtime behavior than sending messages to other workers as soon as a border vertex has been found. However, if the system is in an idle state, the exact opposite is needed: residing in the current partition as long as possible would completely underutilize the system resources. In that case, sending as many messages as possible to other workers can only increase the query performance.

3. CLASSICAL GRAPH PARTITIONING
In this work we consider a graph G = (V, E), where v ∈ V is a vertex in G and (vi, vj) ∈ E with E ⊆ (V × V) is an edge from vi to vj. We only consider lossless partitionings into disjoint partitions P1, P2, ..., Pn ⊆ V with Pi ∩ Pj = ∅ for all i ≠ j and i, j ∈ [1, n], as well as ∪i Pi = V.
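As a small illustration of this definition, the following Python sketch checks whether a candidate assignment is a lossless, disjoint partitioning in the above sense; the function name and data layout are ours, chosen only for the example.

    def is_lossless_partitioning(V, partitions):
        """True iff the given partitions are pairwise disjoint and together cover all of V."""
        seen = set()
        for P in partitions:
            if seen & P:              # a vertex appears in two partitions -> not disjoint
                return False
            seen |= P
        return seen == set(V)         # the union of all partitions must equal V

    # e.g. is_lossless_partitioning({1, 2, 3, 4}, [{1, 2}, {3, 4}]) returns True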
The Graph Partitioning Problem (GPP) is well studied. Dividing a graph G into k partitions of balanced size is known to be NP-complete [12]. Practically feasible solutions can therefore only solve the problem heuristically. Many approaches have been developed for tackling the GPP. One of them is the so far most successful multilevel partitioning [5]. Multilevel partitioning itself is not an actual algorithm but rather a heuristic strategy. It consists of three stages in which multiple methods can be applied: (1) coarsening the input graph, (2) finding a partitioning for the coarsened graph, and (3) uncoarsening the partitioning back to the granularity of the input graph (cf. Figure 2). Basically, the idea is to do the actual partitioning only on graph sizes for which the complexity of the partitioning problem is bearable.

The coarsening is done by combining multiple vertices into one abstract vertex. Edges are coarsened likewise, maintaining the graph's topology as well as possible. Typically, the coarsening is repeated until the graph is small enough (a few hundred vertices). On this very small graph, even expensive partitioning algorithms can be applied without much runtime burden. Finally, the uncoarsening iteratively unpacks the abstract vertices. After each uncoarsening step, a refinement procedure optimizes the partitioning of the now finer-grained graph. Since the refinement steps start from an already well partitioned base, the added overhead remains small.

[Figure 2: Multilevel k-way partitioning scheme, taken from [16]: the coarsening phase shrinks the graph over levels G0 to G4, an initial partitioning is computed on the smallest graph, and the uncoarsening phase projects and refines it back to G0]
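To illustrate the coarsening stage, here is a deliberately simplified Python sketch that contracts a graph along a greedy matching. It captures the idea of merging matched vertex pairs into abstract vertices; it is not the heavy-edge matching scheme used by METIS [17] and omits the vertex and edge weights a real implementation carries along for balance.

    def coarsen_once(vertices, edges):
        """One coarsening step: greedily match vertices along edges and contract each pair."""
        matched = {}                      # original vertex -> coarse vertex id
        coarse_id = 0
        for (u, v) in edges:              # greedy matching: pair up still-unmatched endpoints
            if u != v and u not in matched and v not in matched:
                matched[u] = matched[v] = coarse_id
                coarse_id += 1
        for v in vertices:                # unmatched vertices are simply copied over
            if v not in matched:
                matched[v] = coarse_id
                coarse_id += 1
        coarse_edges = {(matched[u], matched[v]) for (u, v) in edges
                        if matched[u] != matched[v]}   # edges inside a merged pair disappear
        return list(range(coarse_id)), coarse_edges, matched

Repeating this step shrinks the graph until an expensive initial partitioning becomes affordable, after which the mapping returned here can be used to project the partitioning back during uncoarsening.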
There are two approaches for multilevel partitioning, namely k-way partitioning [16] and recursive bisection [14]. The main difference between the two methods resides in the second phase: multilevel k-way partitioning aims to partition the graph into k partitions directly, while recursive bisection recursively divides the graph and the resulting subgraphs into two partitions until the final number of partitions is reached. Both require graphs to be undirected, since these approaches aim to reduce the edges cut between partitions.

For BFS-based analytics, these partitionings will not be optimal. As undirected graphs can always be traversed in any direction along every edge, none of the vertices can become a dead end as long as it is part of more than one edge. For directed graphs, however, any vertex could represent a sink, i.e. a vertex with only incoming edges from other vertices. This key characteristic has to be considered when partitioning directed graphs, especially for BFS-based queries.

4. HUGS – HUB-CENTERED GRAPH PARTITIONING
We claim that partitioning a graph with a BFS-based approach will increase the performance of BFS-based queries. The idea behind our novel graph partitioning method is that many paths between two vertices make use of high-degree vertices. Additionally, structural patterns like triangles or chains are likely to pass through these nodes. Therefore, centering the partitioning around hubs of the graph should create partitions which provide sufficient locality for certain queries. The resulting partition will most likely resemble the structure which would be created when performing a BFS on the graph. Having the partition structured like the search tree itself should lead to a significant synergy between both.

Hugs is a lightweight, BFS-based graph partitioning approach. As graph systems almost always implement a version of BFS, this implementation can be reused with slight modifications. In contrast to the multilevel partitioning methods, Hugs considers the whole graph without coarsening it. Its heuristic is based on centering the partitions around so-called hubs in the graph. A hub is a vertex with a high degree, i.e. it has many incoming and outgoing edges. The intuition behind this heuristic is that hubs, because of their many incoming and outgoing edges, are most likely part of many paths in the graph – and particularly of paths in the neighborhood of hubs. Essentially, Hugs runs BFSs starting from these hubs to determine the partitions. The order in which the vertices are relaxed during the BFS determines the partitioning time as well as the partitioning quality.

Algorithm 1 shows Hugs' partitioning procedure. A set of hubs H is maintained besides the base data. H lists, in descending order of degree, the vertices h with a top-(k · n) degree deg(h), where k ∈ N+ and k · n ≪ |V|. We define the degree deg(v) of a vertex v as deg(v) = |K(v)|, where K(v) = {w | (v, w) ∈ E ∨ (w, v) ∈ E} is the neighborhood of v.
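These definitions translate directly into code. The following small Python sketch (the helper names are ours) computes K(v), deg(v) = |K(v)|, and the descending hub list H for given k and n.

    from collections import defaultdict

    def build_neighborhoods(E):
        """K(v) for every vertex: all w with (v, w) or (w, v) in E, i.e. edge direction ignored."""
        K = defaultdict(set)
        for (v, w) in E:
            K[v].add(w)
            K[w].add(v)
        return K

    def top_hubs(E, k, n):
        """H: the k*n highest-degree vertices (deg(v) = |K(v)|), listed in descending order."""
        K = build_neighborhoods(E)
        return sorted(K, key=lambda v: len(K[v]), reverse=True)[:k * n]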
Algorithm 1 HUGS
 1: H ← a set of hubs
 2: Pi ← a partition
 3: N ← a list of partition candidates
 4: for all Pi do
 5:   add k of the hubs in H as root hubs to Pi
 6:   add the neighbors of the root hubs to Pi
 7:   while Pi not full do
 8:     for all v ∈ Pi do
 9:       for all u ∈ K(v) ∧ u not visited do
10:         set u visited
11:         degPi(u) ← |K(u) ∩ Pi|
12:         N ← N ∪ {u}
13:       end for
14:     end for
15:     order N by degPi(u) / deg(u)
16:     add top-t of N to Pi and remove them from N
17:     set N \ Pi unvisited
18:   end while
19: end for
20: add unreachable vertices to the smallest partition

Initially, Hugs adds k of the top-n hubs and their direct neighbors to each partition (lines 5–6). We refer to these k hubs as the root hubs of the partition. In each BFS step, Hugs does not add all newly explored vertices to the current partition. Instead, it determines for each newly explored vertex u the ratio between its neighbors in the current partition, degPi(u), and all its neighbors, deg(u) (lines 8–15). This ratio is a heuristic local measure of how close the vertex is to the current partition. Newly explored vertices with a top-t ratio are added to the partition (line 16).

There are two parameters to steer the performance of Hugs: the number of root hubs k and the maximum growth rate t. Starting with only one root hub creates a partition which is coherent, but Hugs needs more BFS steps to fill the partition. Multiple root hubs result in a partition consisting of several scattered sub-areas of the graph, which usually leads to a faster partitioning time but yields less quality. A higher growth rate means that more of the explored vertices are added to the partition, which drastically reduces the partitioning time but also yields less quality, since many low-ranked vertices are added to the partition as well.

Figure 3 illustrates the whole process for n = 2 partitions and serves as an example. First, the two vertices with the highest degree in the graph are selected, as shown in Figure 3(a). Starting from the vertex with the higher degree, all neighbors are scanned and added to its partition. If the maximum number of vertices for this specific partition has been reached, the BFS terminates. Otherwise the search continues with the neighborhood sets of the already selected vertices. The same procedure is applied analogously to the second hub.

[Figure 3: Partitions resulting from Hugs – (a) top-2 hubs, one per partition; (b) final result]

Hugs provides two advantages. The first advantage is the optimized partitioning: creating synergies between the data partitioning and the search routine increases query performance. Second, by reusing existing implementations of BFS, Hugs is lightweight and easy to implement. However, a drawback of our method can result from the graph's topology: if the highest-degree vertices are all located in each other's neighborhood sets, the algorithm may produce imbalanced results.
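For readers who prefer code over pseudocode, the following Python sketch renders Algorithm 1 under simplifying assumptions of ours: partitions are plain sets, "Pi not full" is approximated by a fixed capacity of |V|/n vertices, and the visited bookkeeping of lines 10 and 17 is folded into the candidate set N. It is an illustration, not the authors' prototype.

    from collections import defaultdict

    def hugs_partition(V, E, n, k, t):
        """Hub-centered partitioning following Algorithm 1 (simplified sketch)."""
        K = defaultdict(set)                        # K(v) as defined above
        for (v, w) in E:
            K[v].add(w)
            K[w].add(v)
        deg = {v: len(K[v]) for v in V}
        hubs = sorted(V, key=lambda v: deg[v], reverse=True)[:k * n]
        capacity = max(1, len(V) // n)              # stand-in for "Pi not full" (line 7)
        assigned, partitions = set(), []
        for i in range(n):
            roots = [h for h in hubs[i * k:(i + 1) * k] if h not in assigned]
            Pi = set(roots)                         # lines 5-6: root hubs and their neighbors
            for r in roots:
                Pi |= {u for u in K[r] if u not in assigned}
            assigned |= Pi
            while len(Pi) < capacity:
                # lines 8-14: collect unassigned candidates adjacent to the partition
                N = {u for v in Pi for u in K[v] if u not in assigned}
                if not N:
                    break
                # lines 15-16: rank by degPi(u)/deg(u) and keep only the top-t candidates
                ranked = sorted(N, key=lambda u: len(K[u] & Pi) / deg[u], reverse=True)
                chosen = set(ranked[:t])
                Pi |= chosen
                assigned |= chosen
            partitions.append(Pi)
        min(partitions, key=len).update(set(V) - assigned)   # line 20: unreachable vertices
        return partitions

Tuning k and t in this sketch trades partition coherence against partitioning time, exactly as discussed above.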
5. EVALUATION
To show the benefit of our novel and lightweight graph partitioning approach, we conducted a series of experiments with a prototypical implementation of Hugs. As data we used two directed graphs taken from the online graph data collection of SNAP [21]. The first graph is the Wikivote graph. It represents the voting behaviour of users who had to elect new moderators. It is a rather small graph with 7,115 vertices and 103,689 edges. The second graph is the Stanford web graph, which represents the structure of the Stanford University website with 281,903 vertices and 2,312,497 edges. Every edge represents a hyperlink between two webpages.

For our test setting, we partitioned every graph and measured different statistics. For the multilevel k-way partitioning and recursive bisection algorithms, we used the METIS library [17]. Additionally, we used a random partitioning as a baseline. For every partitioning approach we measured the runtime of the partitioning algorithm as well as the runtime of ten random reachability queries. All tests have been performed on a workstation with an AMD Opteron 6274 CPU and 64 GB of main memory. Because of its significantly longer query runtimes, the performance of the random partitioning was considerably worse than that of all other approaches; we therefore omitted its runtimes from the corresponding figures. In order to maintain comparability and to avoid hidden latencies, we performed the testing sequentially and measured the query time spent per partition. Since Hugs outperforms its competitors in this sequential setting, we expect a DORA system to also show better performance for Hugs than for multilevel k-way partitioning and recursive bisection.

Figures 4 and 5 show the partitioning times of Hugs compared to multilevel k-way partitioning and recursive bisection. For Wikivote, Hugs outperforms both approaches due to its reduced effort in creating topologically balanced partitions. For Stanford, Hugs struggles with a smaller number of partitions: a full BFS with continuous ranking of all neighbors of the partition leads to longer partitioning times. With a growing number of partitions, Hugs' partitioning time drops a bit while the k-way partitioning and recursive bisection algorithms take considerably longer. For DORA systems, which aim at large multi-socket server machines, partitionings into eight and more partitions are the relevant scenarios. Here, Hugs consistently outperforms its competitors.

[Figure 4: Partitioning times in milliseconds over the number of partitions (2–32) for the Wikivote graph]
[Figure 5: Partitioning times in milliseconds over the number of partitions (2–128) for the Stanford graph]

For the query runtime tests, we selected ten pairs of random vertices in the graph. These pairs were fixed for all partitioning approaches as well as for every number of partitions. The reachability traversal is implemented as a BFS as well. Starting from the source vertex of the reachability request, the search proceeds in forward direction. As explained in Section 2, our data is stored on one specific worker; therefore, we first try to find the target vertex in the root partition. When the BFS reaches an inter-partition edge, the worker holding the partition that contains the targeted vertex is notified and immediately starts its own search procedure. When multiple vertices have an inter-partition edge to the same target partition, these targeted vertices are queued with the corresponding worker. If those vertices have already been visited by the current or an already terminated search, no additional search is scheduled. The procedure terminates when one of the workers finds a path from a border vertex to the target vertex. These experiments represent the first of the two scenarios explained at the end of Section 2.

The runtimes of the reachability queries are displayed in Figures 6 and 7. The small Wikivote graph shows that a search based on Hugs provides consistently faster answers. The topological partitionings of k-way partitioning and recursive bisection tend to introduce more overhead for a BFS-based search. Since our partitioning and search scheme are both BFS-based, they can synergize and yield good performance values compared to the two other approaches.

[Figure 6: Reachability runtimes in seconds over the number of partitions (2–32) for the Wikivote graph]
[Figure 7: Reachability runtimes in seconds over the number of partitions (2–128) for the Stanford graph]
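The measurement procedure can be summarized in a few lines of Python. The sketch below is our own illustration of the setup described above, not the actual benchmark code: the partitioners and the BFS-based reachability check are passed in as callables, and a fixed random seed stands in for keeping the ten query pairs identical across all approaches and partition counts.

    import random
    import time

    def run_benchmark(V, E, partitioners, reachability_query, partition_counts, num_queries=10):
        """Time each partitioner and a fixed set of reachability queries on its result."""
        rng = random.Random(42)                     # fixed seed -> same query pairs for every run
        vertices = list(V)
        pairs = [(rng.choice(vertices), rng.choice(vertices)) for _ in range(num_queries)]
        results = {}
        for name, partition_with in partitioners.items():   # e.g. {"hugs": ..., "kway": ..., "bisection": ...}
            for n in partition_counts:                       # e.g. [2, 4, 8, 16, 32]
                start = time.perf_counter()
                parts = partition_with(V, E, n)
                partition_time = time.perf_counter() - start
                start = time.perf_counter()
                for s, t in pairs:
                    reachability_query(parts, E, s, t)       # BFS-based reachability check
                query_time = time.perf_counter() - start
                results[(name, n)] = (partition_time, query_time)
        return results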
6. RELATED WORK
Many works aim at improving graph partitioning algorithms themselves. Karypis and Kumar [15] improve the multilevel partitioning approach by parallelizing multiple steps. At first, the bisection of the subgraphs is performed by two processors. The resulting partitions are bisected as well, and this workload is again split up among multiple processors. The authors refer to these techniques as parallelism in the recursive and in the bisection step. Hugs, in contrast, is designed as a single-threaded, BFS-like partitioning. Slota, Rajamanickam and Madduri show that the BFS itself can be parallelized [27].

Ding et al. discuss a min-max cut problem for data clustering [6]. The authors claim that they achieve a balanced partitioning by maximizing the similarity between vertices in the same partition while minimizing the similarity or association to vertices of other partitions. Other approaches like normalized cut or ratio cut are said to be less efficient. Generally speaking, this approach seeks to cluster entities with high similarity and thus partitions the graph on a logical level. In contrast, Hugs partitions the graph on a topological level. We exploit the fact that high-degree vertices are most likely to be visited when searching for connections between two vertices.

A different field of application is to model problems as graphs and apply partitioning algorithms to them in order to simplify computations on the data. Fern and Brodley use multilevel graph partitioning to solve the cluster ensemble problem [8]. They map the data points to vertices and partition them according to their similarity, which is used as an edge weight.

Gilbert and Zmijewski show that using the Kernighan-Lin algorithm [22] can improve the performance of message-passing multiprocessor systems like the hypercube [9]. Similar to our work, the ultimate goal is to reduce communication overhead between the workers. As our method is based on BFS only, it will most likely not produce optimal partitionings with a minimal edge cut. Yet we produce highly localized partitions, which reduce the communication overhead of BFS-based reachability queries.

Schloegel, Karypis and Kumar [25] as well as [29] use graph partitioning for exploiting parallelism on meshes, which are often the basis of scientific calculations. Meshes can consist of massive amounts of data and therefore do not fit into a single machine's main memory. Since scientific calculations often involve incremental updates of neighboring nodes, we assume that Hugs could be applied to this field as well.

7. CONCLUSIONS AND FUTURE WORK
We presented Hugs, a lightweight, BFS-based graph partitioning algorithm. Hugs is meant for partitioning graph data for its management in DORA systems. We showed that Hugs yields better performance for BFS-based reachability queries than its state-of-the-art graph partitioning competitors. At the same time, Hugs also determines the partitioning quicker than these standard approaches, particularly for the relevant scenarios of 8 and more partitions. Implementing Hugs in a graph data management system is simple, since it can build on existing BFS code, which is one of the most fundamental graph routines and can be assumed to be implemented by every typical graph data management system in a highly optimized fashion.
Our investigations reported here show Hugs to be a promising approach for DORA systems. As our prototype mostly favors the first partition, we are investigating other filling techniques, e.g. an alternating construction of the partitions, whereby the corresponding runtime penalties are yet to be resolved. We are further investigating other heuristics for the partition growth phase. Since Hugs exposes two parameters for adjustment, it is customizable. However, the algorithm's performance depends on the underlying graph's structure. Therefore, we need to find good heuristics to automatically determine upfront which configuration works best for a given graph. Further down the road, we will look into an incremental version of Hugs for maintaining the partitioning upon updates to the graph.

8. ACKNOWLEDGMENTS
This work is partly funded by the German Research Foundation (DFG) within the Cluster of Excellence "Center for Advancing Electronics" (cfaed) and the Collaborative Research Center 912 Highly Adaptive Energy-Efficient Computing (HAEC).

9. REFERENCES
[1] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB 1999, pages 266–277, 1999.
[2] R. Angles. A comparison of current graph database models. In ICDE 2012, pages 171–177, 2012.
[3] R. Angles and C. Gutiérrez. Survey of graph database models. ACM Comput. Surv., 40(1), 2008.
[4] A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. Computer Networks, 33(1-6):309–320, 2000.
[5] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. Recent advances in graph partitioning. CoRR, abs/1311.3144, 2013.
[6] C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In ICDM 2001, pages 107–114, 2001.
[7] F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. SAP HANA database: data management for modern business applications. SIGMOD Record, 40(4):45–51, 2011.
[8] X. Z. Fern and C. E. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In ICML 2004, 2004.
[9] J. R. Gilbert and E. Zmijewski. A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming, 16(6):427–449, 1987.
[10] N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: Limitations and opportunities. In CIDR 2007, pages 79–87, 2007.
[11] T. Hey, S. Tansley, and K. M. Tolle. The Fourth Paradigm: Data-Intensive Scientific Discovery. 2009.
[12] L. Hyafil and R. L. Rivest. Graph partitioning and constructing optimal decision trees are polynomial complete problems. 1973.
[13] R. Jin, N. Ruan, S. Dey, and J. X. Yu. SCARAB: Scaling reachability computation on large graphs. In SIGMOD 2012, pages 169–180, 2012.
[14] G. Karypis and V. Kumar. Multilevel graph partitioning schemes. In ICPP 1995, pages 113–122, 1995.
[15] G. Karypis and V. Kumar. Parallel multilevel graph partitioning. In IPPS 1996, pages 314–319, 1996.
[16] G. Karypis and V. Kumar. A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm. In PPSC 1997, 1997.
[17] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Scientific Computing, 20(1):359–392, 1998.
[18] T. Kiefer, B. Schlegel, and W. Lehner. Experimental evaluation of NUMA effects on database management systems. In BTW 2013, pages 185–204, 2013.
[19] S. Kintali. Betweenness centrality: Algorithms and lower bounds. CoRR, abs/0809.1906, 2008.
[20] T. Kissinger, T. Kiefer, B. Schlegel, D. Habich, D. Molka, and W. Lehner. ERIS: A NUMA-aware in-memory storage engine for analytical workloads. In ADMS 2014, pages 74–85, 2014.
[21] J. Leskovec. SNAP – Stanford Network Analysis Platform. http://snap.stanford.edu/snap/ [Online, last accessed 2016-02-30].
[22] S. Lin and B. W. Kernighan. An effective heuristic algorithm for the traveling-salesman problem. Operations Research, pages 498–516, 1973.
[23] I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-oriented transaction execution. PVLDB, 3(1):928–939, 2010.
[24] C. Rother, V. Kolmogorov, and A. Blake. "GrabCut": Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph., 23(3):309–314, 2004.
[25] K. Schloegel, G. Karypis, and V. Kumar. Graph partitioning for high performance scientific simulations. 2000.
[26] J. Seo, J. Park, J. Shin, and M. S. Lam. Distributed SociaLite: A Datalog-based language for large-scale graph analysis. PVLDB, 6(14):1906–1917, 2013.
[27] G. M. Slota, S. Rajamanickam, and K. Madduri. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In IPDPS 2014, pages 550–559, 2014.
[28] R. W. Taylor and R. L. Frank. CODASYL data-base management systems. ACM Comput. Surv., 8(1):67–103, 1976.
[29] C. Walshaw, M. Cross, and M. G. Everett. Parallel dynamic graph partitioning for adaptive unstructured meshes. J. Parallel Distrib. Comput., 47(2):102–108, 1997.