Graph Processing: Main Concepts and Systems

Mikhail Chernoskutov
Krasovskii Institute of Mathematics and Mechanics, Ural Federal University
Yekaterinburg, Russia
mach@imm.uran.ru

Abstract

The development of new processing and storage facilities leads to an increase in the size of the problems to be solved. One such problem is graph processing, which is characterized by non-determinism and an irregular memory access pattern that greatly complicate development and debugging. In this paper, we describe the basic concepts that make it possible to simplify the development of graph algorithms and their porting to various computer architectures. In addition, we give a description of the most well-known graph processing systems that implement these concepts.

Keywords: high performance computing, graph algorithms, graph processing systems, parallel processing

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: V. Voevodin, A. Simonov (eds.): Proceedings of the GraphHPC-2017 Conference, Moscow State University, Russia, 02-03-2017, published at http://ceur-ws.org.

1 Introduction

A graph is a mathematical abstraction for the representation of objects and the connections between them. The graph G = (V, E) consists of the set of vertices V with n elements and the set of edges E with m elements. Graphs can represent a wide variety of real-world objects. For instance, in the study of social networks it is convenient to represent individual users as vertices and the links between them as edges [1, 2]. Graphs are actively used in natural language processing, e.g., for the word sense induction problem [3]. We can also model the work of the brain, denoting individual neurons (or brain areas) as vertices; the structural or functional connections between them are then represented as edges [4, 5]. Another typical use case of graphs is the modeling of road networks, where intersections are vertices and roads are edges [6]. Graphs are also used in bioinformatics, computer security, data analysis, and other fields of science, industry and business.

The huge volume of accumulated data has led to the growth of the graphs that need to be investigated. This, in turn, led to the emergence of network science, which studies the features of the interaction between many objects. The number of algorithms needed by domain scientists is also growing rapidly. As a result, various specialized tools for graph analysis appear on the market.

Development and debugging of graph algorithms is a non-trivial task because of the non-determinism of many such algorithms. The use of parallel computation as a tool for accelerating processing further complicates this task. Therefore, many different methods and technologies have appeared that simplify the development of graph algorithms and the work of domain scientists. The main concepts described in this paper are:

• the shift to parallel processing in order to speed up the handling of big graphs;

• new data structures for storing graphs, allowing faster and more convenient graph mutation and navigation through the graph;

• new models for describing graph algorithms, which make it possible to exploit the parallelism embedded in the algorithms and to map it effectively onto modern computing architectures, yielding architecture-independent algorithms and higher productivity for the developer.

In this paper, we describe the above technologies as used in various existing graph processing systems.
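As a toy illustration of these definitions (not taken from the paper; the vertex labels are invented), the graph G = (V, E) for a small four-user social network can be written down directly as two sets:

```python
# Hypothetical example: a tiny "social network" graph G = (V, E)
# with n = 4 users (vertices) and m = 3 links (edges).
V = {0, 1, 2, 3}
E = {(0, 1), (0, 2), (2, 3)}

n = len(V)  # number of vertices
m = len(E)  # number of edges
```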
The work is structured as follows. Section 2 provides a description of the various types of parallelism that are used to speed up the processing of graphs. Section 3 describes the data structures that are used to store graphs, together with their strengths and weaknesses. Section 4 is devoted to the description of various models of processing graphs. We conclude with final remarks in Section 5.

2 Shift to Parallel Processing

2.1 Serial Processing

Sequential execution of graph algorithms is still very common. Sequential processing can be used for prototyping new algorithms and can be one of the steps toward building highly efficient parallel algorithms that can process graphs with billions of vertices.

Up to now, graph processing systems based on serial execution of algorithms, such as NetworkX [7], Gephi [8] and igraph [9], remain very popular and ubiquitous. Their popularity is determined by a large set of algorithms and many tools for visualizing the output of graph algorithms. These systems are actively used in complex network analysis.

2.2 Synchronous Parallel Processing

Sequential algorithms are not always suitable for solving real-world problems. With the rapid growth of data volumes, the size of the processed graphs is also growing. An obvious way to accelerate graph processing (in addition to developing more efficient algorithms) is the parallelization of existing graph algorithms.

The most common parallel processing model for graph computations is the bulk synchronous parallel model (BSP) [10]. Algorithms that work in accordance with the BSP model are also called level-synchronized algorithms. Within the BSP model, iteration N + 1 is executed only after iteration N has completed. This model can be implemented with two nested for loops: one over all vertices, and the other over all adjacent edges of each vertex.

A drawback of the synchronous-parallel concept is the overhead arising in some algorithms from processing inactive (not affecting the algorithm output) vertices at each iteration of the algorithm.

Among the graph processing systems based on the BSP model, the Boost Graph Library (BGL) [11] and its parallel implementation, Parallel BGL [12], are the most widely used. These systems implement the concept of adjacency lists (with the ability to modify the graph topology) and have quite a lot of supported algorithms. Parallel BGL is capable of processing graphs consisting of billions of vertices and allows the developer to parallelize calculations over hundreds of computational processes by using MPI [13].

2.3 Asynchronous Parallel Processing

Asynchronous parallel processing of graphs has not yet become as widespread as the BSP model, but in some cases it allows some algorithms (such as belief propagation [14]) to execute more efficiently due to faster convergence.

Asynchronous execution assumes that each parallel thread or process operates independently of all other threads or processes. At the same time, when working with shared memory, undesirable effects such as data races can arise. This forces developers to keep duplicate copies of graph elements, as well as to use various synchronization primitives.

GraphChi [15] is one of the systems that provide the user with parallel asynchronous processing of graphs. Asynchronous processing is achieved with the concept of Parallel Sliding Windows, which extracts and updates the state of a small number of vertices and edges simultaneously loaded from the PC's storage (SSD or hard drive). A similar concept is implemented in the CuSha system [16], which achieves asynchronous processing of graphs on GPGPU accelerators.
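The level-synchronized traversal described in Section 2.2, with its two nested loops and a barrier between iterations, can be sketched as a simple breadth-first search. This is an illustrative sketch, not code from any of the systems above; the adjacency structure is a plain Python dict, with edges taken from the directed graph of Figure 1 (as encoded by the CSR arrays given in Section 3.2).

```python
def level_synchronized_bfs(adj, source):
    """BSP-style BFS: iteration N + 1 starts only after iteration N has
    processed every vertex of the current level (a synchronization barrier)."""
    level = {source: 0}
    frontier = [source]
    current = 0
    while frontier:                # one BSP superstep per graph level
        next_frontier = []
        for v in frontier:         # outer loop: all (active) vertices
            for w in adj[v]:       # inner loop: all adjacent edges
                if w not in level:
                    level[w] = current + 1
                    next_frontier.append(w)
        frontier = next_frontier   # barrier: switch to the next level
        current += 1
    return level

# Directed graph of Figure 1, written as plain adjacency lists
adj = {0: [1], 1: [], 2: [1, 3, 5, 6], 3: [5], 4: [], 5: [6], 6: [4]}
```

Starting from vertex 2, the first superstep reaches vertices 1, 3, 5 and 6, and the second superstep reaches vertex 4; inactive vertices at each level are exactly the ones already assigned a level.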
3 Various Graph Formats

3.1 Adjacency and Incidence Matrices

The matrix is a natural mathematical representation of a graph. A graph G = (V, E) consisting of n vertices can be described by an n × n matrix A = (a_ij), where:

    a_ij = 1 if e_ij ∈ E, 0 otherwise    (1)

An example of an adjacency matrix of a directed graph is shown in Figure 1.

        0 1 0 0 0 0 0
        0 0 0 0 0 0 0
        0 1 0 1 0 1 1
    A = 0 0 0 0 0 1 0
        0 0 0 0 0 0 0
        0 0 0 0 0 0 1
        0 0 0 0 1 0 0

Figure 1: Representation of the directed graph (left, drawing not reproduced) in the form of an adjacency matrix (right)

If the graph is undirected, then the corresponding adjacency matrix is symmetric: a_ij = a_ji. If the graph is weighted, then the elements of the adjacency matrix take the following form:

    a_ij = w_ij if e_ij ∈ E, 0 otherwise    (2)

where w_ij is the weight of the edge e_ij. Another way to represent a weighted graph is to add a supplementary weight matrix to the adjacency matrix (with a restriction on zero-weight edges).

Another way of describing a graph is the incidence matrix. The incidence matrix contains data about the edges in its rows, while the columns represent the vertices. The matrix has the form B = (b_ij) and size m × n, where m is the number of edges in the graph. Because each row must simultaneously contain data on both the beginning and the end of the corresponding edge, it is most convenient to represent the graph with two matrices: one stores the data about the beginnings of the edges, and the other about the ends (this works only for directed graphs, but each edge of an undirected graph can be represented as a couple of directed edges with opposite directions). The incidence matrices of the graph depicted in Figure 1 are shown in Figure 2.

            1 0 0 0 0 0 0           0 1 0 0 0 0 0
            0 0 1 0 0 0 0           0 1 0 0 0 0 0
            0 0 1 0 0 0 0           0 0 0 1 0 0 0
    Bfrom = 0 0 1 0 0 0 0     Bto = 0 0 0 0 0 1 0
            0 0 1 0 0 0 0           0 0 0 0 0 0 1
            0 0 0 1 0 0 0           0 0 0 0 0 1 0
            0 0 0 0 0 1 0           0 0 0 0 0 0 1
            0 0 0 0 0 0 1           0 0 0 0 1 0 0

Figure 2: The incidence matrices for the beginnings (left) and the ends (right) of the edges

Data structures represented by adjacency and incidence matrices allow the developer to quickly find the necessary elements in the graph, as well as add new elements to it. In the case of an adjacency matrix, edges can be added or removed simply by changing the values of the corresponding matrix elements from 0 to 1 and vice versa. The incidence matrix makes it possible to deal not only with general graphs, but also with multigraphs (having parallel edges) and with hypergraphs (having edges incident to any number of vertices). This is achieved by describing each edge with two separate rows in different matrices: one row in the first matrix for the beginning of the hyperedge and one row in the second matrix for its end.

However, these storage formats have one significant drawback: the amount of memory required to store the matrices is proportional to n × n for the adjacency matrix and to n × m for the incidence matrix [17]. For instance, an integer adjacency matrix for a graph of 1,000,000 vertices occupies more than 3.7 TB of memory. Thus, the use of matrices "as-is" to handle large graphs is unacceptable for many real-world applications.

3.2 Compressed Lists

Another popular graph storage format is adjacency lists, which store the graph in the form of a few linear arrays. Adjacency lists store only existing graph elements (the non-zero elements of the adjacency matrix), which achieves a linear dependence between the amount of consumed memory and the number of vertices and edges in the graph. The Compressed Sparse Rows (CSR) format, which implements the concept of adjacency lists, is one of the most popular and useful, and consists of two arrays:

• row pointers – contains the offsets of the rows of the corresponding adjacency matrix;

• column ids – contains the endpoint of each edge.

Between the i-th and (i + 1)-th elements of the row pointers array lies the range of positions in the column ids array that holds the outgoing edges incident to vertex i. Below is a representation of the above arrays for the graph from Figure 1:

    row pointers = [0, 1, 1, 5, 6, 6, 7, 8]
    column ids   = [1, 1, 3, 5, 6, 5, 6, 4]

For instance, vertex 2 has 4 outgoing edges (row pointers[3] - row pointers[2] = 5 - 1 = 4). The edges can be reconstructed by scanning the column ids array from position row pointers[2] = 1 to position row pointers[3] = 5 (not included); column ids stores the numbers of the vertices at the opposite ends of the edges outgoing from each vertex. In the example above, vertex 2 has outgoing edges to vertices 1, 3, 5 and 6.

The traversal of the vertices and edges of the graph is performed with two nested loops: the outer loop reads offsets from the row pointers array to obtain the number of edges outgoing from each vertex, and the inner loop reads the column ids array to process the neighbors of each vertex.

The main advantage of the CSR format is the compactness of the graph representation, which makes it convenient for processing large graphs. For instance, the CSR format is actively used in the Knowledge Discovery Toolbox (KDT) [18] and GraphCT [19] systems to store and process large graphs obtained from the analysis of biological data and social networks. In the KDT system, CSR is used for compact storage of the incidence matrices of the graphs; in the GraphCT system, on the other hand, CSR stores adjacency matrices.

However, the CSR format is only suitable for processing static graphs. Processing of dynamic graphs with CSR is extremely inefficient and requires a complete rebuild of the entire data structure on every graph mutation. In addition, parallel processing of graphs with a skewed degree distribution is burdened with additional overhead in the form of computational workload imbalance among computational processes or threads [20, 21].
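The CSR layout and the two-loop traversal described in Section 3.2 can be sketched as follows. This is a minimal illustration, not code from any of the systems above; the arrays are exactly the ones given in the text for the graph of Figure 1.

```python
# CSR arrays for the directed graph of Figure 1, as given in Section 3.2
row_pointers = [0, 1, 1, 5, 6, 6, 7, 8]
column_ids = [1, 1, 3, 5, 6, 5, 6, 4]

def out_neighbors(v):
    """Outgoing neighbors of v: a slice of column_ids delimited by row_pointers."""
    return column_ids[row_pointers[v]:row_pointers[v + 1]]

def all_edges():
    """Two nested loops: outer over all vertices, inner over their outgoing edges."""
    n = len(row_pointers) - 1
    return [(v, w) for v in range(n) for w in out_neighbors(v)]
```

Vertex 2 has row_pointers[3] - row_pointers[2] = 4 outgoing edges, leading to vertices 1, 3, 5 and 6, matching the worked example in the text. Note that inserting or deleting an edge would force shifting column_ids and fixing up all subsequent offsets in row_pointers, which is the rebuild cost discussed above.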
3.3 Custom Formats

Typically, custom formats are designed to simplify the development of some complex graph algorithms. For instance, some maximum flow algorithms (the Edmonds-Karp algorithm [22] and the push-relabel algorithm [23]) use a special residual network, which is repeatedly rebuilt during the execution of the algorithm. Another example is the Girvan-Newman community detection algorithm [24], whose main idea is to isolate communities by removing inter-community edges from the graph.

As seen, the above algorithms use not only the traversal procedure over all vertices and edges (which can be performed efficiently using the CSR format), but also procedures for searching for individual graph elements, as well as for modifying the graph topology. A detailed list of the most commonly used operations is presented in Table 1; Table 2 shows the memory cost of various operations.

Table 1: Time complexity analysis of graph storing formats

    Procedure                          Matrix    CSR
    Vertex addition                    O(N)      O(1)
    Vertex deletion                    O(N)      O(N^2)
    Edge addition                      O(1)      O(N^2)
    Edge deletion                      O(1)      O(N^2)
    Check whether vertex is in graph   O(1)      O(1)
    Get weight of (random) edge        O(1)      O(N)
    Get list of ingoing neighbors      O(N)      O(N^2)
    Get list of outgoing neighbors     O(N)      O(N)

Table 2: Space complexity analysis of graph storing formats

    Procedure                          Matrix    CSR
    Vertex addition                    O(N)      O(1)
    Vertex deletion                    O(N)      O(1)
    Edge addition                      O(1)      O(1)
    Edge deletion                      O(1)      O(1)

As can be seen from Tables 1 and 2, the use of each format is a trade-off between processing efficiency and memory consumption. Using CSR allows the developer to store graphs with a lot of vertices and edges (lightweight vertex and edge addition), but CSR is badly suited for graph topology modification (see the vertex and edge deletion operations) and for navigation through the graph (it is hard to obtain the list of ingoing edges of a random vertex). Conversely, storing the graph as a matrix allows the developer to carry out fast topology modifications (with complexity O(1) and O(N)) as well as to navigate through the graph (checking the state of a random vertex or edge with complexity O(1)), but the matrix data structure suffers from non-linear memory complexity (it needs O(N^2) space for a graph with N vertices).

At the moment, there is no standard format for storing graphs that would simultaneously allow efficient modification of the graph, access to its individual elements in linear time, storage of only the significant elements of the graph, and parallel processing. However, a number of attempts are being made to develop such a format. For instance, extended storage functionality and parallel processing of dynamic graphs are provided by the data structure called STINGER [25, 26]. This data structure is based on linked lists consisting of blocks of graph elements. Each block stores an individual subset of vertices or edges and contains special metadata about the elements stored inside it (for example, the minimum and maximum vertex number inside the block). By separating the entire array of vertices or edges into subsets and using this metadata, the STINGER data structure makes it possible to process dynamically changing graphs in parallel. Another example of a custom graph format is Resilient Distributed Graphs (RDG) from GraphX [27], which uses a vertex-cut partitioning scheme to minimize data transfers during graph computations by evenly assigning edges to computational nodes. RDG also allows the developer to efficiently view, filter and transform the graph by operating on three tables (implemented as Spark Resilient Distributed Datasets [28]) storing the RDG: EdgeTable (adjacency structure and edge data), VertexDataTable (vertex data) and VertexMap (a mapping from the id of a vertex to the ids of the virtual partitions that contain its adjacent edges).

4 Graph Processing Models

4.1 Vertex-Centric Paradigm

According to this paradigm, graph algorithms are executed in a "think like a vertex" model. Each individual vertex is considered a separate computational element, dealing with the data available to it (for instance, internal variables) and able to execute various computational procedures (provided by the developer). A vertex can receive messages from neighboring vertices along its incoming edges, and can also send messages to other vertices along its outgoing edges.

The virtue of the vertex-centric model is the ability to parallelize the computations naturally. In this case, the processing of each vertex (or, possibly, a group of vertices) is assigned to a separate computational process or thread. The implementation of these principles makes it possible to achieve high scalability of computing.
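As an illustrative sketch (not taken from any particular system; the function name and the toy problem are invented), the vertex-centric message passing just described can be emulated for the task of propagating the maximum vertex value, with one synchronized message exchange per pass:

```python
def vertex_centric_max(adj, values):
    """Toy vertex-centric run: each active vertex sends its current value
    along its outgoing edges; a vertex is (re)activated only when an
    incoming message improves its value, and the computation halts when
    no vertex is active anymore."""
    active = set(adj)                    # initially every vertex is active
    while active:                        # one synchronized superstep per pass
        messages = {}
        for v in active:                 # compute and send along out-edges
            for w in adj[v]:
                messages[w] = max(messages.get(w, values[v]), values[v])
        active = set()
        for w, msg in messages.items():  # deliver along incoming edges
            if msg > values[w]:
                values[w] = msg
                active.add(w)            # "message received" -> active again
    return values
```

On a directed 3-cycle the maximum value reaches every vertex in a few supersteps, after which no messages improve any value and the run halts.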
However, such a model proves to be convenient not for all algorithms. For example, the PageRank algorithm [29] or the label-propagation algorithm [30] fits naturally into this model, in contrast to, for example, complex algorithms for spectral graph analysis.

The most famous (and at the same time the very first) implementation of this paradigm is Pregel [31], developed by Google. The computations in this system consist of several steps. In the first step, the graph is initialized and initial values are assigned to all vertices. After initialization, a series of supersteps separated by synchronization steps is executed (the Pregel computation model follows the BSP [10] paradigm). At each step, every vertex executes its program, specified by the developer, using the data available to it as well as data obtained from other vertices (along incoming edges). After finishing its calculations, the vertex can send data to its neighbors (along outgoing edges). One important property of Pregel is that only "active" vertices can compute and send data. A vertex becomes "active" when it receives messages from other vertices (the vertex state machine is shown in Figure 3).

                  Vote to halt
               --------------->
      Active                     Inactive
               <---------------
                Message received

Figure 3: Pregel vertex state machine

4.2 Domain-Specific Languages

A Domain-Specific Language (DSL) is a special programming language intended for applications in a specific domain area. A DSL, as a rule, contains domain-specific expressions and constructs. A program developed using a DSL is translated by the compiler into the target language (for example, C/C++, CUDA, etc.).

The main advantage of using a DSL to develop graph algorithms is increased development productivity. A DSL allows algorithmic ideas to be expressed in the terms of the domain area, which usually requires significantly less programming code than similar programs developed with low-level languages. Using a DSL in conjunction with compilers that support various hardware architectures frees the developer from worrying about porting the program code to more efficient and high-performance platforms. Finally, a DSL gives access to highly optimized (usually parallelized) procedures for efficiently processing large graphs (for example, a procedure for simultaneously updating the state of all vertices in a graph, etc.).

The drawbacks of DSLs include the inability to use programs written in them together with other code (written in C++, for example). Programs translated from a DSL can be used as separate ready-made computational modules, but this option may not always be convenient for developers and domain scientists.

One of the most popular DSLs for the development of high-performance parallel graph algorithms is Green-Marl [32, 33]. Green-Marl allows the computation of a scalar invariant for each element of the graph, as well as a property of each element (for instance, a centrality metric [34]), or the extraction of a specific subgraph from an existing graph. Using this DSL, the developer operates with the apparatus of graph theory while receiving, after compilation, well-parallelized code. For example, a parallel traversal of all vertices in the graph happens in breadth-first order (the special traversal operations InBFS and InRBFS serve this purpose). Green-Marl also has special containers for storing sets of vertices: Set, Order and Sequence, which differ in the types of access to their elements and are designed for parallel and/or sequential access. The use of such containers greatly simplifies the development of some algorithms (for example, the Dijkstra algorithm [35] or ∆-stepping [36]). An example of an algorithm implemented in Green-Marl is shown in Figure 4.

    Procedure Compute_BC(G: Graph, BC: Node_Prop<Float>(G)) {
        G.BC = 0;  // initialize BC
        Foreach (s: G.Nodes) {
            // define temporary properties
            Node_Prop<Float>(G) Sigma;
            Node_Prop<Float>(G) Delta;
            s.Sigma = 1;  // Init. Sigma for root
            // Traverse graph in BFS-order from s
            InBFS (v: G.Nodes From s)(v != s) {
                // sum over BFS-parents
                v.Sigma = Sum(w: v.UpNbrs){w.Sigma};
            }
            // Traverse graph in reverse BFS-order
            InRBFS (v != s) {
                // sum over BFS-children
                v.Delta = Sum(w: v.DownNbrs){
                    v.Sigma / w.Sigma * (1 + w.Delta)
                };
                v.BC += v.Delta @s;  // accumulate BC
            }
        }
    }

Figure 4: Betweenness centrality algorithm described in Green-Marl

4.3 Parallel Processing Primitives

Parallel processing primitives serve as small "building blocks" of more complex graph algorithms. Thus, different algorithms may consist of combinations of the same set of primitives.

The development of complex graph algorithms is a big problem, coupled with a number of difficulties. Debugging is one such challenge. For example, the correctness of the execution of some community detection algorithms [30] (especially when working with real data) is hard to control, because there are no clear criteria for the correctness of the detected communities (except some theoretical metrics, like modularity, that may not always be suitable for real-world applications). This complexity is further intensified when it comes to parallel processing of graphs with a large number of vertices and edges. When using special-purpose graph processing primitives, the developer only needs to be sure of the correctness of each primitive operation separately; only the high-level description of the algorithm then remains to be controlled.

Another difficulty is porting the code to various parallel architectures. Writing efficient code for computational accelerators (GPGPU or MIC), or moving from a shared memory model to distributed memory, usually requires a lot of effort. Considering that every 3 to 5 years there are significant changes in architectures and programming models, the work of porting a large number of different algorithms can continue "infinitely". The use of primitives implemented for different architectures makes it possible to abstract from this problem and implement only the primitives themselves for each architecture.

The advantage of processing primitives over a DSL is the ability to use the primitives alongside other code in the program. A DSL, on the contrary, forces the developer to work only in a special development environment.

Primitives are used in a variety of graph processing systems. However, there is at the moment no single standard for the use of primitives for parallel graph processing. Therefore, in many systems the set of these primitives varies, as well as the principles by which they are built. Some implementations of such primitives are described below.

The PowerGraph [37] system is based on the Gather-Apply-Scatter (GAS) model and includes three types of primitives:

• Gather – each vertex v collects data about its adjacent vertices and edges;

• Apply – the value of each vertex v is updated (taking into account the data previously collected in the Gather phase);

• Scatter – the new value of the vertex v is distributed to the adjacent vertices.

By combining the above primitives, the developer has the opportunity to create various graph algorithms.

The GAS model is not the only possible way of expressing the "building blocks" of complex graph algorithms. Another well-known example is the Advance-Filter-Compute model introduced in Gunrock, a system for developing graph algorithms on GPGPU accelerators [38]. This model is based on the modification of vertex frontiers during the execution of each iteration of the algorithm and includes the following primitives:

• Advance – obtaining a new vertex frontier by passing along the edges adjacent to the current vertex frontier;

• Filter – obtaining a new vertex frontier by selecting a subset of vertices from the current vertex frontier;

• Compute – obtaining a new vertex frontier by executing a developer-defined procedure applied to the current vertex frontier.

Primitives are also used in other parallel graph processing systems, such as MapGraph [39], HelP [40], GraphPad [41], GraphBLAS [42], etc.
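As a minimal sketch of the GAS decomposition (the function and parameter names are invented for illustration, and a simple PageRank iteration is used as the example algorithm; this is not PowerGraph's API), one iteration might be factored as follows:

```python
def gas_pagerank_step(in_nbrs, out_degree, rank, d=0.85):
    """One illustrative Gather-Apply-Scatter iteration of PageRank.
    Gather:  each vertex v sums contributions along its incoming edges;
    Apply:   v updates its own rank from the gathered accumulator;
    Scatter: the new ranks are made visible to the neighbors (here simply
    by returning the updated table for the next iteration)."""
    n = len(in_nbrs)
    new_rank = {}
    for v in in_nbrs:
        acc = sum(rank[w] / out_degree[w] for w in in_nbrs[v])  # Gather
        new_rank[v] = (1 - d) / n + d * acc                     # Apply
    return new_rank                                             # Scatter
```

On a symmetric two-vertex cycle the uniform ranks are already a fixed point, so one step leaves them unchanged; in a real GAS system each of the three phases would additionally be parallelized over vertices.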
5 Conclusion

This paper gives an overview of the main concepts (the shift to parallel computing, the use of complex data structures, and new computational models) that are used in modern graph processing systems. Using these concepts allows developers to:

• simplify the development of novel graph algorithms;

• use different computer architectures without changing the program code;

• speed up the execution of complex graph algorithms.

Acknowledgment

The research was supported by the RFBR under project no. 16-37-00203 mol_a and by the Ministry of Education and Science of the Russian Federation, Agreement no. 02.A03.21.0006.

References

[1] E. Otte and R. Rousseau, "Social network analysis: a powerful strategy, also for the information sciences," Journal of Information Science, vol. 28, no. 6, pp. 441–453, 2002.

[2] M. Grandjean, "A social network analysis of Twitter: Mapping the digital humanities community," Cogent Arts & Humanities, vol. 3, no. 1, p. 1171458, 2016.

[3] D. Ustalov, A. Panchenko, and C. Biemann, "Watset: Automatic induction of synsets from a graph of synonyms," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (Vancouver, Canada), pp. 1579–1590, Association for Computational Linguistics, 2017. http://aclweb.org/anthology/P17-1145 (accessed: 20.10.2017).

[4] E. Bullmore and O. Sporns, "Complex brain networks: Graph theoretical analysis of structural and functional systems," Nature Reviews Neuroscience, vol. 10, pp. 186–198, 2009. https://www.nature.com/nrn/journal/v10/n3/pdf/nrn2575.pdf (accessed: 20.10.2017).

[5] A. Horn, D. Ostwald, M. Reisert, and F. Blankenburg, "The structural–functional connectome and the default mode network of the human brain," NeuroImage, vol. 102, no. Part 1, pp. 142–151, 2014. https://www.sciencedirect.com/science/article/pii/S1053811913010057 (accessed: 20.10.2017).

[6] M. G. Bell et al., Transportation Network Analysis. Wiley Online Library, 1997.

[7] A. A. Hagberg, D. A. Schult, and P. J. Swart, "Exploring network structure, dynamics, and function using NetworkX," in Proceedings of the 7th Python in Science Conference (SciPy2008), (Pasadena, CA, USA), pp. 11–15, Aug. 2008. http://aric.hagberg.org/papers/hagberg-2008-exploring.pdf (accessed: 20.10.2017).

[8] M. Bastian, S. Heymann, and M. Jacomy, "Gephi: An open source software for exploring and manipulating networks," 2009. http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154 (accessed: 20.10.2017).

[9] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, vol. Complex Systems, p. 1695, 2006. https://pdfs.semanticscholar.org/1d27/44b83519657f5f2610698a8ddd177ced4f5c.pdf (accessed: 20.10.2017).

[10] L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, vol. 33, pp. 103–111, Aug. 1990. http://web.mit.edu/6.976/www/handout/valiant2.pdf (accessed: 20.10.2017).

[11] The Boost Graph Library: User Guide and Reference Manual. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2002. https://markqiu.files.wordpress.com/2009/12/boost-graph-library.pdf (accessed: 20.10.2017).

[12] D. Gregor and A. Lumsdaine, "The Parallel BGL: A generic library for distributed graph computations," in Parallel Object-Oriented Scientific Computing (POOSC), 2005. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.2137&rep=rep1&type=pdf (accessed: 20.10.2017).

[13] A. Geist, W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, W. Saphir, T. Skjellum, and M. Snir, MPI-2: Extending the Message-Passing Interface, pp. 128–135. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.2841&rep=rep1&type=pdf (accessed: 20.10.2017).

[14] G. Elidan, I. McGraw, and D. Koller, "Residual belief propagation: Informed scheduling for asynchronous message passing," in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, UAI'06, (Arlington, Virginia, United States), pp. 165–173, AUAI Press, 2006. https://ai.stanford.edu/~koller/Papers/Elidan+al:UAI06.pdf (accessed: 20.10.2017).

[15] A. Kyrola, G. Blelloch, and C. Guestrin, "GraphChi: Large-scale graph computation on just a PC," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, (Berkeley, CA, USA), pp. 31–46, USENIX Association, 2012. https://www.cs.cmu.edu/~guyb/papers/KBG12.pdf (accessed: 20.10.2017).

[16] F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan, "CuSha: Vertex-centric graph processing on GPUs," in Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC '14, (New York, NY, USA), pp. 239–252, ACM, 2014.

[17] J. Kepner and J. Gilbert, Graph Algorithms in the Language of Linear Algebra. 2011.

[18] A. Lugowski, D. Alber, A. Buluç, J. R. Gilbert, S. Reinhardt, Y. Teng, and A. Waranis, A Flexible Open-Source Toolbox for Scalable Complex Graph Analysis, pp. 930–941. http://epubs.siam.org/doi/pdf/10.1137/1.9781611972825.80 (accessed: 20.10.2017).

[19] D. Ediger, K. Jiang, E. J. Riedy, and D. A. Bader, "GraphCT: Multithreaded algorithms for massive graph analysis," IEEE Transactions on Parallel and Distributed Systems, vol. 24, pp. 2220–2229, Nov. 2013. https://www.cc.gatech.edu/~bader/papers/GraphCT-TPDS2013.pdf (accessed: 20.10.2017).

[20] D. Merrill, M. Garland, and A. Grimshaw, "Scalable GPU graph traversal," in Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, (New York, NY, USA), pp. 117–128, ACM, 2012. http://research.nvidia.com/sites/default/files/pubs/2012-02_Scalable-GPU-Graph/ppo213s-merrill.pdf (accessed: 20.10.2017).

[21] J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey, "Fast and efficient graph traversal algorithm for CPUs: Maximizing single-node efficiency," in 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 378–389, May 2012.

[22] J. Edmonds and R. M. Karp, "Theoretical improvements in algorithmic efficiency for network flow problems," J. ACM, vol. 19, pp. 248–264, Apr. 1972. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.610.4784&rep=rep1&type=pdf (accessed: 20.10.2017).

[23] A. V. Goldberg and R. E. Tarjan, "A new approach to the maximum-flow problem," J. ACM, vol. 35, pp. 921–940, Oct. 1988. https://www.cs.princeton.edu/courses/archive/fall03/cs528/handouts/a%20new%20approach.pdf (accessed: 20.10.2017).

[24] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, 2002. http://www.pnas.org/content/99/12/7821.full.pdf (accessed: 20.10.2017).

[25] D. Ediger, R. McColl, J. Riedy, and D. A. Bader, "STINGER: High performance data structure for streaming graphs," in 2012 IEEE Conference on High Performance Extreme Computing, pp. 1–5, Sept. 2012. http://ieee-hpec.org/2012/index_htm_files/ediger.pdf (accessed: 20.10.2017).

[26] D. A. Bader, J. Berry, A. Amos-Binks, C. Hastings, K. Madduri, and S. C. Poulos, "STINGER: Spatio-temporal interaction networks and graphs (STING) extensible representation," 2009. https://pdfs.semanticscholar.org/6992/7a3b9fc25e655ce662c03deb1e9d2832585c.pdf (accessed: 20.10.2017).

[27] R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica, "GraphX: A resilient distributed graph system on Spark," in First International Workshop on Graph Data Management Experiences and Systems, GRADES '13, (New York, NY, USA), pp. 2:1–2:6, ACM, 2013. http://www.istc-cc.cmu.edu/publications/papers/2013/grades-graphx_with_fonts.pdf (accessed: 20.10.2017).

[28] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing," in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12, (Berkeley, CA, USA), pp. 2–2, USENIX Association, 2012. https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf (accessed: 20.10.2017).

[29] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," in Seventh International World-Wide Web Conference (WWW 1998), 1998. http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf (accessed: 20.10.2017).

[30] N. Raghavan, R. Albert, and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Physical Review E, vol. 76, p. 036106, 2007. https://arxiv.org/pdf/0709.2938.pdf (accessed: 20.10.2017).

[31] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, "Pregel: A system for large-scale graph processing," in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, (New York, NY, USA), pp. 135–146, ACM, 2010.

[37] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: Distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, (Berkeley, CA, USA), pp. 17–30, USENIX Association, 2012. https://www.usenix.org/system/files/conference/osdi12/osdi12-final-167.pdf (accessed: 20.10.2017).
2010. https://kowshik.github.io/JPregel/ pregel_paper.pdf (accessed: 20.10.2017). [38] Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens, “Gunrock: A high-performance [32] S. Hong, H. Chafi, E. Sedlar, and K. Olukotun, graph processing library on the gpu,” SIG- “Green-marl: A dsl for easy and efficient graph PLAN Not., vol. 51, pp. 11:1–11:12, Feb. 2016. analysis,” in Proceedings of the Seventeenth https://arxiv.org/pdf/1501.05387.pdf (ac- International Conference on Architectural Sup- cessed: 20.10.2017). port for Programming Languages and Operating [39] Z. Fu, M. Personick, and B. Thompson, “Map- Systems, ASPLOS XVII, (New York, NY, USA), graph: A high level api for fast development of pp. 349–362, ACM, 2012. http://citeseerx. high performance graph analytics on gpus,” in ist.psu.edu/viewdoc/download?doi=10.1. Proceedings of Workshop on GRAph Data Man- 1.220.1796&rep=rep1&type=pdf (accessed: agement Experiences and Systems, GRADES’14, 20.10.2017). (New York, NY, USA), pp. 2:1–2:6, ACM, 2014. [33] S. Hong, S. Salihoglu, J. Widom, and K. Oluko- https://pdfs.semanticscholar.org/3ebf/ tun, “Simplifying scalable graph process- 3857a60c3e224284bbbe6c7127d0a12c546d. ing with a domain-specific language,” in pdf?_ga=2.86211930.371173678.1508511132- Proceedings of Annual IEEE/ACM Inter- 1908779532.1478770815 (accessed: 20.10.2017). national Symposium on Code Generation [40] S. Salihoglu and J. Widom, “Help: High-level and Optimization, CGO ’14, (New York, primitives for large-scale graph processing,” in NY, USA), pp. 208:208–208:218, ACM, 2014. Proceedings of Workshop on GRAph Data Man- https://pdfs.semanticscholar.org/2d8b/ agement Experiences and Systems, GRADES’14, e5e1b88ac9919984b9369f7045fbb0af0d08.pdf (New York, NY, USA), pp. 3:1–3:6, ACM, 2014. (accessed: 20.10.2017). http://ilpubs.stanford.edu:8090/1085/2/ primitives_tr_sig_alternate.pdf (accessed: [34] U. Brandes, “A faster algorithm for betweenness 20.10.2017). 
centrality,” Journal of Mathematical Sociology, vol. 25, pp. 163–177, 2001. http://citeseerx. [41] M. J. Anderson, N. Sundaram, N. Satish, ist.psu.edu/viewdoc/download?doi=10.1. M. M. A. Patwary, T. L. Willke, and P. Dubey, 1.11.2024&rep=rep1&type=pdf (accessed: “Graphpad: Optimized graph primitives for par- 20.10.2017). allel and distributed platforms,” in 2016 IEEE International Parallel and Distributed Processing [35] E. W. Dijkstra, “A note on two problems in Symposium (IPDPS), pp. 313–322, May 2016. connexion with graphs,” Numer. Math., vol. 1, pp. 269–271, Dec. 1959. [42] J. Kepner, P. Aaltonen, D. Bader, A. Buluç, F. Franchetti, J. Gilbert, D. Hutchison, M. Ku- [36] U. Meyer and P. Sanders, “∆-stepping: A paral- mar, A. Lumsdaine, H. Meyerhenke, S. McMil- lelizable shortest path algorithm,” J. Algorithms, lan, C. Yang, J. D. Owens, M. Zalewski, T. Matt- vol. 49, pp. 114–152, Oct. 2003. https: son, and J. Moreira, “Mathematical foundations //ac.els-cdn.com/S0196677403000762/ of the graphblas,” in 2016 IEEE High Perfor- 1-s2.0-S0196677403000762-main. mance Extreme Computing Conference (HPEC), pdf?_tid=a3d2287c-b5a5-11e7-b112- pp. 1–9, Sept 2016. https://arxiv.org/pdf/ 00000aacb362&acdnat=1508511048_ 1606.05790.pdf (accessed: 20.10.2017). d998da6a4ec35491c44e069c0b48b756 (accessed: 20.10.2017). [37] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, “Powergraph: Distributed graph-parallel computation on natural graphs,” 9