          Graph Processing: Main Concepts and Systems

                                          Mikhail Chernoskutov
                           Krasovskii Institute of Mathematics and Mechanics,
                                         Ural Federal University
                                          Yekaterinburg, Russia
                                            mach@imm.uran.ru



Abstract

The development of new processing and storage facilities leads to an increase in the size of the problems to be solved. One such problem is graph processing, which is characterized by non-determinism and irregular memory access patterns that greatly complicate development and debugging. In this paper, we describe the basic concepts that make it possible to simplify the development of graph algorithms and their porting to various computer architectures. In addition, we describe the most well-known graph processing systems that implement these concepts.

Keywords: high performance computing, graph algorithms, graph processing systems, parallel processing

Copyright (c) by the paper's authors. Copying permitted for private and academic purposes.
In: V. Voevodin, A. Simonov (eds.): Proceedings of the GraphHPC-2017 Conference, Moscow State University, Russia, 02-03-2017, published at http://ceur-ws.org.

1 Introduction

A graph is a mathematical abstraction for the representation of objects and the connections between them. The graph G = (V, E) consists of the set of vertices V with n elements and the set of edges E with m elements. Various objects of the real world can be represented with graphs. For instance, in the study of social networks it is convenient to represent individual users as vertices and the links between them as edges [1, 2]. Graphs are actively used in natural language processing for the word sense induction problem [3]. We can also model the work of the brain, denoting individual neurons (or brain areas) as vertices; the structural or functional connections between them are then represented as edges [4, 5]. Another typical use case of graphs is the modeling of road networks, where intersections are vertices and roads are edges [6]. Graphs are also used in bioinformatics, computer security, data analysis and other fields of science, industry and business.

The huge volume of accumulated data has led to growth in the size of the graphs that need to be investigated. This, in turn, led to the emergence of network science, which studies the interactions between many objects. The number of algorithms needed by domain scientists is also growing rapidly. As a result, various specialized tools for graph analysis are appearing on the market.

Development and debugging of graph algorithms is a non-trivial task because of the non-determinism of many such algorithms. The use of parallel computation as a tool for accelerating processing further complicates this task. Therefore, many different methods and technologies exist that simplify the development of graph algorithms and the work of domain scientists. The main concepts described in the paper are:

  • a shift to parallel processing in order to speed up the processing of big graphs;
  • new data structures for storing graphs, which allow faster and more convenient graph mutation and navigation through the graph;
  • new models for describing graph algorithms, which make it possible to exploit the parallelism inherent in the algorithms and map it effectively onto modern computing architectures, yielding architecture-independent algorithms with higher developer productivity.
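To make the introduction's notation concrete, here is a minimal Python sketch of G = (V, E) for a tiny, made-up social network (the user names and links are purely illustrative):

```python
# G = (V, E): V is the vertex set (|V| = n), E is the edge set (|E| = m).
# Hypothetical "follows" relation: users as vertices, links as directed edges.
V = {"alice", "bob", "carol"}
E = {("alice", "bob"), ("bob", "carol"), ("alice", "carol")}

n, m = len(V), len(E)
assert (n, m) == (3, 3)
```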
In this paper, we describe the above technologies used in various existing graph processing systems. The work is structured as follows. Section 2 provides a description of the various types of parallelism that are used to speed up the processing of graphs. Section 3 describes the data structures that are used to store graphs, together with their strengths and weaknesses. Section 4 is devoted to the description of various models of processing graphs. We conclude with final remarks in Section 5.

2 Shift to Parallel Processing

2.1 Serial Processing

Sequential execution of graph algorithms is still very common. Sequential processing can be used for prototyping new algorithms and can be one of the steps toward building highly efficient parallel algorithms that process graphs with billions of vertices.

Up to now, graph processing systems based on serial execution of algorithms, such as NetworkX [7], Gephi [8] and igraph [9], remain very popular and ubiquitous. The popularity of these systems is determined by their large sets of algorithms and their many tools for visualizing the output of graph algorithms. These systems are actively used in complex network analysis.

2.2 Synchronous Parallel Processing

Sequential algorithms are not always suitable for solving real-world problems. With the rapid growth of data, the size of the processed graphs is also growing. An obvious way to accelerate graph processing (in addition to developing more efficient algorithms) is the parallelization of existing graph algorithms.

The most common parallel processing model for graph computations is the bulk synchronous parallel (BSP) model [10]. Algorithms that work in accordance with the BSP model are also called level-synchronized algorithms. Within the BSP model, iteration N + 1 is executed only after iteration N has completed. This model can be implemented with two nested for loops: one over all vertices, and the other over all adjacent edges of each vertex.

A drawback of synchronous-parallel processing is the overhead that arises in some algorithms from processing inactive vertices (those not affecting the algorithm output) at each iteration.

Among the graph processing systems based on the BSP model, the Boost Graph Library (BGL) [11], as well as its parallel implementation called Parallel BGL [12], are the most widely used. These systems implement the concept of adjacency lists (with the ability to modify the graph topology) and support quite a lot of algorithms. Parallel BGL is capable of processing graphs consisting of billions of vertices and allows the developer to parallelize calculations over hundreds of computational processes by using MPI [13].

2.3 Asynchronous Parallel Processing

Asynchronous parallel processing of graphs has not yet become as widespread and ubiquitous as the BSP model, but in some cases it allows some algorithms (such as belief propagation [14]) to execute more efficiently due to faster convergence.

Asynchronous execution assumes that each parallel thread or process operates independently of all other threads or processes. At the same time, when working with shared memory, undesirable effects such as data races can arise. This forces developers to use duplicate copies of graph elements, as well as various synchronization primitives.

GraphChi [15] is one of the systems that provide the user with parallel asynchronous processing of graphs. Asynchronous processing is achieved by using the concept of Parallel Sliding Windows, which involves extracting and updating the state of a small number of vertices and edges that are simultaneously loaded from the PC's storage (SSD or hard drive). A similar concept is implemented in the CuSha system [16], which achieves asynchronous processing of graphs using GPGPU accelerators.

3 Various Graph Formats

3.1 Adjacency and Incidence Matrices

The matrix is a natural mathematical representation of a graph. The graph G = (V, E) consisting of n vertices can be described by an n × n matrix A = (aij) of the form:

          { 1, eij ∈ E
    aij = {                                        (1)
          { 0, eij ∉ E

An example of an adjacency matrix of a directed graph is shown in Figure 1.

        0 1 0 0 0 0 0
        0 0 0 0 0 0 0
        0 1 0 1 0 1 1
    A = 0 0 0 0 0 1 0
        0 0 0 0 0 0 0
        0 0 0 0 0 0 1
        0 0 0 0 1 0 0

Figure 1: Representation of the directed graph (left) in the form of an adjacency matrix (right)

If the graph is undirected, then the corresponding adjacency matrix is symmetric: aij = aji.
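To illustrate equation (1), the adjacency matrix of the directed graph of Figure 1 can be built directly from its edge list. A minimal Python sketch (the edge list below is read off the figure):

```python
# Adjacency matrix A = (a_ij) of equation (1) for the directed graph of
# Figure 1 (vertices 0..6): a_ij = 1 iff the edge (i, j) exists.
edges = [(0, 1), (2, 1), (2, 3), (2, 5), (2, 6), (3, 5), (5, 6), (6, 4)]
n = 7

A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1

# Row 2 of the matrix lists the targets of vertex 2: edges to 1, 3, 5, 6.
assert A[2] == [0, 1, 0, 1, 0, 1, 1]
# The matrix stores n * n cells even though only m = 8 of them are non-zero,
# which is the memory drawback discussed in the text.
assert sum(map(sum, A)) == 8
```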
If the graph is weighted, then the elements of the adjacency matrix take the following form:

          { wij, eij ∈ E
    aij = {                                        (2)
          { 0,   eij ∉ E

where wij is the weight of the edge eij. Another way to represent a weighted graph is to add a supplementary weight matrix to the adjacency matrix (with a restriction on zero-weight edges).

Another way of describing the graph is the incidence matrix. The incidence matrix contains data about the edges in its rows, while the columns represent the vertices. The matrix has the form B = (bij) and size m × n, where m is the number of edges in the graph. Because each row must simultaneously contain data on both the beginning and the end of the corresponding edge, it is most convenient to represent the graph with two matrices: one stores the data about the beginnings of the edges, and the other about the ends (this works only for directed graphs, but the edges of an undirected graph can be represented as pairs of directed edges with opposite directions). The incidence matrix of the graph depicted in Figure 1 is shown in Figure 2.

    Bfrom =            Bto =
    1 0 0 0 0 0 0      0 1 0 0 0 0 0
    0 0 1 0 0 0 0      0 1 0 0 0 0 0
    0 0 1 0 0 0 0      0 0 0 1 0 0 0
    0 0 1 0 0 0 0      0 0 0 0 0 1 0
    0 0 1 0 0 0 0      0 0 0 0 0 0 1
    0 0 0 1 0 0 0      0 0 0 0 0 1 0
    0 0 0 0 0 1 0      0 0 0 0 0 0 1
    0 0 0 0 0 0 1      0 0 0 0 1 0 0

Figure 2: The incidence matrix for the beginnings (left) and the ends (right) of the edges

Data structures represented by adjacency and incidence matrices allow the developer to quickly find the necessary elements in the graph, as well as add new elements to the graph. In the case of an adjacency matrix, edges can be added or removed by changing the values of the corresponding matrix elements from 0 to 1 and vice versa. The incidence matrix makes it possible to deal not only with general graphs, but also with multigraphs (having parallel edges) and with hypergraphs (having edges incident to any number of vertices). This is achieved by describing each edge with two separate rows in the two matrices: one row in the first matrix for the beginning of the hyperedge and one row in the second matrix for its end.

However, these graph storage formats have one significant drawback: the amount of memory required to store the matrices is proportional to n × n for the adjacency matrix and to n × m for the incidence matrix [17]. For instance, an integer adjacency matrix for a graph of 1 000 000 vertices will occupy more than 3.7 TB of memory. Thus, the use of matrices "as-is" to handle large graphs is unacceptable for many real-world applications.

3.2 Compressed Lists

Another popular graph storage format is adjacency lists. Adjacency lists allow the graph to be stored in the form of a few linear arrays.

Adjacency lists store only the existing graph elements (the non-zero elements of the adjacency matrix), which yields a linear dependence between the amount of consumed memory and the number of vertices and edges in the graph. The Compressed Sparse Rows (CSR) format, which implements the concept of adjacency lists, is one of the most popular and useful, and consists of two arrays:

  • row pointers – contains the offsets of the rows of the corresponding adjacency matrix;
  • column ids – contains data about the end of each edge.

Elements i and i + 1 of the row pointers array delimit the range of positions in the column ids array that holds the outgoing edges incident to vertex i. Below is a representation of the above arrays for the graph from Figure 1:

    row pointers = [0, 1, 1, 5, 6, 6, 7, 8]
    column ids   = [1, 1, 3, 5, 6, 5, 6, 4]

For instance, vertex 2 has 4 outgoing edges (row pointers[3] - row pointers[2] = 5 - 1 = 4). The edges themselves can be reconstructed by reading the column ids array from position column ids[1] up to column ids[5] (not included); this range stores the numbers of the vertices at the opposite ends of the edges outgoing from the vertex. In the example above, vertex 2 has outgoing edges to vertices 1, 3, 5 and 6.

The traversal of the vertices and edges of the graph in this case is performed with two nested loops: the outer loop reads offsets from the row pointers array to obtain the number of edges outgoing from each vertex, and the inner loop reads the column ids array to process the neighbors of each vertex.

The main advantage of the CSR format is the compactness of the graph representation, which makes it convenient for processing large graphs. For instance, the CSR format is actively used in the Knowledge Discovery Toolbox (KDT) [18] and GraphCT [19] systems to store and process large graphs obtained from the analysis of biological data and social networks. In the KDT system, the CSR format is used for compact storage of the incidence matrices of the graphs. In the GraphCT system, on the other hand, CSR is used to store adjacency matrices.

However, the CSR format is only suitable for processing static graphs.
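The two CSR arrays and the two-nested-loop traversal described above can be sketched in a few lines of Python for the graph of Figure 1:

```python
# Build the CSR arrays of Section 3.2 for the directed graph of Figure 1,
# then traverse all edges with the two nested loops described in the text.
n = 7
out_edges = {0: [1], 1: [], 2: [1, 3, 5, 6], 3: [5], 4: [], 5: [6], 6: [4]}

row_pointers = [0]
column_ids = []
for v in range(n):                       # concatenate the adjacency lists
    column_ids.extend(out_edges[v])
    row_pointers.append(len(column_ids))

assert row_pointers == [0, 1, 1, 5, 6, 6, 7, 8]
assert column_ids == [1, 1, 3, 5, 6, 5, 6, 4]

# Outer loop over vertices, inner loop over each vertex's outgoing edges.
edges = [(v, column_ids[pos])
         for v in range(n)
         for pos in range(row_pointers[v], row_pointers[v + 1])]

out_degree = [row_pointers[v + 1] - row_pointers[v] for v in range(n)]
assert out_degree[2] == 4                # vertex 2 has 4 outgoing edges
assert (2, 3) in edges and len(edges) == 8
```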
Processing of dynamic graphs with the CSR format is extremely inefficient and requires a complete rebuild of the entire data structure on every graph mutation. In addition, the parallel processing of graphs with skewed degree distributions is burdened with additional overhead in the form of computational workload imbalance amongst computational processes or threads [20, 21].

3.3 Custom Formats

Typically, custom formats are designed to simplify the development of some complex graph algorithms. For instance, some maximum flow algorithms (the Edmonds-Karp algorithm [22] and the push-relabel algorithm [23]) use a special "residual conductance" network, which is repeatedly rebuilt during the execution of the algorithm. Another example is the Girvan-Newman community detection algorithm [24]. The main idea of this algorithm is to isolate communities by removing inter-community edges from the graph.

As can be seen, the above algorithms use not only the procedure of traversing all vertices and edges (which can be performed efficiently using the CSR format), but also procedures for searching for individual graph elements and for modifying the graph topology. A detailed list of the most commonly used operations is presented in Table 1. Table 2 shows the memory complexity of various operations.

Table 1: Time complexity analysis of graph storing formats

    Procedure                                   Matrix    CSR
    Vertex addition                             O(N)      O(1)
    Vertex deletion                             O(N)      O(N^2)
    Edge addition                               O(1)      O(N^2)
    Edge deletion                               O(1)      O(N^2)
    Check whether vertex is in graph            O(1)      O(1)
    Get weight of (random) edge                 O(1)      O(N)
    Get list of ingoing neighbors of vertex     O(N)      O(N^2)
    Get list of outgoing neighbors of vertex    O(N)      O(N)

Table 2: Space complexity analysis of graph storing formats

    Procedure          Matrix    CSR
    Vertex addition    O(N)      O(1)
    Vertex deletion    O(N)      O(1)
    Edge addition      O(1)      O(1)
    Edge deletion      O(1)      O(1)

As can be seen from Tables 1 and 2, the use of each of the formats is a trade-off between processing efficiency and memory consumption. Using CSR allows the developer to store graphs with a lot of vertices and edges (lightweight operations of vertex and edge addition and deletion), but it is badly suited for graph topology modification (see the vertex and edge deletion operations) and for navigation through the graph (it is hard to obtain the list of ingoing edges of some random vertex). Conversely, storing the graph as a matrix allows the developer to carry out fast graph topology modifications (with complexity O(1) and O(N)) as well as navigation through the graph (checking a random vertex or edge state with complexity O(1)), but the matrix data structure suffers from non-linear memory complexity (it needs O(N^2) space for a graph with N vertices).

At the moment, there is no standard format for storing graphs that would simultaneously allow efficient graph modification, access to individual elements in linear time, storage of only the significant elements of the graph, and parallel processing. However, a number of attempts are being made to develop such a format. For instance, extended storage functionality and parallel processing of dynamic graphs are provided by the data structure called STINGER [25, 26]. This data structure is based on linked lists consisting of blocks of graph elements. Each block is a data storage for an individual subset of vertices or edges. Each block contains special metadata about the elements stored inside it (for example, the minimum and maximum vertex number inside the block). By separating the entire array of vertices or edges into subsets, and by using the metadata, the STINGER data structure allows dynamically changing graphs to be processed in parallel. Another example of a custom graph format is the Resilient Distributed Graph (RDG) from GraphX [27], which uses a vertex-cut partitioning scheme to minimize data transfers during graph computations by evenly assigning edges to computational nodes. In addition, an RDG can be efficiently viewed, filtered and transformed by operating on the three tables (implemented as Spark Resilient Distributed Datasets [28]) that store the RDG: EdgeTable (adjacency structure and edge data), VertexDataTable (vertex data) and VertexMap (mapping from the id of a vertex to the ids of the virtual partitions that contain adjacent edges).

4 Graph Processing Models

4.1 Vertex-Centric Paradigm

According to this paradigm, graph algorithms are executed in a "think like a vertex" model. Each individual vertex is considered a separate computational element, dealing with the data available to it (for instance, internal variables) and having the ability to execute various computational procedures (provided by the developer). A vertex can receive messages from neighboring vertices along its incoming edges, and can also send messages to other vertices along its outgoing edges.
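The message-passing loop described above can be sketched as a toy superstep simulation. This is a hypothetical minimal model in Python (not Pregel's actual API): every vertex applies the same program to its inbox each superstep and forwards results along outgoing edges; here the toy program propagates the minimum vertex id reachable along incoming paths in the graph of Figure 1:

```python
# "Think like a vertex", simulated sequentially: one inbox per vertex,
# supersteps separated by a barrier, vertices without messages stay inactive.
out_edges = {0: [1], 1: [], 2: [1, 3, 5, 6], 3: [5], 4: [], 5: [6], 6: [4]}

value = {v: v for v in out_edges}            # initial value: own id
inbox = {v: [] for v in out_edges}
for v in out_edges:                          # superstep 0: everyone sends
    for w in out_edges[v]:
        inbox[w].append(value[v])

while any(inbox.values()):                   # barrier between supersteps
    outbox = {v: [] for v in out_edges}
    for v, messages in inbox.items():        # each vertex runs its program
        if messages and min(messages) < value[v]:   # "active" vertex
            value[v] = min(messages)
            for w in out_edges[v]:           # send along outgoing edges
                outbox[w].append(value[v])
    inbox = outbox                           # vertices with no mail halt

# 1 is reached from 0; every other vertex is reachable (at best) from 2.
assert value == {0: 0, 1: 0, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2}
```

The `while any(inbox.values())` test plays the role of Pregel's global halting condition: the computation stops when no vertex received a message in the previous superstep.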
messages to other vertices along the outgoing edges.            developed with low-level languages. Using DSL in
   The virtue of the vertex-centric model is the ability        conjunction with compilers that support various hard-
to naturally parallelize the computations. In this case,        ware architectures allows developer not to worry about
the processing of each vertex (or, possibly, the vertex         porting the program code to more efficient and high-
group) will be assigned to a separate computational             performance platforms. Finally, DSL allows to use
process or thread. The implementation of these prin-            highly optimized (usually parallelized) procedures that
ciples makes it possible to achieve high scalability of         allow you to efficiently process large graphs (for ex-
computing.                                                      ample, the procedure for simultaneously updating the
   However, such a model proves to be convenient not            state of all vertices in a graph, etc.).
for all algorithms. For example, the page-rank algo-                The drawbacks of DSL include the inability to use
rithm [29] or the label-propagation algorithm [30] nat-         the programs written with it together with other code
urally fits in this model, in contrast to, for example,         (written in C++, for example). It is allowed to use
complex algorithms for spectral graph analysis.                 programs translated from DSL as separate ready-made
   The most famous (and the very first at the same mo-          computational modules, but this option may not al-
ment) implementation of this paradigm is Pregel [31],           ways be convenient for the developers and domain sci-
developed by Google. The computations in this sys-              entists.
tem consist of several steps. In the first step, the                One of the most popular DSL for development of
graph is initialized and the initial values is assigned         high-performance parallel graph algorithms is Green-
to all vertices. After initialization, a series of super-       Marl [32,33]. Green-Marl allows to calculate the scalar
steps separated by synchronization steps are executed           invariant for each element in the graph, as well as
(the Pregel computation model follows the BSP [10]              the property of each element in the graph (for in-
paradigm). At each step, every vertex executes its              stance, the centrality metric [34]) or extract a specific
program, specified by the developer, using the data             subgraph from the existing graph. Using this DSL,
available to it, as well as data obtained from other            the developer has the ability to operate the appara-
vertices (by incoming edges). After finishing the cal-          tus of graph theory, while receiving, after compila-
culations, the vertex can send data to its neighbors (by        tion, a well-parallelized code. For example, a paral-
outgoing edges). One important property of Pregel is            lel search of all vertices in the graph happens accord-
that only “active” nodes can compute and send data.             ing its breadth-first order (special traversal operations
Vertex can become “active” when received messages               InBFS and InRBFS are used for this purpose). Also,
from other vertices (the vertex state machine is shown          Green-Marl has special containers for storing sets of
in Figure 3).                                                   vertices: Set, Order and Sequence, which differs in var-
                       Vote to halt                             ious types of access to the elements and are designed
                                                                for parallel and/or sequential access to its elements.
                                                                The use of such containers greatly simplifies the devel-
              Active                  Inactive                  opment of some algorithms (for example, the Dijkstra
                                                                algorithm [35] or ∆-stepping [36]). An example of an
                                                                implementation of the algorithm using Green-Marl is
                   Message received                             shown in Figure 4.

        Figure 3: Pregel vertex state machine
4.2   Domain-Specific Languages

A Domain-Specific Language (DSL) is a programming language intended for applications in a specific domain area. A DSL, as a rule, contains domain-specific expressions and constructs. A program developed using a DSL is translated by a compiler into a target language (for example, C/C++, CUDA, etc.).

The main advantage of using a DSL to develop graph algorithms is increased development productivity. A DSL makes it possible to express algorithmic ideas in the terms of the domain area, which usually requires significantly less programming code than similar programs written in general-purpose languages. In a graph DSL such as Green-Marl [32], the developer operates with the apparatus of graph theory while obtaining well-parallelized code after compilation. For example, a parallel traversal of all vertices of the graph in breadth-first order is expressed with the special traversal operations InBFS and InRBFS. Green-Marl also provides special containers for storing sets of vertices (Set, Order and Sequence), which differ in the type of access to their elements and are designed for parallel and/or sequential access. Such containers greatly simplify the development of some algorithms (for example, Dijkstra's algorithm [35] or ∆-stepping [36]). An example of an algorithm implemented in Green-Marl is shown in Figure 4.

4.3   Parallel Processing Primitives

Parallel processing primitives serve as small “building blocks” of more complex graph algorithms; different algorithms may thus consist of combinations of the same set of primitives.

The development of complex graph algorithms is a hard problem that brings a number of difficulties. Debugging is one such challenge. For example, the correctness of some community detection algorithms [30] (especially when working with real data) is hard to control, because there are no clear criteria for the correctness of the detected communities (apart from theoretical metrics such as modularity, which may not always be suitable for real-world applications). This complexity is further intensified when it comes to parallel processing of graphs with a large number of vertices and edges.
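Modularity, mentioned above as one of the few checkable criteria, is itself straightforward to compute. The sketch below (illustrative Python, not code from any of the systems discussed) evaluates the standard Newman modularity of a given vertex-to-community assignment:

```python
def modularity(graph, community):
    """Newman modularity Q for an undirected graph.
    graph: dict vertex -> list of neighbours (symmetric adjacency lists);
    community: dict vertex -> community label."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2.0  # number of edges
    intra = {}   # intra-community edge endpoints (each edge counted twice)
    degree = {}  # total degree of each community
    for v, nbrs in graph.items():
        c = community[v]
        degree[c] = degree.get(c, 0) + len(nbrs)
        for w in nbrs:
            if community[w] == c:
                intra[c] = intra.get(c, 0) + 1
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra.get(c, 0) / (2.0 * m) - (degree[c] / (2.0 * m)) ** 2
               for c in degree)
```

A high Q suggests that most edges fall inside the detected communities, which is one way to sanity-check the output of a community detection run.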
Procedure Compute_BC(G: Graph, BC: Node_Prop<Float>(G)) {
    G.BC = 0;  // initialize BC
    Foreach (s: G.Nodes) {
        // define temporary properties
        Node_Prop<Float>(G) Sigma;
        Node_Prop<Float>(G) Delta;
        s.Sigma = 1;  // init. Sigma for root
        // traverse graph in BFS order from s
        InBFS (v: G.Nodes From s)(v != s) {
            // sum over BFS parents
            v.Sigma = Sum(w: v.UpNbrs) { w.Sigma };
        }
        // traverse graph in reverse BFS order
        InRBFS (v != s) {
            // sum over BFS children
            v.Delta = Sum(w: v.DownNbrs) {
                v.Sigma / w.Sigma * (1 + w.Delta)
            };
            v.BC += v.Delta @ s;  // accumulate BC
        }
    }
}

Figure 4: Betweenness centrality algorithm described in Green-Marl
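For readers unfamiliar with Green-Marl, the computation of Figure 4 can be sketched in a general-purpose language. The following illustrative Python version performs the same two traversals (a forward BFS accumulating Sigma over BFS parents, then a reverse sweep accumulating Delta over BFS children) for an unweighted graph stored as adjacency lists; it is a sketch, not output of the Green-Marl compiler:

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes-style betweenness centrality for an unweighted graph.
    graph: dict vertex -> list of neighbours (symmetric adjacency lists)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # forward BFS from s: count shortest paths (sigma), record BFS parents
        sigma = {v: 0.0 for v in graph}
        dist = {v: -1 for v in graph}
        parents = {v: [] for v in graph}
        sigma[s], dist[s] = 1.0, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # w is a BFS child of v
                    sigma[w] += sigma[v]
                    parents[w].append(v)
        # reverse BFS order: accumulate dependencies (delta) over BFS children
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in parents[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

As in Figure 4, each source contributes its dependency values to the global BC scores; the resulting counts are over ordered vertex pairs.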
When using special-purpose graph processing primitives, the developer only needs to verify the correctness of each primitive operation separately; beyond that, only the high-level description of the algorithm has to be controlled.

Another difficulty is porting code to various parallel architectures. Writing efficient code for computational accelerators (GPGPU or MIC), or moving from a shared-memory to a distributed-memory model, usually requires a lot of effort. Considering that significant changes in architectures and programming models arrive every 3 to 5 years, porting a large number of different algorithms can continue “infinitely”. Primitives implemented for different architectures make it possible to abstract from this problem: only the primitives themselves need to be reimplemented for each new architecture.

The advantage of processing primitives over a DSL is that primitives can be used alongside other code in a program; a DSL, on the contrary, forces the developer to work only within a special development environment.

Primitives are used in a variety of graph processing systems. However, there is at present no single standard for graph-processing primitives, so both the set of primitives and the principles by which they are built vary from system to system. Some implementations of such primitives are described below.

The PowerGraph [37] system is based on the Gather-Apply-Scatter (GAS) model and includes three types of primitives:

• Gather – each vertex v collects data about its adjacent vertices and edges;

• Apply – the value of each vertex v is updated, taking into account the data collected in the Gather phase;

• Scatter – the new value of the vertex v is distributed to its adjacent vertices.

By combining these primitives, the developer can build a variety of graph algorithms.
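One round of the GAS pattern can be sketched as follows. The example performs a single PageRank-style iteration; it is an illustrative Python rendering of the Gather/Apply/Scatter phases, not PowerGraph code (the function name and data layout are assumptions made for the sketch):

```python
def gas_pagerank_step(graph, rank, damping=0.85):
    """One Gather-Apply-Scatter round of a PageRank-style update.
    graph: dict vertex -> list of out-neighbours (no dangling vertices);
    rank: dict vertex -> current rank value."""
    n = len(graph)
    # Gather: each vertex accumulates rank / out-degree from its in-neighbours
    gathered = {v: 0.0 for v in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            gathered[v] += rank[u] / len(nbrs)
    # Apply: update every vertex value using the gathered sum
    new_rank = {v: (1 - damping) / n + damping * gathered[v] for v in graph}
    # Scatter: expose the new values along outgoing edges; in a real GAS
    # system this phase would also activate neighbours whose input changed
    # (omitted in this single-round sketch)
    return new_rank
```

Iterating this step until the ranks stop changing gives the fixed point of the update; the point of the decomposition is that each phase is a small, separately testable primitive.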
The GAS model is not the only possible way of expressing the “building blocks” of complex graph algorithms. Another well-known example is the Advance-Filter-Compute model introduced in Gunrock, a system for developing graph algorithms on GPGPU accelerators [38]. This model is based on the modification of vertex frontiers during each iteration of the algorithm and includes the following primitives:

• Advance – obtaining a new vertex frontier by passing along the edges incident to the current vertex frontier;

• Filter – obtaining a new vertex frontier by selecting a subset of the vertices in the current vertex frontier;

• Compute – obtaining a new vertex frontier by executing a developer-defined procedure applied to the current vertex frontier.

Primitives are also used in other parallel graph processing systems, such as MapGraph [39], HelP [40], GraphPad [41], GraphBLAS [42], etc.
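The frontier-based style is easiest to see on breadth-first search. The sketch below expresses BFS with Advance and Filter steps in plain Python; it illustrates the pattern only and does not use Gunrock's actual API:

```python
def bfs_frontiers(graph, source):
    """Breadth-first search written in the Advance/Filter style.
    graph: dict vertex -> list of neighbours; returns vertex -> BFS depth."""
    depth = {source: 0}
    frontier = [source]          # current vertex frontier
    level = 0
    while frontier:
        level += 1
        # Advance: obtain candidate vertices by passing along the edges
        # incident to the current frontier
        advanced = [w for v in frontier for w in graph[v]]
        # Filter: keep only the not-yet-visited subset as the new frontier
        frontier = []
        for w in advanced:
            if w not in depth:
                depth[w] = level
                frontier.append(w)
    return depth
```

Each loop iteration builds the next frontier entirely from the current one, which is what makes the Advance and Filter steps natural targets for massively parallel (GPU) execution.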
5   Conclusion

This paper gives an overview of the main concepts (the shift to parallel computing, the use of complex data structures, and new computational models) that are used in modern graph processing systems. Using these concepts makes it possible to:

• simplify the development of novel graph algorithms;

• use different computer architectures without changing the program code;

• speed up the execution of complex graph algorithms.
Acknowledgment

The research was supported by the RFBR under the project no. 16-37-00203 mol_a and by the Ministry of Education and Science of the Russian Federation, Agreement no. 02.A03.21.0006.

References

[1] E. Otte and R. Rousseau, “Social network analysis: a powerful strategy, also for the information sciences,” Journal of Information Science, vol. 28, no. 6, pp. 441–453, 2002.

[2] M. Grandjean, “A social network analysis of Twitter: Mapping the digital humanities community,” Cogent Arts & Humanities, vol. 3, no. 1, p. 1171458, 2016.

[3] D. Ustalov, A. Panchenko, and C. Biemann, “Watset: Automatic Induction of Synsets from a Graph of Synonyms,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), (Vancouver, Canada), pp. 1579–1590, Association for Computational Linguistics, 2017. http://aclweb.org/anthology/P17-1145 (accessed: 20.10.2017).

[4] E. Bullmore and O. Sporns, “Complex brain networks: Graph theoretical analysis of structural and functional systems,” vol. 10, pp. 186–198, Mar. 2009. https://www.nature.com/nrn/journal/v10/n3/pdf/nrn2575.pdf (accessed: 20.10.2017).

[5] A. Horn, D. Ostwald, M. Reisert, and F. Blankenburg, “The structural–functional connectome and the default mode network of the human brain,” NeuroImage, vol. 102, no. Part 1, pp. 142–151, 2014. https://www.sciencedirect.com/science/article/pii/S1053811913010057 (accessed: 20.10.2017).

[6] M. G. Bell et al., Transportation Network Analysis. Wiley Online Library, 1997.

[7] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using NetworkX,” in Proceedings of the 7th Python in Science Conference (SciPy2008), (Pasadena, CA, USA), pp. 11–15, Aug. 2008. http://aric.hagberg.org/papers/hagberg-2008-exploring.pdf (accessed: 20.10.2017).

[8] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open source software for exploring and manipulating networks,” 2009. http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154 (accessed: 20.10.2017).

[9] G. Csardi and T. Nepusz, “The igraph software package for complex network research,” InterJournal, vol. Complex Systems, p. 1695, 2006. https://pdfs.semanticscholar.org/1d27/44b83519657f5f2610698a8ddd177ced4f5c.pdf (accessed: 20.10.2017).

[10] L. G. Valiant, “A bridging model for parallel computation,” Commun. ACM, vol. 33, pp. 103–111, Aug. 1990. http://web.mit.edu/6.976/www/handout/valiant2.pdf (accessed: 20.10.2017).

[11] The Boost Graph Library: User Guide and Reference Manual. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2002. https://markqiu.files.wordpress.com/2009/12/boost-graph-library.pdf (accessed: 20.10.2017).

[12] D. Gregor and A. Lumsdaine, “The Parallel BGL: A generic library for distributed graph computations,” in Parallel Object-Oriented Scientific Computing (POOSC), 2005. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.2137&rep=rep1&type=pdf (accessed: 20.10.2017).

[13] A. Geist, W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, W. Saphir, T. Skjellum, and M. Snir, MPI-2: Extending the Message-Passing Interface, pp. 128–135. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.2841&rep=rep1&type=pdf (accessed: 20.10.2017).

[14] G. Elidan, I. McGraw, and D. Koller, “Residual belief propagation: Informed scheduling for asynchronous message passing,” in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, UAI'06, (Arlington, Virginia, United States), pp. 165–173, AUAI Press, 2006. https://ai.stanford.edu/~koller/Papers/Elidan+al:UAI06.pdf (accessed: 20.10.2017).

[15] A. Kyrola, G. Blelloch, and C. Guestrin, “GraphChi: Large-scale graph computation on just a PC,” in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, (Berkeley, CA, USA), pp. 31–46, USENIX Association, 2012. https://www.cs.cmu.edu/~guyb/papers/KBG12.pdf (accessed: 20.10.2017).

[16] F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan, “CuSha: Vertex-centric graph processing on GPUs,” in Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC '14, (New York, NY, USA), pp. 239–252, ACM, 2014.

[17] J. Kepner and J. Gilbert, Graph Algorithms in the Language of Linear Algebra. 2011.

[18] A. Lugowski, D. Alber, A. Buluç, J. R. Gilbert, S. Reinhardt, Y. Teng, and A. Waranis, A Flexible Open-Source Toolbox for Scalable Complex Graph Analysis, pp. 930–941. http://epubs.siam.org/doi/pdf/10.1137/1.9781611972825.80 (accessed: 20.10.2017).

[19] D. Ediger, K. Jiang, E. J. Riedy, and D. A. Bader, “GraphCT: Multithreaded algorithms for massive graph analysis,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, pp. 2220–2229, Nov. 2013. https://www.cc.gatech.edu/~bader/papers/GraphCT-TPDS2013.pdf (accessed: 20.10.2017).

[20] D. Merrill, M. Garland, and A. Grimshaw, “Scalable GPU graph traversal,” in Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, (New York, NY, USA), pp. 117–128, ACM, 2012. http://research.nvidia.com/sites/default/files/pubs/2012-02_Scalable-GPU-Graph/ppo213s-merrill.pdf (accessed: 20.10.2017).

[21] J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey, “Fast and efficient graph traversal algorithm for CPUs: Maximizing single-node efficiency,” in 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 378–389, May 2012.

[22] J. Edmonds and R. M. Karp, “Theoretical improvements in algorithmic efficiency for network flow problems,” J. ACM, vol. 19, pp. 248–264, Apr. 1972. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.610.4784&rep=rep1&type=pdf (accessed: 20.10.2017).

[23] A. V. Goldberg and R. E. Tarjan, “A new approach to the maximum-flow problem,” J. ACM, vol. 35, pp. 921–940, Oct. 1988. https://www.cs.princeton.edu/courses/archive/fall03/cs528/handouts/a%20new%20approach.pdf (accessed: 20.10.2017).

[24] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, 2002. http://www.pnas.org/content/99/12/7821.full.pdf (accessed: 20.10.2017).

[25] D. Ediger, R. McColl, J. Riedy, and D. A. Bader, “STINGER: High performance data structure for streaming graphs,” in 2012 IEEE Conference on High Performance Extreme Computing, pp. 1–5, Sept. 2012. http://ieee-hpec.org/2012/index_htm_files/ediger.pdf (accessed: 20.10.2017).

[26] D. A. Bader, J. Berry, A. Amos-Binks, C. Hastings, K. Madduri, and S. C. Poulos, “STINGER: Spatio-temporal interaction networks and graphs (STING) extensible representation,” 2009. https://pdfs.semanticscholar.org/6992/7a3b9fc25e655ce662c03deb1e9d2832585c.pdf (accessed: 20.10.2017).

[27] R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica, “GraphX: A resilient distributed graph system on Spark,” in First International Workshop on Graph Data Management Experiences and Systems, GRADES '13, (New York, NY, USA), pp. 2:1–2:6, ACM, 2013. http://www.istc-cc.cmu.edu/publications/papers/2013/grades-graphx_with_fonts.pdf (accessed: 20.10.2017).

[28] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing,” in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12, (Berkeley, CA, USA), pp. 2–2, USENIX Association, 2012. https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf (accessed: 20.10.2017).

[29] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” in Seventh International World-Wide Web Conference (WWW 1998), 1998. http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf (accessed: 20.10.2017).

[30] N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” vol. 76, p. 036106, Oct. 2007. https://arxiv.org/pdf/0709.2938.pdf (accessed: 20.10.2017).

[31] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale graph processing,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, (New York, NY, USA), pp. 135–146, ACM, 2010. https://kowshik.github.io/JPregel/pregel_paper.pdf (accessed: 20.10.2017).

[32] S. Hong, H. Chafi, E. Sedlar, and K. Olukotun, “Green-Marl: A DSL for easy and efficient graph analysis,” in Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, (New York, NY, USA), pp. 349–362, ACM, 2012. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.220.1796&rep=rep1&type=pdf (accessed: 20.10.2017).

[33] S. Hong, S. Salihoglu, J. Widom, and K. Olukotun, “Simplifying scalable graph processing with a domain-specific language,” in Proceedings of the Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, (New York, NY, USA), pp. 208:208–208:218, ACM, 2014. https://pdfs.semanticscholar.org/2d8b/e5e1b88ac9919984b9369f7045fbb0af0d08.pdf (accessed: 20.10.2017).

[34] U. Brandes, “A faster algorithm for betweenness centrality,” Journal of Mathematical Sociology, vol. 25, pp. 163–177, 2001. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.2024&rep=rep1&type=pdf (accessed: 20.10.2017).

[35] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numer. Math., vol. 1, pp. 269–271, Dec. 1959.

[36] U. Meyer and P. Sanders, “∆-stepping: A parallelizable shortest path algorithm,” J. Algorithms, vol. 49, pp. 114–152, Oct. 2003. https://ac.els-cdn.com/S0196677403000762/1-s2.0-S0196677403000762-main.pdf?_tid=a3d2287c-b5a5-11e7-b112-00000aacb362&acdnat=1508511048_d998da6a4ec35491c44e069c0b48b756 (accessed: 20.10.2017).

[37] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, “PowerGraph: Distributed graph-parallel computation on natural graphs,” in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, (Berkeley, CA, USA), pp. 17–30, USENIX Association, 2012. https://www.usenix.org/system/files/conference/osdi12/osdi12-final-167.pdf (accessed: 20.10.2017).

[38] Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel, and J. D. Owens, “Gunrock: A high-performance graph processing library on the GPU,” SIGPLAN Not., vol. 51, pp. 11:1–11:12, Feb. 2016. https://arxiv.org/pdf/1501.05387.pdf (accessed: 20.10.2017).

[39] Z. Fu, M. Personick, and B. Thompson, “MapGraph: A high level API for fast development of high performance graph analytics on GPUs,” in Proceedings of Workshop on GRAph Data Management Experiences and Systems, GRADES'14, (New York, NY, USA), pp. 2:1–2:6, ACM, 2014. https://pdfs.semanticscholar.org/3ebf/3857a60c3e224284bbbe6c7127d0a12c546d.pdf?_ga=2.86211930.371173678.1508511132-1908779532.1478770815 (accessed: 20.10.2017).

[40] S. Salihoglu and J. Widom, “HelP: High-level primitives for large-scale graph processing,” in Proceedings of Workshop on GRAph Data Management Experiences and Systems, GRADES'14, (New York, NY, USA), pp. 3:1–3:6, ACM, 2014. http://ilpubs.stanford.edu:8090/1085/2/primitives_tr_sig_alternate.pdf (accessed: 20.10.2017).

[41] M. J. Anderson, N. Sundaram, N. Satish, M. M. A. Patwary, T. L. Willke, and P. Dubey, “GraphPad: Optimized graph primitives for parallel and distributed platforms,” in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 313–322, May 2016.

[42] J. Kepner, P. Aaltonen, D. Bader, A. Buluç, F. Franchetti, J. Gilbert, D. Hutchison, M. Kumar, A. Lumsdaine, H. Meyerhenke, S. McMillan, C. Yang, J. D. Owens, M. Zalewski, T. Mattson, and J. Moreira, “Mathematical foundations of the GraphBLAS,” in 2016 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–9, Sept. 2016. https://arxiv.org/pdf/1606.05790.pdf (accessed: 20.10.2017).