Collaboration spotting visual analytics tool optimized for
                    very large graphs

                                                  Richard Forster
                                               forceuse@inf.elte.hu
                                            ELTE Eötvös Loránd University
                                                 Budapest, Hungary




                                                        Abstract
                       As big data is being collected in every science and in many other applica-
                       tions, visual analytics is gaining traction through graph-based applications
                       that help users understand their data. Community detection has become
                       an important operation in such applications: it reveals the groups that
                       exist within real-world networks without imposing prior size or cardinality
                       constraints on the set of communities. The Louvain method is a multi-
                       phase, iterative heuristic for modularity optimization. Originally developed
                       by Blondel et al. [1], the method has become increasingly popular owing
                       to its ability to detect high-modularity community partitions in a fast and
                       memory-efficient manner. To parallelize it, multiple heuristics are used that
                       were first introduced in [2]. For graph organization the ForceAtlas algo-
                       rithm, part of the Gephi toolkit [3], is used; it assigns 2D coordinates to
                       every node in such a way that the nodes do not overlap. In this paper
                       CPU-based parallel implementations of these sequential algorithms are pro-
                       vided and tested on a single processor with 8 threads, where up to a 7-fold
                       performance increase was achieved on test data taken from the CERN-
                       developed Collaboration Spotting database, which uses patents and publi-
                       cations to visualize the connections in technologies among collaborators.




1    Introduction
The Collaboration Spotting [4] graph-based tool is a generic version of TIM (Technology Innovation Monitor),
developed by CERN and the JRC, and serves as a visual analytics application. The data about technologies are
stored in a graph database, which is traversed to obtain a specific subgraph based on user queries; this subgraph
is then visualized as an interactive image of the graph. The user can extract different information from publications
and patents, such as authors, keywords, organizations, etc. On organization landscapes it is a requirement to be
able to distinguish the different groups working together on the visualized papers, hence community detection
plays an important role in the tool's calculations.
   Interactive means that the user can always navigate the map he or she is looking at. Every user starts
from their own user-defined technology-gram, which is populated as they explore the different

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: E. Vatai (ed.): Proceedings of the 11th Joint Conference on Mathematics and Computer Science, Eger, Hungary, 20th – 22nd of
May, 2016, published at http://ceur-ws.org




technologies, requesting the connected papers and patents. From there they can start navigating based on
different properties of the papers and patents at hand. They can take a look at the organizations working on a
specific technology or find out what kind of keywords the authors used in their respective publications.
   The tool is generic in that it is capable of handling any kind of data, as long as it fits a predefined graph
database schema. The tool is no longer only about patents and publications: within CERN it is starting to provide
visualization to different experiments and departments based on their data and their own needs.
   As this application may handle graphs with tens of thousands of nodes and hundreds of thousands of edges
per user, it is clear that the overall system needs to be able to process millions of nodes and edges at any
given time. And because the tool is supposed to give an interactive session to every user, it has to be fast and
reliable to serve user requests seamlessly.
   We take a look at the community detection problem, what procedures are used to perform the computations
and what heuristics were applied to the system, and we also describe how to make the interactive visualization
useful to users by applying force-directed algorithms to the graphs. By parallelizing all of these processes, we
were able to achieve speedups of up to 7.


2     Problem statement and notation
Let G(V, E, ω) be an undirected weighted graph, where V is the set of vertices, E is the set of edges and
ω is a weighting function that gives every edge in E a positive weight. If the input graph is unweighted, we
consider the weight of every edge to be 1. The graph is allowed to have loops, so edges like (i, i) are valid, while
multiple edges should not be present. Let the adjacency list of vertex i be Γ(i) = {j | (i, j) ∈ E}. Let
δi denote the weighted degree of vertex i, i.e. δi = Σj∈Γ(i) ω(i, j). Let #V denote the number of vertices in
graph G, #E the number of edges, and W the sum of all edge weights, so that W = ½ Σi∈V δi . To detect the
communities we would like to partition the vertex set V into an arbitrary number of disjoint communities, each
of a different size (0 < n ≤ #V ). We will use C(i) to denote the community containing vertex i. Let Ei→C be
the set of all edges connecting vertex i to vertices in community C. Also let ei→C denote the sum of the edge
weights in Ei→C .

                                              ei→C = Σ(i,j)∈Ei→C ω(i, j)                                           (1)


  Let degC denote the sum of the weighted degrees of every vertex in community C, which gives us the degree
of the whole community.

                                                  degC = Σi∈C δi                                                   (2)
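As a concrete illustration of this notation, the weighted degrees δi and the community degrees degC of a small edge-list graph could be computed as sketched below. The representation and helper names are our assumptions, not code from the tool; a self-loop contributes its weight twice to δi, which keeps W = ½ Σ δi equal to the total edge weight.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumed edge-list representation: (i, j, weight).
struct Edge { int i, j; double w; };

// Weighted degree delta_i of every vertex: sum of weights of incident edges.
// A self-loop (i, i) contributes twice, matching W = (1/2) * sum_i delta_i.
std::vector<double> weightedDegrees(int numVertices, const std::vector<Edge>& edges) {
    std::vector<double> delta(numVertices, 0.0);
    for (const Edge& e : edges) {
        delta[e.i] += e.w;
        delta[e.j] += e.w;
    }
    return delta;
}

// deg_C: sum of the weighted degrees of all vertices assigned to community C.
std::map<int, double> communityDegrees(const std::vector<double>& delta,
                                       const std::vector<int>& community) {
    std::map<int, double> degC;
    for (std::size_t i = 0; i < delta.size(); ++i)
        degC[community[i]] += delta[i];
    return degC;
}
```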

   Even with the groups generated by community detection helping us understand the data we are looking at,
the result is still difficult to handle. To be fully useful we need to provide a readable representation that can be
easily understood by users. This makes interaction with the graph possible, while maintaining a fast and
versatile system.


2.1   Modularity

Let S = {C1 , C2 , ..., Ck } be the set of all communities in a given partitioning of the vertex set V of G(V, E, ω),
where 1 ≤ k ≤ #V . The modularity Q of the partitioning S is given by the following [5]:

                                  Q = (1/(2W)) · Σi∈V ei→C(i) − ΣC∈S (degC /(2W))²                                 (3)


   Modularity is widely used for community detection, although it has known issues such as the resolution
limit [6][7], and the definition itself has multiple variants [7][8][9]. However, the definition in Eq. 3 is still the
most common version, and it is the one used in the Louvain method [1].
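A direct, unoptimized evaluation of Eq. 3 could be sketched as below. This is an illustrative helper under assumed names, not the paper's implementation; each intra-community edge is counted from both endpoints so the sum over i of ei→C(i) comes out right, and self-loops follow the degree convention above.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

struct Edge { int i, j; double w; };

// Modularity Q of a partitioning per Eq. 3; community[v] gives C(v).
double modularity(int n, const std::vector<Edge>& edges, const std::vector<int>& community) {
    std::vector<double> delta(n, 0.0);   // weighted degrees delta_i
    double W = 0.0;                      // total edge weight
    double intra = 0.0;                  // sum over i of e_{i->C(i)}
    for (const Edge& e : edges) {
        delta[e.i] += e.w;
        delta[e.j] += e.w;
        W += e.w;
        if (community[e.i] == community[e.j])
            intra += 2.0 * e.w;          // the edge is seen from both endpoints
    }
    std::map<int, double> degC;          // deg_C per community
    for (int v = 0; v < n; ++v) degC[community[v]] += delta[v];
    double q = intra / (2.0 * W);
    for (const auto& kv : degC)
        q -= (kv.second / (2.0 * W)) * (kv.second / (2.0 * W));
    return q;
}
```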




2.2    Community detection
For a given G(V, E, ω), the problem of community detection is to compute the partitioning S into communities
that produces the maximum modularity. This problem has been shown to be NP-complete [10]. It is
fundamentally different from graph partitioning and its variants [11], where the number and size of the clusters
and their distribution are known at the outset. For community detection, both quantities are unknown prior to
the computation.


3     The Louvain algorithm
In 2008 Blondel et al. presented an algorithm for community detection [1]. The Louvain method is a multi-phase,
iterative, greedy algorithm capable of producing the community hierarchy. The main idea is the following: the
algorithm has multiple phases, each phase with multiple iterations that run until a convergence criterion
is met. At the beginning of the first phase each vertex is assigned to a separate community. The
algorithm then progresses from one iteration to the next until the modularity gain becomes lower than a
predefined threshold. Within every iteration, the algorithm visits all the vertices in an arbitrary but predefined
order. For every vertex i, all its neighboring communities (the communities where its neighbors can be found)
are explored, and the modularity gain that would result from moving i from its current community into each of
those neighboring communities is calculated.

   Once this calculation is done, the algorithm selects the neighboring community that would provide
the maximum modularity gain as the new community for i, and it also updates the structures
that are maintained to hold the communities and their properties. If, on the other hand, all the gains are
negative, the vertex stays in its own community. An iteration ends once all vertices have been checked. As a
result, the modularity is a monotonically increasing function across the multiple iterations of a
phase. When the algorithm converges within a phase, it moves to the next one by reducing all vertices of a
community to a single "meta-vertex" [12], placing an edge from such a meta-vertex to itself with a weight that is
the sum of the weights of all the edges within that community, and putting an edge between two meta-vertices
with a weight equal to the sum of the weights of all the edges between the corresponding two communities.

   The result is a graph G′(V′, E′, ω′), which then becomes the input for the next phase. Multiple phases
are processed until the modularity converges. At any given iteration, let ∆Qi→C(j) represent the modularity
gain resulting from moving vertex i from its current community C(i) to a different community C(j). This
is given by:

                      ∆Qi→C(j) = ei→C(j) /W + (2 · δi · degC(i)\{i} − 2 · δi · degC(j) )/(2W)²                     (4)

    The new community assignment for vertex i is then determined as follows, for j ∈ Γ(i) ∪ {i}:

                                       C(i) = arg maxC(j) ∆Qi→C(j)                                                 (5)

   In [10], thanks to the introduced data structures, the calculation time for ∆Qi→C(j) is constant, O(1). This
way a full iteration's complexity is O(#E). In principle the iterations and phases could run forever, but because
the modularity is monotonically increasing and every iteration must improve it by at least the threshold, the
algorithm is guaranteed to terminate. In practice the method needs only tens of iterations and fewer phases to
terminate on real-world data.


4     Louvain parallel heuristics
In [12] the challenges that arise from parallelizing the Louvain method were explored. To solve those issues,
multiple heuristics were introduced that can be used to adapt this essentially sequential algorithm to parallel
systems. Of the proposed heuristics, two are applied in our system. For them we assume the communities at
any given stage of the algorithm are labeled numerically. We will use the notation l(C) for the label of
community C.




4.1   Singlet minimum label heuristic
In the parallel algorithm, if at any given iteration vertex i is in a community by itself (C(i) = {i}, a singlet
community [12]), the vertex may decide to move into another community, possibly one also holding a single
vertex j, in the hope of a modularity gain. This change will only be applied if l(C(j)) < l(C(i)).

4.2   Generalized minimum label heuristic
In the parallel algorithm, if at any given iteration vertex i has multiple neighboring communities providing
the maximum modularity gain, the community with the minimum label is selected as its destination
community. Swap situations, where two vertices each move into the other's target community, may delay
convergence, but they can never lead to nontermination, since the algorithm uses a minimum required
modularity gain threshold.
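The tie-breaking rule above can be sketched as a small selection helper. All names here are our assumptions for illustration; the candidate list stands for the (community label, ∆Q) pairs collected over a vertex's neighborhood, and a vertex stays put unless some candidate has a strictly positive gain.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Pick the destination community: maximum modularity gain, ties broken by the
// minimum community label, as the generalized minimum label heuristic requires.
int pickCommunity(const std::vector<std::pair<int, double>>& candidates,
                  int currentCommunity) {
    int best = currentCommunity;
    double bestGain = 0.0;  // stay in the current community unless a gain exists
    for (const auto& c : candidates) {
        if (c.second > bestGain ||
            (c.second == bestGain && c.second > 0.0 && c.first < best)) {
            best = c.first;
            bestGain = c.second;
        }
    }
    return best;
}
```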

4.3   Parallel algorithm
The parallel algorithm has the following major parts:

  • Phases: Execute phases one at a time. Within each phase, multiple iterations are run, and each iteration
    performs a parallel evaluation of the vertices without any locking mechanism, using only the community
    information from the previous iteration. This continues until the modularity gain between iterations falls
    below the threshold.

  • Graph rebuilding: After each phase the new community assignment is used to construct a new input
    graph for the next phase. This is done by introducing the communities of the old graph as the vertices of
    the new graph; the edges are added based on how the old edges connect these communities.
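The graph rebuilding step can be sketched as follows. This is an illustrative helper under assumed names, not the paper's code: every community becomes a meta-vertex, intra-community weight accumulates into a self-loop, and inter-community weight into an edge between the two meta-vertices, exactly as described in Section 3.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Edge { int i, j; double w; };

// Build the coarsened graph for the next phase from the community assignment.
std::vector<Edge> rebuildGraph(const std::vector<Edge>& edges,
                               const std::vector<int>& community) {
    std::map<std::pair<int, int>, double> acc;  // (minLabel, maxLabel) -> weight
    for (const Edge& e : edges) {
        int a = community[e.i], b = community[e.j];
        if (a > b) std::swap(a, b);
        acc[{a, b}] += e.w;                     // a == b yields a self-loop
    }
    std::vector<Edge> out;
    for (const auto& kv : acc)
        out.push_back({kv.first.first, kv.first.second, kv.second});
    return out;
}
```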


Algorithm 1 The parallel Louvain algorithm for a single phase; inputs are the graph G(V, E, ω) and a Status
object containing the initial community information
 1: procedure Parallel Louvain(G(V, E, ω), Status)
 2:    Qcurr ← 0
 3:    Qprev ← −∞
 4:    while true do
 5:        for each i ∈ Vk do
 6:            Ni ← Status.nodesT oCom[i]
 7:            for each j ∈ Γ(i) do
 8:               Ni ∪ Status.nodesT oCom[j]
 9:            end for
10:            target ← arg maxt∈Ni ∆Qi→t
11:            if ∆Qi→target > 0 then
12:               Status.nodesT oCom[i] ← target
13:            end if
14:        end for
15:        Qcurr ← Compute modularity for new assignments
16:        if |(Qcurr − Qprev )/Qprev | < θ then (θ being a user-specified threshold)
17:            break
18:        else
19:            Qprev ← Qcurr
20:        end if
21:    end while
22: end procedure



   The loop at line 5, inside the cycle starting at line 4, represents a parallel evaluation of all nodes in the input
graph. The possible modularity gain is calculated for all nodes in parallel, yielding the new community assignment.
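The lock-free sweep of Algorithm 1 can be sketched as below. All names are illustrative assumptions: each thread reads only the community assignment from the previous iteration and writes its decisions into a separate array, so no locking is needed, and `decide` stands in for the ∆Q maximization of lines 6-13.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// One parallel vertex sweep: `curr` is the previous iteration's assignment,
// `next` receives the new one; vertices are split into contiguous chunks.
void parallelSweep(int numVertices, int numThreads,
                   const std::vector<int>& curr, std::vector<int>& next,
                   const std::function<int(int, const std::vector<int>&)>& decide) {
    std::vector<std::thread> pool;
    const int chunk = (numVertices + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        pool.emplace_back([&, t] {
            const int begin = t * chunk;
            const int end = std::min(numVertices, begin + chunk);
            for (int v = begin; v < end; ++v)
                next[v] = decide(v, curr);  // reads the old state only
        });
    }
    for (std::thread& th : pool) th.join();
}
```

Because every thread writes only its own slots of `next`, the sweep is race-free without atomics or locks; the price, as Section 4 explains, is that decisions are based on slightly stale information, which the label heuristics keep from causing nontermination.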




5     ForceAtlas
The ForceAtlas [3] algorithm generates a force-directed layout for a given set of nodes, acting as an n-body
simulation. Nodes repulse each other, while edges attract the nodes they connect. These forces drive the
system to converge to a balanced state. This final configuration is expected to give a map of the input nodes
that helps the interpretation of the data they represent. In general, applying this technique to a graph will
not produce exactly the same output over multiple runs. The end result depends strongly on the forces
applied, but it can also be influenced by the initial state of the system. The result is very different from
a Cartesian projection: the position of a node cannot be read on its own, it has to be compared to other nodes.
Still, this approach has a unique advantage: it makes the graph's visual interpretation much easier, as the
structurally relevant data is represented in a visually connected map.
   ForceAtlas has two main features that make it stand out from other force-directed algorithms:

    • The final layout depends strongly on the given graph's energy model, which defines how repulsion and
      attraction are handled.

    • As in any force-directed algorithm, the layout is applied to the graph iteratively. These steps simulate the
      movements that make the system reach its balance. In every step the forces are recomputed and the
      nodes are moved from their previous positions according to the results.

5.1    Energy Model
A force-directed algorithm relies on one formula to calculate the repulsion and another one for the
attraction. As described above, force-directed algorithms simulate a physical system that involves repulsion
and attraction. As in any physical system, these forces depend on the distance between the interacting
nodes: nodes that are closer to each other experience a higher repulsion force but a lower attraction force, and
vice versa. The proportionality can be linear, exponential or logarithmic.

5.2    Repulsion by degree
ForceAtlas was designed to work well with various web graphs and social networks. Commonly these networks
have many leaves, because of the power-law degree distribution that describes much real-world data.
The forest of leaves surrounding the highly connected nodes is one of the most significant sources of visual
clutter.
   To solve this, the idea is to bring less strongly connected nodes closer to highly connected ones. In this
algorithm the repulsion force is calculated (Equation 6) in such a way that a poorly connected node and a strongly
connected node repulse each other less. As a result they will get closer to each other in the balanced state. The
repulsion force Fr is proportional to the product of the degrees plus one of the two nodes. The coefficient kr is
provided by the user settings.
                           Fr (n1 , n2 ) = kr · (deg(n1 ) + 1)(deg(n2 ) + 1)/d(n1 , n2 )                           (6)

    In this formula the +1 ensures that even nodes with a degree of zero still exert some repulsion force.

5.3    A classical attraction force
The attraction force (Equation 7) Fa between two connected nodes n1 and n2 depends linearly on the distance
d(n1 , n2 ).
                                           Fa (n1 , n2 ) = d(n1 , n2 )                                  (7)
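Equations 6 and 7 translate directly into code. The sketch below uses function names of our own choosing, not identifiers from the ForceAtlas source; `d` is the current distance between the two nodes and `kr` the user-provided coefficient.

```cpp
#include <cassert>

// Degree-scaled repulsion, Eq. 6: the +1 keeps zero-degree nodes repulsing.
double repulsionForce(double kr, int deg1, int deg2, double d) {
    return kr * (deg1 + 1) * (deg2 + 1) / d;
}

// Classical linear attraction along an edge, Eq. 7.
double attractionForce(double d) {
    return d;
}
```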

6     Parallel ForceAtlas
As our tool needs to process very big graphs with even tens of thousands of nodes and hundreds of thousand of
edges it is important to increase the performance of the ForceAtlas computation too. To optimize the algorithm
in the current implementation we worked on the repulsion (Subsection 5.2) and attraction (Subsection 5.3) part
of process. The original algorithm was designed with user interaction in mind, which means the user could
see some animation as the iterations were progressing. This is a nice feature considering the calculations were
running on the client side. Because we would like to focus on performance and thus would push the calculations
to the server side, we left this feature out.




6.1   Repulsion phase
For this part we first use the Barnes-Hut algorithm [13] to recursively divide our nodes into groups by storing
them in a quad-tree. The nodes of this tree represent regions of the two-dimensional space. The root node
stands for the whole space, and its four children represent the four quadrants of the space. The space is
recursively subdivided into quadrants until each subdivision contains 0 or 1 nodes. We distinguish two types of
nodes in the tree: internal and external. An external node has no children and is either empty or points to a
single node of the original space. Each internal node represents the group of nodes belonging to it, and
stores the center of mass and the total mass of all its children.
   After this we use the regions as nodes and repulse them from each other. Where a region contains multiple
nodes, the nodes repulse together as the region; naturally, a region containing only one node repulses as the
node itself.
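The quad-tree construction described above can be sketched as follows. This is a minimal illustration under assumed names, not the implementation used in the tool; it assumes distinct node positions and positive masses, and maintains each region's total mass and center of mass incrementally on insertion.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Point { double x, y, mass; };

// A region is the square [x0, x0+size) x [y0, y0+size); it splits into four
// children once it would hold more than one point.
struct Region {
    double x0, y0, size;
    double mass = 0, cx = 0, cy = 0;     // total mass, center of mass
    std::vector<Point> pts;              // the point held while still a leaf
    std::unique_ptr<Region> child[4];

    void insert(const Point& p) {
        // incrementally update the aggregate mass and center of mass
        cx = (cx * mass + p.x * p.mass) / (mass + p.mass);
        cy = (cy * mass + p.y * p.mass) / (mass + p.mass);
        mass += p.mass;
        if (!child[0] && pts.empty()) { pts.push_back(p); return; }
        if (!child[0]) {                 // split: push the existing point down
            for (int q = 0; q < 4; ++q)
                child[q].reset(new Region{x0 + (q % 2) * size / 2,
                                          y0 + (q / 2) * size / 2, size / 2});
            Point old = pts[0];
            pts.clear();
            childFor(old)->insert(old);
        }
        childFor(p)->insert(p);
    }
    Region* childFor(const Point& p) {
        int q = (p.x >= x0 + size / 2 ? 1 : 0) + (p.y >= y0 + size / 2 ? 2 : 0);
        return child[q].get();
    }
};
```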

  Using the created regions, a user-provided threshold Θ and a predefined coefficient value, Algorithm 2 shows
how the repulsion is calculated.
Algorithm 2 Parallel ForceAtlas repulsion phase with inputs of the arrays of regions and nodes
 1: procedure Parallel Repulsion((regions, nodes))
 2:    for each n ∈ nodes do
 3:       r ← root regions
 4:       while true do
 5:          if r has sub-regions then
 6:              distance ← distance between n and r
 7:              if 2 ∗ regions[r].size/distance < Θ then
 8:                  xDist ← n.x − r.massCenterX
 9:                  yDist ← n.y − r.massCenterY
10:                  factor ← coefficient ∗ n.mass ∗ r.mass/distance²
11:                  n.dx ← n.dx + xDist ∗ factor
12:                  n.dy ← n.dy + yDist ∗ factor
13:                  r ← next sibling
14:              else
15:                  r ← first child
16:              end if
17:          else
18:              There is only 1 node in the region, repulse from that
19:              if There are no more siblings then
20:                  break
21:              end if
22:          end if
23:       end while
24:    end for
25: end procedure


  Starting from line 2, the repulsion is calculated for all nodes in parallel.

6.2   Attraction phase
Here we calculate how the links between the nodes affect their placement: nodes connected by edges are
attracted to each other, keeping them closer to the nodes they connect to. All the edges are processed in
parallel, and because multiple edges can connect to the same node and change the displacement of the affected
elements, we use atomic operations to accumulate all the modifications. Because the displacement values
are stored as floating-point numbers and the generic C++ floating-point types have no atomic addition
defined, we use compare-exchange functions to implement the summation.
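The compare-exchange summation can be sketched as follows; `atomicAdd` is our illustrative name. The loop retries until no other thread has modified the value between the read and the write. (Since C++20, `std::atomic<float>::fetch_add` makes this workaround unnecessary.)

```cpp
#include <atomic>
#include <cassert>

// Atomic addition for floats built from compare_exchange: keep retrying with a
// refreshed `expected` until our updated value is successfully installed.
void atomicAdd(std::atomic<float>& target, float value) {
    float expected = target.load();
    while (!target.compare_exchange_weak(expected, expected + value)) {
        // on failure, `expected` now holds the current value; just retry
    }
}
```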
   The user can control through the ForceAtlas settings how much the edge weights should influence the
attraction. Different graphs may react differently to a specific set of settings, resulting in a different final
layout. Users are encouraged to experiment with different values to find the layout that suits their use cases
best.

    Algorithm 3, using the predefined coefficient value, shows how the attraction is done.
Algorithm 3 Parallel ForceAtlas attraction phase with inputs of the arrays of nodes and edges
 1: procedure Parallel Attraction(nodes, edges)
 2:    for each e ∈ edges do
 3:       n1 ← e.source
 4:       n2 ← e.target
 5:       w ← e.weight
 6:       Test the edge influence user setting:
 7:       if influence = 0 then
 8:           ewc ← 1
 9:       else if influence = 1 then
10:           ewc ← w
11:       else
12:           ewc ← w^influence
13:       end if
14:       xDist ← n1.x − n2.x
15:       yDist ← n1.y − n2.y
16:       distance ← √(xDist² + yDist²)
17:       factor ← −coefficient ∗ ewc
18:       if distance > 0 then
19:           AtomicAdd(n1.dx, xDist ∗ factor)
20:           AtomicAdd(n1.dy, yDist ∗ factor)
21:           AtomicAdd(n2.dx, −(xDist ∗ factor))
22:           AtomicAdd(n2.dy, −(yDist ∗ factor))
23:       end if
24:    end for
25: end procedure


   The cycle in line 2 represents the parallel evaluation of the attraction applied to each edge of the input
graph.

7    Evaluation
For testing purposes we took 4 graphs from the Collaboration Spotting database, which is built on data coming
from the Web of Knowledge. For the test set, searches were run to gather the papers currently in the database
related to 3D, CT, Database and Silicon. The produced graphs contain all the organizations collaborating
on the specific technology. The reason we chose these graphs is that they effectively demonstrate the
use of community detection and layout generation. We will take a look at the details of the graphs as we evaluate
the different runtimes. The evaluation is strictly done on the described algorithms; the database processing and
retrieval time is not discussed in this paper.
   The optimized version is implemented in C++, so for a fair comparison we prepared implementations of the
standard versions of the Louvain and ForceAtlas algorithms in the same environment.
The tests of both versions were run on the following system (Table 1):

                                               Table 1: The test system

                                          CPU        Intel Core i7 4710HQ
                                          RAM            24 GB DDR3
                                           OS          Windows 10 x64
                                         Threads               8




  The optimized version ran on 8 threads for both algorithms.

7.1     Runtimes
We are interested in examining the performance of the whole compuation, so we measure the runtime on the
Louvain community detection algorithm (Section 3) in its entirety, running its original and optimized version on
all test graphs and then we do same with the ForceAtlas (Section 5) calculation. These runtimes are compared
to each other, showing the performance gain of our modified algorithms, while also showing the ratio how these
computations are dividing the whole process among them.

7.1.1    3D graph
For the testing we first use the graph that contains the collaborators on all the papers related to 3D
technology. This graph has 35763 nodes and 192336 edges. The runtime of the algorithms on this input can be
seen in Figure 1.




                           Figure 1: Runtime on a graph containing papers about 3D

    These results show that the overall computation time of the original implementation is toriginal =
127.4s, while the optimized version needs just toptimized = 21.8s. For the "3D" graph the original Louvain algorithm
took tlouvainoriginal = 92.2s, and the optimized code took tlouvainoptimized = 10.8s to finish the whole community
detection. The original ForceAtlas took tforceatlasoriginal = 35.2s to finish, and the parallel version took only
tforceatlasoptimized = 11s to generate the final balanced layout.
    Overall, the full process finished ∆t = 5.8 times faster with the optimized algorithms.

7.1.2    CT graph
The next graph contains the collaborators working on papers referring to CT technology. This graph has
53039 nodes and 325490 edges. The runtime of the processing can be seen in Figure 2.




                           Figure 2: Runtime on a graph containing papers about CT

   The results show that the overall computation time of the original implementation is toriginal = 243.82s,
while the optimized version handles the same processing in just toptimized = 35.707s. For the "CT" graph
the original Louvain algorithm took tlouvainoriginal = 179.1s, and the optimized took tlouvainoptimized = 18.1s to fully
generate the communities. The original ForceAtlas took tforceatlasoriginal = 64.72s, and the optimized took only
tforceatlasoptimized = 17.6s to return the finished layout.
    For the overall computation of both the community detection and the layout generation, the runtime was
∆t = 6.8 times faster with the optimized algorithms.

7.1.3   Database graph
The next graph contains the network of collaborators connected to Database technologies. The whole graph
consists of 17832 organizations and 113622 corresponding edges among them. The full process's runtime values
can be seen in Figure 3.




                        Figure 3: Runtime on a graph containing papers about database

   This shows that the overall computation time of the original implementation is toriginal = 36.45s, while
the optimized version needs just toptimized = 7.44s. For the "Database" graph the original Louvain algorithm took
tlouvainoriginal = 20.5s to finish, while the optimized took only tlouvainoptimized = 2.6s to return the communities.
The original ForceAtlas took tforceatlasoriginal = 15.95s to generate the layout, while the optimized finished
in tforceatlasoptimized = 4.84s.
   For the overall computation the runtime was ∆t = 4.89 times faster with the optimized algorithms.

7.1.4   Silicon graph
The last graph contains the network connected to technologies working with silicon. This graph has 24923 nodes
and 185594 edges. The final runtimes can be seen in Figure 4.




                          Figure 4: Runtime on a graph containing papers about silicon

    The overall computation time of the original implementation is toriginal = 53.6s, and the optimized version
needs only toptimized = 11.67s. For the "Silicon" graph the original Louvain algorithm took tlouvainoriginal = 28.3s
to finish its computations, and the optimized took tlouvainoptimized = 3.51s on the same dataset. The original
ForceAtlas took tforceatlasoriginal = 25.3s to process the graph, and the optimized concluded in only
tforceatlasoptimized = 8.16s.
    For the whole computation of the tested algorithms, the overall runtime was ∆t = 4.6 times faster with the
optimized algorithms.




7.2   Analysis
From the detailed runtimes we can see, that the Louvain computation takes more time to finish from the two
algorithms, also it is where the biggest speedup can be achieved. The optimized Louvain algorithm is in average
∆t(louvain) = 9, 1 times faster than the sequential implementation, while the ForceAtlas gains in average a
∆t(f orceatlas) = 3, 4 speedup. This happens because of two things: first the ForceAtlas runs on the quad-tree
generated by the Barnes-Hut algorithm to calculate the repulsions (Subsection 6.1) leading to smaller number
of nodes than what we have in the original input graphs, requiring less operations to finish the process. On the
other hand the Louvain has to check every possible neighbor for the nodes to decide which community will yield
the best possible modularity gain (Subsection 3), thus leading to more computation.
    The speed up of the Louvain algorithm is in line with the increase of the computing resources, using 8 threads
instead of just 1. The higher performance increase can be explained with the usage of the parallel heuristics
(Section 4). It reduces the number of operations in the swapping scenarios, where no modularity gain check
is required, as the nodes are always simply moved towards the community with a bigger label. Overall the
clustering scales well with the available resources.
    These can help the Louvain to perform better, than ForceAtlas. Furthermore While the quadratic tree
decreases the required amount of computation, it introduces an additional bottleneck in the form of irregular
memory accesses. In this case the algorithm cannot effectively utilize the CPU’s caching system, thus reducing
the scalability in the current implementation.
    While the Louvain algorithm is inherently sequential, with the heuristics we can drastically increase the
performance of the computation: t(louvain)_optimized << t(louvain)_original. Although the ForceAtlas algorithm did not
receive such a boost in performance, the runtimes gained from the optimization are still significantly better:
t(forceatlas)_optimized < t(forceatlas)_original.

8     Future work
As it stands, our work has been tested on a single machine, with a single CPU using 8 threads (Section 7). Naturally
there is a limit to how much can be processed on such a system, and as we are starting a service based on these
graph calculations, the system will run on multiple machines, utilizing multiple CPUs and far
more resources than were available for this analysis. We have to further explore how well this
system scales in a bigger environment and how much of a performance boost the
described techniques can provide there.

9     Conclusion
By applying the heuristics described in Section 4 to the Louvain algorithm and by using an optimized version of
ForceAtlas (Section 5), we were able, on our biggest test graph about "CT" technology (Subsection 7.1.2), to
achieve a ∆t = 6.8 times overall speedup (Subsection 7.1) compared to the original CPU based implementation.
Based on what we see in the detailed analysis (Section 7.2), we can expect the system to scale well on distributed
systems when the input graph has not only a high volume of edges but also a larger number
of nodes. In all test cases the community detection enjoyed a significantly higher speedup than the ForceAtlas
algorithm, thanks to the introduced heuristics (Section 4): t(louvain)_optimized << t(forceatlas)_optimized. The
ForceAtlas computation was nevertheless also more efficient with the parallel solutions (Section 6): t(forceatlas)_optimized <
t(forceatlas)_original. This leads us to the conclusion that the parallel Louvain modularity (Table 1) can greatly improve
the overall performance of the computations thanks to the introduced heuristics, and that the parallel ForceAtlas
(Tables 0, 0) also improves the overall experience of using the tool. This makes the Collaboration Spotting
tool capable of handling very large graphs without compromising the user's work.

References
[1] V. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, J.
    Stat. Mech. Theory Exp. (2008) P10008.

[2] Hao Lu, Mahantesh Halappanavar, Ananth Kalyanaraman, Parallel heuristics for scalable community detec-
    tion, Parallel Computing 47 (2015) 19–37.

[3] Mathieu Jacomy, Tommaso Venturini, Sebastien Heymann, Mathieu Bastian, ForceAtlas2, a Contin-
    uous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software,
    http://dx.doi.org/10.1371/journal.pone.0098679 (2014).

[4] Collaboration Spotting Visual Analytics Tool, CERN, http://collspotting.web.cern.ch.
[5] M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2)
    (2004) 026113.
[6] S. Fortunato, Community detection in graphs, Phys. Rep. 486 (3–5) (2010) 75–174,
    http://dx.doi.org/10.1016/j.physrep.2009.11.002.
[7] V.A. Traag, P. Van Dooren, Y. Nesterov, Narrow scope for resolution-limit-free community detection, Phys.
    Rev. E 84 (1) (2011) 016114.
[8] D. Bader, J. McCloskey, Modularity and graph algorithms, SIAM AN10 Minisymposium on Analyzing Mas-
    sive Real-World Graphs (2009) 12–16.
[9] J.W. Berry, B. Hendrickson, R.A. LaViolette, C.A. Phillips, Tolerating the community detection resolution
    limit with edge weighting, Phys. Rev. E 83 (5) (2011) 056119.
[10] U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, D. Wagner, On modularity cluster-
    ing, IEEE Trans. Knowl. Data Eng. 20 (2) (2008) 172–188.

[11] B. Hendrickson, T.G. Kolda, Graph partitioning models for parallel computing, Parallel Comput. 26 (12)
    (2000) 1519–1534.
[12] Hao Lu, Mahantesh Halappanavar, Ananth Kalyanaraman, Parallel heuristics for scalable community de-
    tection, Parallel Computing 47 (2015) 19–37.

[13] J. Barnes, P. Hut, A hierarchical O(N log N) force-calculation algorithm, Nature
    324 (4) (1986) 446–449.
