The technique of structuring social network graphs for visual
analysis of user groups to counter inappropriate, dubious and
harmful information

                M Kalameyets¹, A Chechulin¹ and I Kotenko¹,²
                ¹ Laboratory of computer security problems, St. Petersburg Institute for Informatics and
                Automation of the Russian Academy of Sciences, Saint-Petersburg, 194100, Russia.
                ² International Laboratory of Information Security of Cyber-physical Systems, ITMO
                University, Saint-Petersburg, 197101, Russia.

                Abstract. The paper proposes a data visualization technique for analyzing social networks in
                order to identify and counteract inappropriate, dubious and harmful information. The proposed
                technique is based on the force-layout technique of graph drawing, in which the parameters of
                vertices and edges are calculated depending on the number of links. The paper provides an
                example of applying the proposed technique to social network analysis on the basis of visual
                analytics of several groups in the "VKontakte" social network in four display modes: without
                the proposed technique, in 2D, in 3D and in augmented reality. The example also describes
                various drawing techniques and evaluates them.


1. Introduction
Nowadays, the importance of social networks is growing, because they greatly influence the
individuality and behavior of people. At the same time, the institution of reputation has spread widely
on the Internet, so users increasingly trust social networks as a relevant information source compared
with traditional communication channels (television, radio, magazines, etc.). However, in addition to
news, social networks have become a platform for the dissemination of inappropriate, dubious and
harmful information, for example, the ideas of homeopathy, Holocaust denial, human immunodeficiency
virus (HIV) denialism, etc. It is extremely difficult to assess the scale of such information
dissemination and to influence its distribution [1]. A possible solution for discovering and analyzing
information flows in social networks is data visualization, which allows one to represent complex
processes in a simple graphical form [2-4].
    A distinctive feature of visual analytics is that all decisions are made by a person. Thus, it can be
used to detect trends and to make individual decisions in cases where automated techniques (for
example, machine learning methods) give a high percentage of false positives. Nevertheless, because
of the large amount of information analyzed, visualizations of social networks often become unreadable
and difficult to perceive, which prevents effective analysis.
    In this paper, in order to simplify the visual analysis of social networks, we propose to use the
force-layout technique of graph drawing. The novelty of this work lies in the approach to calculating
the force-layout parameters on the basis of statistical data, which allows one to spatially cluster the
vertices of the graph and thereby distinguish clusters of users. The contribution of this work is the
technique for selecting parameters for force-layout drawing of social networks in the interests of
identifying and countering inappropriate, dubious and harmful information.


   The paper is organized as follows: the second section reviews the techniques used to visualize
social networks; the third section describes the proposed approach for calculating the parameters of the
force-layout graph of the social network and the experiments conducted; the fourth section evaluates
the proposed approach and explains the results; the fifth section summarizes the results of the paper
and outlines plans for future work.
   The paper uses the terminology of the social network "VKontakte", which can differ from the
terminology of other social networks:
    1) A user is represented by the user's webpage with a microblog, where he or she can post
         content such as video, images, text and audio.
    2) A group is a webpage administered jointly by several users. Users can join a group to read its
         content. Groups can be created by companies, news agencies or ordinary users.
    3) A post is content (video, images, text or audio) published by a user or a group.
    4) A repost is a post that a user or a group copied from the page of another user or group. A repost
         contains information about the source from which it was copied.
    5) A repost tree is the chain of reposts that arises when users repost a post and its reposts.
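Since each repost records where it was copied from, a repost tree can be rebuilt from these source references alone. The following Python sketch is only an illustration under assumed conditions: the record fields `id` and `source` (with `source` set to None for the original post) and the function names are hypothetical and are not taken from the VKontakte API.

```python
from collections import defaultdict


def build_repost_tree(reposts):
    """Rebuild a repost tree from records that name their source.

    Each record is a dict with hypothetical keys 'id' and 'source';
    'source' is None for the original post, which becomes the root.
    Returns (root_id, children), where children maps a post id to the
    ids of the reposts copied directly from it.
    """
    children = defaultdict(list)
    root = None
    for record in reposts:
        if record["source"] is None:
            root = record["id"]
        else:
            children[record["source"]].append(record["id"])
    return root, dict(children)


def walk(node, children, depth=0):
    """Yield (post_id, depth) pairs in depth-first order from `node`."""
    yield node, depth
    for child in children.get(node, []):
        yield from walk(child, children, depth + 1)


reposts = [
    {"id": "p0", "source": None},   # original post of a group
    {"id": "p1", "source": "p0"},   # repost of the original
    {"id": "p2", "source": "p0"},
    {"id": "p3", "source": "p1"},   # repost of a repost
]
root, children = build_repost_tree(reposts)
order = list(walk(root, children))
print(order)  # [('p0', 0), ('p1', 1), ('p3', 2), ('p2', 1)]
```

The depth of each post in the walk corresponds to how many reposting steps separate it from the source of the post.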

2. Related work
Currently, there are many visualization models that allow one to display various processes occurring
in social networks [5]. The most studied models include those based on graphs, TreeMaps and
matrices. For example, graphs allow one to effectively analyze the connectivity of users, and interval
graph models allow one to analyze temporal relationships between different events. A repost tree can
be represented as a tree whose root is the source of the post.
    TreeMap models allow one to efficiently analyze repost metrics (such as the number of "likes", user
coverage and post discussion).
    Matrices allow one to analyze small graphs with a large number of links.
    It is also possible to use more specific models, for example: combined models [6], models based on
Voronoi maps [7] in the case of planar graphs and Voronoi TreeMaps [8] as an alternative to
TreeMaps, Chord diagrams and geo-map models. The list of possible models can be continued [5].
    It is also important to note that the same dataset can be visualized using different visualization
models [9, 10]. For example, Figure 1 shows how a single dataset with two types of links (links
that form a hierarchical structure and links that form a planar structure) can be visualized using
different visualization models [9].
    At the same time, effective visual analytics should follow the principle of Ben Shneiderman [11]:
it should allow one to view the entire dataset, be scalable and support filtering.
    The following principles can also be considered key requirements for visualization:
     1) The ability to use different drawing techniques [12]. For graphs, this can be radial, force-layout
          or cluster-based rendering (Figure 2).
     2) Natural techniques of human-machine interaction. For example, a natural way to visualize
          3D graphs is augmented reality, and in 2D mode, the capabilities of touch screens can be used
          for filtering.
    It should be noted that complex and multilevel visualization models (such as multilevel graphs,
Chord diagrams, Voronoi TreeMaps, etc.), despite their clear advantage in the ability to display a large
number of metrics, are more difficult to manage and perceive. For this reason, graphs are the most
common way to visualize social networks.
    Graphs are the most universal visualization model: they can be structured relatively easily and
support data visualization even with hundreds of thousands of nodes and at least as many links.
    The main disadvantage of the graph representation is that it becomes difficult to perceive when the
number of edges and vertices is large. In the field of visualization, there are many works devoted to
clustering graph elements, structuring and drawing graphs, building trees of different types, reducing
the size of data, etc. Researchers who deal with this problem agree that in order to simplify visual
analysis, it is necessary to develop techniques for structuring the elements of the graph.

   Figure 1. A dataset can be visualized using different visualization models: (a) Graph-TreeMap,
                              (b) Tree-Matrix, (c) Chord diagram [9].


Figure 2. One graph can be depicted using different layouts [12]: (A) poly-line layout,
                               (B) straight-line layout, (C) orthogonal layout.
   For visual analysis of social networks, we propose a technique that is based on the force-layout
drawing of the graph in which the parameters of vertices and edges are calculated depending on the
number of links.

3. Technique of calculating the parameters of the force-layout graph of the social network
The force-layout technique of graph drawing is used to simplify perception by structuring the
arrangement of vertices and edges [13, 14]. The force-layout technique is based on a physical
simulation, which consists of three main components:
    1) Each vertex is assigned a charge, so that the vertices, like electrons with charges of the same
        sign, "repel" each other. The magnitude of the charge determines how strongly the vertex
        repels the others. Thus, vertices with a smaller charge will be closer to each other, while
        vertices with a bigger charge will be farther apart.
    2) The edges of the graph connect the vertices like springs. Each edge is assigned a length, which
        is similar to the natural (rest) length of a spring. Thus, the distance between the connected
        vertices is constrained by the length of the edge.
    3) Each edge is assigned a strength, which is similar to the spring's elasticity. Strength
        determines to what extent the edge can be "stretched" and "squeezed". Thus, an edge with a
        strength equal to 1 keeps its given length, whereas an edge with a smaller strength stretches
        and contracts depending on its strength and on the charges of the vertices it connects.


    Since the vertices repel each other and the edges hold them together, the graph "moves" until it
reaches an equilibrium state. This state is the result of the algorithm. Whether the resulting graph is
readable depends on the correct selection of these parameters for the particular graph structure.
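The three components and the equilibrium described above can be illustrated with a minimal force-directed simulation. The Python sketch below is not the authors' implementation (the paper does not name its library); the 1/d repulsion, the position-based spring correction, the cooling schedule and the function name `force_layout` are assumptions chosen to keep the example short. It takes per-vertex charges and per-edge lengths and strengths, so an edge with strength 1 is restored to its exact length on every step, while smaller strengths let the edge stretch.

```python
import math
import random


def force_layout(n, edges, charge, length, strength, iterations=300, seed=0):
    """Place vertices 0..n-1 in 2D with a simple force-directed simulation.

    charge[i]   -- repulsion magnitude of vertex i (bigger pushes harder)
    length[k]   -- rest length of edge k, like the length of a spring
    strength[k] -- stiffness of edge k in [0, 1]; 1 restores the length fully
    Returns a list of [x, y] positions.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    alpha = 1.0
    # Cooling: alpha shrinks to 0.001 over `iterations` steps, so the motion
    # caused by repulsion dies out and the layout settles into equilibrium.
    decay = 0.001 ** (1.0 / iterations)

    for _ in range(iterations):
        # Component 1: every pair of vertices repels like equal-sign charges;
        # vertex i is pushed away from j by j's charge, scaled by 1/d.
        moves = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1.0)  # avoid huge jumps at close range
                moves[i][0] += alpha * charge[j] * dx / (d * d)
                moves[i][1] += alpha * charge[j] * dy / (d * d)
                moves[j][0] -= alpha * charge[i] * dx / (d * d)
                moves[j][1] -= alpha * charge[i] * dy / (d * d)
        for p, m in zip(pos, moves):
            p[0] += m[0]
            p[1] += m[1]

        # Components 2 and 3: each edge pulls or pushes its endpoints toward
        # its length; the strength decides how much of the gap is closed.
        for (u, v), rest, k in zip(edges, length, strength):
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue
            corr = k * (d - rest) / (2.0 * d)  # half of the correction per endpoint
            pos[u][0] += dx * corr
            pos[u][1] += dy * corr
            pos[v][0] -= dx * corr
            pos[v][1] -= dy * corr

        # Keep the drawing centred; this shifts all vertices equally.
        cx = sum(p[0] for p in pos) / n
        cy = sum(p[1] for p in pos) / n
        for p in pos:
            p[0] -= cx
            p[1] -= cy

        alpha *= decay
    return pos


# Two vertices joined by a rigid edge (strength 1) settle one edge length apart.
pos = force_layout(2, [(0, 1)], charge=[30.0, 30.0], length=[50.0], strength=[1.0])
print(round(math.dist(pos[0], pos[1]), 6))  # 50.0
```

The pairwise loop is quadratic in the number of vertices, which is acceptable for a sketch; production libraries usually approximate the repulsion (for example, with a Barnes-Hut quadtree) to handle large graphs.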
    For social networks, we propose the following technique:
     1) For vertices that have only one edge, and for their edges:
             a. set a small charge value for the vertex;
             b. set the maximum strength value of 1 for the edge (the edge cannot be stretched);
             c. set a small edge length.
     2) For vertices that have the largest number of edges (statistically, they correspond to
         "outliers"), and for edges between these vertices:
             a. set a big charge value for the vertex;
             b. set the maximum strength value of 1 (the edge cannot be stretched) for edges between
                  these vertices;
             c. set a big edge length.
     3) For all other vertices and edges:
             a. set an average charge value for the vertex;
             b. set a strength value of 0.1 for the edge (the edge can be strongly stretched);
             c. set an average edge length.
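The three rules can be expressed as a function that maps vertex degrees to layout parameters. In the Python sketch below, the concrete values for "small", "average" and "big" and the outlier criterion (degree above the mean plus two population standard deviations) are hypothetical placeholders, since the technique does not fix them; the function names are also illustrative. The returned lists can be passed to any force simulation that accepts per-vertex charges and per-edge lengths and strengths.

```python
from statistics import mean, pstdev

# Hypothetical magnitudes: the technique only prescribes "small",
# "average" and "big" values, so these numbers are placeholders.
SMALL, AVERAGE, BIG = 5.0, 30.0, 300.0


def degrees(n, edges):
    """Number of edges incident to each vertex 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def layout_parameters(n, edges, z=2.0):
    """Return (charge, length, strength, is_hub) following rules 1-3.

    A vertex counts as a high-degree outlier ("hub") if its degree exceeds
    mean + z * population standard deviation; this rule is an assumption.
    """
    deg = degrees(n, edges)
    threshold = mean(deg) + z * pstdev(deg)
    is_hub = [d > threshold for d in deg]

    # Vertex charges: small for single-edge vertices, big for hubs, else average.
    charge = [SMALL if d == 1 else BIG if hub else AVERAGE
              for d, hub in zip(deg, is_hub)]

    length, strength = [], []
    for u, v in edges:
        if deg[u] == 1 or deg[v] == 1:      # rule 1: short, rigid edge
            length.append(SMALL)
            strength.append(1.0)
        elif is_hub[u] and is_hub[v]:        # rule 2: long, rigid edge
            length.append(BIG)
            strength.append(1.0)
        else:                                # rule 3: average, stretchable edge
            length.append(AVERAGE)
            strength.append(0.1)
    return charge, length, strength, is_hub


# Example: two hub groups (0, 1), six single-edge users on each,
# and user 14 who reposted from both groups.
edges = ([(0, 1)] + [(0, k) for k in range(2, 8)]
         + [(1, k) for k in range(8, 14)] + [(14, 0), (14, 1)])
charge, length, strength, is_hub = layout_parameters(15, edges)
print([i for i, hub in enumerate(is_hub) if hub])  # [0, 1]
```

Rule 1 is checked first, so an edge between a hub and a single-edge vertex stays short and rigid; this ordering is our reading of the rules rather than something the text states explicitly.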
    Let us consider examples of using this technique on the basis of visual analytics of several groups
in the "VKontakte" social network.
    The initial data is a JSON file that contains an array of objects, consisting of users and groups,
and an array of links, consisting of "repost" relations between users, between users and groups, and
between groups.
    Each vertex is a subject, i.e. a user or a group that reposted a post from the group or from another
group participant.
    An edge indicates that a repost occurred between two subjects.
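A loader for such a file might look as follows. The exact JSON schema is not given in the paper, so the field names `nodes`, `links`, `id`, `type`, `source` and `target` below, as well as the function name, are assumptions. Repeated reposts between the same two subjects are merged into one undirected edge, which matches the statement that an edge only indicates the presence of a repost.

```python
import json


def load_repost_graph(text):
    """Parse a hypothetical {"nodes": [...], "links": [...]} JSON document.

    Each node is {"id": ..., "type": "user" | "group"}; each link is
    {"source": id, "target": id}, meaning a repost between two subjects.
    Returns (ids, kinds, edges) with edges as vertex-index pairs.
    """
    data = json.loads(text)
    ids = [node["id"] for node in data["nodes"]]
    kinds = [node["type"] for node in data["nodes"]]
    index = {node_id: i for i, node_id in enumerate(ids)}

    seen = set()
    edges = []
    for link in data["links"]:
        u, v = index[link["source"]], index[link["target"]]
        key = (min(u, v), max(u, v))     # undirected: (a, b) == (b, a)
        if u != v and key not in seen:   # skip self-loops and repeats
            seen.add(key)
            edges.append(key)
    return ids, kinds, edges


sample = """{
  "nodes": [{"id": "g1", "type": "group"}, {"id": "g2", "type": "group"},
            {"id": "u1", "type": "user"}, {"id": "u2", "type": "user"}],
  "links": [{"source": "u1", "target": "g1"}, {"source": "u1", "target": "g2"},
            {"source": "u2", "target": "g1"}, {"source": "g1", "target": "u2"}]
}"""
ids, kinds, edges = load_repost_graph(sample)
print(edges)  # [(0, 2), (1, 2), (0, 3)]
```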
    Let us consider examples of implementation.
    The simplest representation is a 2D graph. The graph of reposts between users and groups is
depicted in Figures 3 and 4 in 2D display mode using the force-layout graph drawing technique,
where a vertex is a user or group and an edge is a repost.


    Figure 3. Force-layout technique of graph drawing in which standard parameters are used.
    Figure 4. The force-layout technique of graph drawing in which individual parameters are used
              for vertices and edges depending on the number of edges at the vertex.


    In Figure 3, the standard parameters for the vertices and edges of the graph are used: the charge
values, edge lengths and strengths are the same for all vertices and edges.
    The force-layout drawing technique clearly simplifies the perception of the graph and spatially
clusters some vertices.
    Nevertheless, because of the high connectivity, such a graph is still difficult to read. For
comparison, Figure 4 uses the vertex and edge parameters proposed in the technique.
    The readability of the resulting image is ensured by the following:
     1) The vertices that have the greatest number of edges tend to be located at a great distance
          from each other, forming a kind of skeleton of the graph, around which the remaining vertices
          are located. We call such vertices "skeleton vertices".
     2) Vertices with a single edge are grouped in semi-rings around the skeleton vertex they are
          connected to, forming a "hair ball".
     3) Other vertices are located between the skeleton vertices whose posts they reposted. Some of
          these vertices also form small groups if they reposted each other's posts.
    An obvious advantage of such force-layout drawing parameters is a more visible spatial clustering
of vertices.
    Vertices that have one edge and correspond to users or groups who repost posts from only one
group are clustered in an arc near the skeleton vertex (usually, a group).
    User vertices are clustered between the skeleton vertices they have in common, and the skeleton
vertices are located far enough from each other to distinguish the clusters of common users between
them.
    We discuss the practical importance of clustering in the next section.
    The effectiveness of this technique of drawing is especially clear when the graph is depicted in 3D
(Figure 5).


              Figure 5. The force-layout technique of graph rendering in 3D, shown from two angles;
               the initial data used to construct the graph are the same as the data
                                  used for the graphs in Figures 3 and 4.
   Displaying the graph in 3D allows one to avoid overlapping edges and to cluster the vertices
spatially even more by using the third dimension.
   It should be noted that the advantage of 3D rendering is offset by the inefficiency of human-machine
interaction. For example, in 2D drawings (Figures 3 and 4), scaling can be done by simply zooming
the image, whereas in 3D drawing (Figure 5), one needs to simulate a camera in 3D space that has 3
translational and 3 rotational degrees of freedom.
   Thus, when zooming and navigating, the user is forced to use the mouse and keyboard and often
loses part of the image from view because of the unnatural camera control.
   To avoid navigating in 3D space with the keyboard and mouse, one can visualize the graph in
augmented reality (Figure 6).


           Figure 6. The force-layout technique of 3D graph drawing in augmented reality;
               the initial data used to construct the graph are the same as the data
                                 used for the graphs in Figures 3, 4 and 5.
   Augmented reality makes zooming and navigation more natural: a person physically approaches,
moves away and turns his or her head, perceiving the graph as a real physical object.
   Thus, several display modes are available at once, which can be used for various tasks. For example,
to navigate in large graphs, one can use augmented reality, and to analyze local user groups, one can
use 2D or 3D mode, depending on the degree of vertex connectivity.

4. Discussion
An obvious advantage of the proposed technique is the spatial clustering of the graph vertices into four
categories (see Figure 7):
    1) The vertices of the graph with a large number of edges, forming a skeleton. Some such vertices
        are highlighted by the red region in Figure 7. Usually, these vertices represent big groups,
        which are the main sources of information dissemination. Smaller groups and users are
        clustered around these vertices.
    2) Clusters of vertices with one edge. Usually, they are clustered in semi-rings around the
        skeleton vertex (a group) to which they are connected by this single edge. Some such clusters
        of vertices are identified by the blue region in Figure 7. These vertices do not disseminate
        information. Usually, they are endpoint users who reposted one or several posts that were not
        distributed further. Nevertheless, such clusters of users contribute to the audience coverage
        of the post, and the size of the resulting arc can be used to judge the potential of this
        coverage.
    3) Groups of graph vertices that are connected by edges or short sequences of edges to two or
        more skeleton vertices. Usually, they are clustered between the skeleton vertices. Some such
        clusters of vertices are identified by the yellow region in Figure 7. These clusters typically
        represent users who repost posts from several public groups. This indicates their greater
        involvement in the analyzed topics.
    4) Small local user clusters. Typically, these are a small number of users who belong to one of
        the previous clusters and at the same time repost each other's posts. Some such clusters of
        vertices are distinguished by the green region in Figure 7. The fact that such users repost
        posts from the same groups and from each other may indicate that they are familiar with each
        other.


                  Figure 7. The force-layout technique forms four categories of clusters:
                   big groups that are main sources of information dissemination (red),
                      the endpoint users that do not distribute the information (blue),
                    involved users (yellow) and the potentially familiar users (green).
    Thus, the vertices belonging to the first category are the primary sources or the most significant
distributors of information. The second category comprises the end users from whom the information
is not disseminated further. The third category comprises users involved in the topics of the analyzed
groups. The fourth category comprises local groups of users who are probably familiar with each other.
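In the proposed technique, the four categories are identified visually by the analyst, but they can also be approximated by simple graph rules, for example to cross-check what is seen in Figure 7. The Python sketch below is one possible translation under stated assumptions: the outlier threshold for skeleton vertices, the two-hop limit standing in for "short sequences of edges" and the function name `categorize` are hypothetical choices, not part of the technique.

```python
from collections import deque
from statistics import mean, pstdev


def categorize(n, edges, z=2.0, max_hops=2):
    """Assign vertices 0..n-1 to the four categories with assumed graph rules.

    skeleton -- degree above mean + z * population std (degree outliers)
    endpoint -- single-edge vertices attached to a skeleton vertex
    involved -- non-skeleton vertices that reach two or more skeleton
                vertices within `max_hops` edges via non-skeleton vertices
    local    -- non-skeleton vertices joined by an edge to another
                non-skeleton vertex (users who repost each other)
    The last category may overlap the others, as in the text.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    threshold = mean(deg) + z * pstdev(deg)
    skeleton = {i for i in range(n) if deg[i] > threshold}

    endpoint = {i for i in range(n) if deg[i] == 1 and adj[i] <= skeleton}

    involved = set()
    for start in range(n):
        if start in skeleton:
            continue
        reached, visited = set(), {start}
        queue = deque([(start, 0)])
        while queue:  # breadth-first search that stops at skeleton vertices
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for nb in adj[node]:
                if nb in skeleton:
                    reached.add(nb)
                elif nb not in visited:
                    visited.add(nb)
                    queue.append((nb, hops + 1))
        if len(reached) >= 2:
            involved.add(start)

    local = set()
    for u, v in edges:
        if u not in skeleton and v not in skeleton:
            local.update((u, v))
    return {"skeleton": skeleton, "endpoint": endpoint,
            "involved": involved, "local": local}


# Two groups (0, 1) with six single-edge users each, plus users 14 and 15
# who repost from both groups and from each other.
edges = ([(0, 1)] + [(0, k) for k in range(2, 8)]
         + [(1, k) for k in range(8, 14)]
         + [(14, 0), (14, 1), (15, 0), (15, 1), (14, 15)])
cats = categorize(16, edges)
print(sorted(cats["involved"]), sorted(cats["local"]))  # [14, 15] [14, 15]
```

Such rules only approximate the visual reading: the layout additionally reflects edge lengths and charges, so borderline vertices may be placed differently than these thresholds suggest.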
    At the same time, the visual analysis of users of the third category is clearly complicated: a large
number of edges pass through them, and they are harder to distinguish than the other categories.
    Nevertheless, this disadvantage can be mitigated by the 3D mode and by convenient computer-human
interaction techniques that allow one to change the layout of the graph locally and provide convenient
scaling tools.
    Thus, the proposed technique provides a simple way of visual analytics to identify user clusters and
assess the extent of information dissemination, its potential coverage and the degree of social
connectivity of users.


   Examples of the described visualization modes can be found at the following links:
    1) a 2D graph without the proposed technique [15];
    2) a 2D graph with the proposed technique [16];
    3) a 3D graph with the proposed technique [17];
    4) a 3D graph with the proposed technique in augmented reality (requires an Android 8+ mobile
       phone with Google ARCore support) [18].

5. Conclusion
The paper proposed a data visualization technique for analyzing social networks in order to identify
and counteract inappropriate, dubious and harmful information.
   The proposed technique is based on the force-layout technique of graph drawing, in which the
parameters of vertices and edges are calculated depending on the number of links.
   As a result, the vertices are spatially clustered into four categories: big groups that are the main
sources of information dissemination, endpoint users who do not distribute the information, involved
users and potentially familiar users.
   The paper provides an example of applying the proposed data visualization technique to social
networks on the basis of visual analytics of several groups in "VKontakte" in four display modes:
without the proposed technique, in 2D, in 3D and in augmented reality.
   The paper also provides an evaluation that explains the meaning of the clusters in the image
obtained in the experiment.
   In future work, we plan to explore the possibility of using other techniques for structuring social
network graphs, and to create computer-human interaction techniques for visual analysis that will
effectively apply different layout techniques to different parts of one graph.

6. References
[1] Kotenko I, Chechulin A, Shorov A and Komashinsky A 2014 Analysis and Evaluation of Web
      Pages Classification Techniques for Inappropriate Content Blocking Lecture Notes in Computer
      Science 8557 pp 39-54.
[2] Zhao J, Cao N, Wen Z, Song Y, Lin Y and Collins C 2014 #FluxFlow: Visual analysis of
      anomalous information spreading on social media IEEE Transactions on Visualization and
      Computer Graphics 20 pp 1773-1782.
[3] Smith M, Shneiderman B, Milic-Frayling N, Mendes E, Barash V, Dunne C, Capone T, Perer A
      and Gleave E 2009 Analyzing (social media) networks with NodeXL Proc. fourth international
      conference on Communities and technologies pp 255-264.
[4] Ribarsky W, Wang D and Dou W 2014 Social media analytics for competitive advantage
      Computers & Graphics 38 pp 328-331.
[5] Chen S, Lin L and Yuan X 2017 Social media visual analytics Computer Graphics Forum 36
      pp 563-587.
[6] Kolomeec M, Gonzalez-Granadillo G, Doynikova E, Chechulin A, Kotenko I and Debar H 2017
      Choosing Models for Security Metrics Visualization Lecture Notes in Computer Science 10446
      pp 75-87.
[7] Kolomeets M, Chechulin A and Kotenko I 2016 Visualization Model for Monitoring of
      Computer Networks Security Based on the Analogue of Voronoi Diagrams Lecture Notes in
      Computer Science 9817 pp 141-157.
[8] Balzer M, Deussen O and Lewerentz C 2005 Voronoi treemaps for the visualization of software
      metrics Proc. 2005 ACM symposium on Software visualization pp 165-172.
[9] Vehlow C, Beck F and Weiskopf D 2015 The state of the art in visualizing group structures in
      graphs Eurographics Conference on Visualization (EuroVis)-STARs 2.
[10] Ghoniem M, Fekete J and Castagliola P 2005 On the readability of graphs using node-link and
      matrix-based representations: a controlled experiment and statistical analysis Information
      Visualization 4 pp 114-135.
[11] Craft B and Cairns P 2005 Beyond guidelines: what can we learn from the visual information
     seeking mantra? Proc. Ninth International Conference on Information Visualisation pp 110-
     118.
[12] Jusufi I 2013 Multivariate networks: visualization and interaction techniques.
[13] Novikova E, Kotenko I and Fedotov E 2015 Interactive Multi-view Visualization for Fraud
     Detection in Mobile Money Transfer Services International Journal of Mobile Computing and
     Multimedia Communications 6(4) pp 72-97.
[14] Novikova E and Kotenko I 2018 Visualization-Driven Approach to Fraud Detection in the
     Mobile Money Transfer Services Algorithms, Methods and Applications in Mobile Computing
     and Communications IGI Global pp 205-236.
[15] Laboratory of Computer Security at SPIIRAS website. Implementation example: 2D graph without
     using the proposed technique. URL: http://comsec.spb.ru/files/forceWithout.html. Access date
     (02.09.2018).
[16] Laboratory of Computer Security at SPIIRAS website. Implementation example: 2D graph with
     using the proposed technique. URL: http://comsec.spb.ru/files/forceWith.html. Access date
     (02.09.2018).
[17] Laboratory of Computer Security at SPIIRAS website. Implementation example: 3D graph that
     used the proposed technique. URL: http://comsec.spb.ru/files/force3D.html. Access date
     (02.09.2018).
[18] Laboratory of Computer Security at SPIIRAS website. Implementation example: 3D graph that
     used the proposed technique in augmented reality (you need a mobile phone based on Android
     8+ and Google AR Core support). URL: http://comsec.spb.ru/files/forceAPK.html. Access date
     (02.09.2018).

Acknowledgments
This work was supported by the grant of the RSF No. 18-71-10094 in SPIIRAS.

