<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research and analysis of links in social networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M.I. Khotilin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A.V. Blagov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I.A. Rytsarev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoye shosse, 34, 443086, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>84</fpage>
      <lpage>87</lpage>
      <abstract>
        <p>This paper is devoted to the analysis of data and links in social networks. The approach of representation of a social network in the form of a graph is considered. The algorithms for finding communities and main nodes ("hubs"), which are the accounts that have the greatest impact on communities, have been explored and planned for finalization. Existing software environments for visualizing social network data are explored, a software package is developed. Over the past decade, social networks have played a huge role in the life of society. They, being the subject of socialization of people, occupy one of the leading positions in the production of "big data". The ability to spread and share messages, photos, music, videos with friends, and create and conduct various events, including for the purpose of business promotion - all this represents a huge amount of constantly generated, aging, updated data. Large amounts of data, including social networks data, as well as the relationships (links) between them must be presented in the form convenient for perception [1-6]. Often, when it comes to objects representing a network, for example, a social one, the notion of data visualization is closel y related to the notion of graphs. The network represented as a graph is simple for perception and further analysis. An important task is to represent links in social networks to identify various kinds of dependencies.</p>
      </abstract>
      <kwd-group>
        <kwd>social networks</kwd>
        <kwd>big data</kwd>
        <kwd>graph</kwd>
        <kwd>adjacency matrix</kwd>
        <kwd>SCAN-algorithm</kwd>
        <kwd>Gephi</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>After entering the login and password, the authorization of the user in the social network and access to the necessary
information, namely: to the list of the user's friends, the list of communities, photos, messages, etc., takes place via the open
OAuth authentication protocol version 2.0. In the framework of this work, we were interested in the possibility of extracting the
user's friends from the social network, so the remaining items were ignored.</p>
      <p>Each user of the social network has a unique identifier, or else an ID, which allows you to uniquely identify the user. Using
the properties of the built-in API of the social network “Vkontakte”, you can, knowing the user ID, extract information about his
friends, up to nesting level N. In other words, you can extract a list of friends (N = 1), friends of friends (N = 2), etc. We were
interested in the list of friends up to the level of nesting N = 2.</p>
      <p>The list of friends extracted from the social network and converted, takes the form of a text file containing the user ID,
authorized in the social network and then in the tabular form of the user ID and his name (full name). Knowing the user's ID, you
can also find out the list and his friends, which is similarly recorded in the file. An example of the output file part by user and
each of the user-friends is shown in figure 2.</p>
      <p>Further, by concatenating the files, a common list of all friends is obtained, along which a list of all friends (dimension K) is
constructed, from which an adjacency matrix (of dimension K × K) is constructed, on which the dependency graph is
subsequently built, by the Gephi software.</p>
      <p>An adjacency matrix is a K × K dimension matrix containing a list of friends horizontally and vertically, and a row of 0 or 1
at the intersection of the row and column. The contents of the cell of the table matrix is 0 if the users are not familiar (not
included in the list of common friends between the user and the friend of the user) and 1 otherwise if there is a "friendship"
relationship between the specified users. After construction, this matrix is saved in the .csv format for further loading into Gephi.
An example of an adjacency matrix is shown in figure 3.</p>
    </sec>
    <sec id="sec-2">
      <title>3. The construction of a graph, classification of vertices, finding the most significant vertices</title>
      <p>
        Constructed on the basis of the list of all friends, the contiguity matrix of the future graph is loaded into the Gephi software
tool, in order to further visualize the graph of dependencies. Gephi is a software product for network analysis and visualization
of data, written in the high-level Java language [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The constructed Gehpi graph looks like this (figure 4):
      </p>
      <p>Data Science / M.I. Khotilin, A.V. Blagov, I.A. Rytsarev
In this graph, the vertices are users of the social network, and the edges are the relation "friendship" between users.</p>
      <p>It is worth noting that friends of friends of the user who do not have common relations with the user did not interest us in the
framework of this work, so these tops-friends of friends were deleted from the graph.</p>
      <p>
        The next step is to classify the vertices of the graph. The following classification is proposed in the paper:
- core is a vertex containing at least μ vertices in an ε-neighborhood
- hub is a separate node whose neighbors belong to two or more different clusters;
- outlier - this is a separate vertex, all neighbors of which belong to the same cluster, or do not belong to any cluster [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
To implement such a classification, the SCAN algorithm is used [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The principle of the SCAN algorithm is described below.</p>
      <p>
        Search begins with an initial visit of each vertex once [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], in order to find structurally-connected clusters, and then visit
isolated vertices to identify them (hub or outlier).
      </p>
      <p>
        SCAN performs one network pass and finds all structurally-related clusters for a given parameter. In the beginning all
vertices are marked as unclassified. The SCAN algorithm classifies each vertex as either a member of a cluster, or as not being a
member [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For each vertex that is not yet classified, SCAN checks whether this vertex is a kernel. If the vertex is the core, the
new cluster expands from this vertex. Otherwise, the vertex is marked as not being a member of the cluster.
      </p>
      <p>To find a new cluster, SCAN starts with an arbitrary kernel V and looks for all vertices that are structurally reachable from
V. This is quite enough to find a complete cluster containing the vertex V. A new cluster ID is generated, which will be assigned
to all found vertices.</p>
      <p>
        SCAN begins by setting all the vertices in the ε-neighborhood of the vertex V in the queue. For each vertex in the queue, all
directly reachable vertices are calculated, and those vertices that have not yet been classified are inserted into the queue. This is
repeated until the queue is empty [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Vertexes that are not members of clusters can be additionally classified as hubs or extraneous. If a single vertex has edges on
two or more clusters, it can be classified as a hub. Otherwise, it's an outsider.</p>
      <p>A distinctive feature is the presence of parameters and that can be set by the user or expert. In this case, the optimal value of
these parameters can be found by machine learning the system using certain network segments.</p>
      <p>
        Since Gephi is an opensource platform [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], one of its great advantages is the ability to write its own modules that implement
various algorithms. Thus, using the algorithm from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a module that implements the SCAN algorithm was written.
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. Results and Discussion</title>
      <p>The result of the SCAN algorithm work on the graph, constructed in Gephi, is a text file containing a list of the user's friends
and typing it as the top of the graph (figure 5).</p>
      <p>It is worth noting that the SCAN algorithm has a certain limitation on the dimension of the graph used. On graphs with high
dimensionality (N&gt; 500) there is an error in the work.</p>
      <p>
        One way to solve this problem is to modify the algorithm for parallel computations [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The idea of this modification is to
split the sets of communities into subsets between processors. However, one should take into account that for balancing these
subsets should have roughly the same sum of squares of community sizes.
      </p>
      <p>The authors set the task of creating a distributed modification of the SCAN algorithm for ultrahigh-dimensional graphs.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements References</title>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Social networks and connections in them are the subject of research in this research work. The approach based on the
representation of social networks in the form of graphs, makes it possible to apply algorithms for clustering graphs of high
dimension. The algorithms described in the paper make it possible to classify segments of a social network, and also to find
elements of the greatest interest, for example, users that affect all elements of the same community. These algorithms are
planned to be finalized for subsequent application in solving practical problems of finding communities in the segment of social
networks in the Samara region.</p>
      <p>The Gephi tool, which makes it possible to implement visualization of social networks, was explored, and a software tool that
allows to present data in the form required for research was developed.</p>
      <p>The work has been performed with partial financial support from the Ministry of Education and Sciences of the Russian
Federation within the framework of implementation of the Program for Improving the Samara university Competitiveness
among the World's Leading Research and Educational Centers for the Period of 2013-2020s.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Tan</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blake</surname>
            <given-names>MW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saleh</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dustdar</surname>
            <given-names>S</given-names>
          </string-name>
          .
          <article-title>Social-network-sourced big data analytics</article-title>
          .
          <source>IEEE Internet Computing</source>
          <year>2013</year>
          ;
          <volume>5</volume>
          :
          <fpage>62</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Blagov</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rytcarev</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strelkov</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khotilin</surname>
            <given-names>M</given-names>
          </string-name>
          .
          <article-title>Big Data Instruments for Social Media Analysis</article-title>
          .
          <source>Proceedings of the 5th International Workshop on Computer Science and Engineering</source>
          <year>2015</year>
          ;
          <fpage>179</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Ivanov</surname>
            <given-names>PD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopukhovsky</surname>
            <given-names>АG</given-names>
          </string-name>
          .
          <article-title>BigData technologies and various methods of representing big data</article-title>
          .
          <source>Engineering Journal: Science and Innovation</source>
          ,
          <year>2014</year>
          ;
          <fpage>9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Agafonov</surname>
            <given-names>AA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myasnikov</surname>
            <given-names>VV</given-names>
          </string-name>
          .
          <article-title>Method for the reliable shortest path search in time- dependent stochastic networks and its application to GIS-based traffic control</article-title>
          .
          <source>Computer Optics</source>
          <year>2016</year>
          ;
          <volume>40</volume>
          (
          <issue>2</issue>
          ):
          <fpage>275</fpage>
          -
          <lpage>283</lpage>
          . DOI:
          <volume>10</volume>
          .18287/
          <fpage>2412</fpage>
          -6179-2016-40-2-
          <fpage>275</fpage>
          -283.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Popov</surname>
            <given-names>SB.</given-names>
          </string-name>
          <article-title>The Big Data methodology in computer vision systems</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <year>2015</year>
          ;
          <volume>1490</volume>
          :
          <fpage>420</fpage>
          -
          <lpage>425</lpage>
          . DOI:
          <volume>10</volume>
          .18287/
          <fpage>1613</fpage>
          -0073-2015- 1490-420-425.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Ilyasova</surname>
            <given-names>NYu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kupriyanov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paringer</surname>
            <given-names>RA</given-names>
          </string-name>
          .
          <article-title>Formation of features for improving the quality of medical diagnosis based on discriminant analysis methods</article-title>
          .
          <source>Computer Optics</source>
          <year>2014</year>
          ;
          <volume>38</volume>
          (
          <issue>4</issue>
          ):
          <fpage>851</fpage>
          -
          <lpage>855</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>[7] An open platform for presenting data in the form of graphs GEPHI</article-title>
          . URL: https://gephi.org (
          <volume>13</volume>
          .
          <fpage>02</fpage>
          .
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Khotilin</surname>
            <given-names>МI</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blagov</surname>
            <given-names>АV</given-names>
          </string-name>
          .
          <article-title>Visual presentation and cluster analysis of social networks</article-title>
          .
          <source>Information Technologies and Nanotechnologies</source>
          ,
          <year>2016</year>
          ;
          <fpage>1067</fpage>
          -
          <lpage>1072</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Xu</surname>
            <given-names>X</given-names>
          </string-name>
          , et al.
          <article-title>Scan: a structural clustering algorithm for networks</article-title>
          .
          <source>Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <year>2007</year>
          ;
          <fpage>824</fpage>
          -
          <lpage>833</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Drobyshevskiy</surname>
            <given-names>MD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korshunov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turdakov</surname>
            <given-names>DY</given-names>
          </string-name>
          .
          <article-title>Parallel modularity computation for directed weighted graphs with overlapping communities</article-title>
          .
          <source>Proceedings of the Institute of System Programming RAS</source>
          <year>2016</year>
          ;
          <volume>28</volume>
          (
          <issue>6</issue>
          ):
          <fpage>153</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>