<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">VISUALIZATION AND CLUSTER ANALYSIS OF SOCIAL NETWORKS</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Khotilin</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Samara National Research University</orgName>
								<address>
									<settlement>Samara</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Blagov</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Samara National Research University</orgName>
								<address>
									<settlement>Samara</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">VISUALIZATION AND CLUSTER ANALYSIS OF SOCIAL NETWORKS</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E51C669AFF88A50E02AED637FE076EA7</idno>
					<idno type="DOI">10.18287/1613-0073-2016-1638-843-850</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:16+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>big data</term>
					<term>graphs</term>
					<term>data visualization</term>
					<term>data analysis</term>
					<term>clustering</term>
					<term>modularity</term>
					<term>SCAN</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The article is devoted to the analysis of social networks, which are presented by graphs. The paper presents approaches to the modeling of distributions of social networks as well as the algorithms used for finding communities, as well as accounts that have the greater impact on the community.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>In the modern world it is continuously generated a huge amount of data, whether the data received from the satellite, or sensors in the aircraft, bank transactions, patient diagnostic data, etc. A special place is occupied by social networks. The significance of social networks is due to the fact that, on the one hand they are the subject of socialization of people, and on the other -the most powerful and affordable political, ideological and economic instrument <ref type="bibr" target="#b0">[1]</ref>. A number of papers are dedicated to researches of social networks as systems, which contain large volumes of data <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>. Large amounts of data as well as the relationships (connections) between them must be present in comfortable and readable form. The data from social networks can be presented in various forms: a tag cloud, charts, historical flows <ref type="bibr" target="#b4">[5]</ref>, but graphs are more often used for this purpose.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Representation of the network as a graph</head><p>Generally, when it talks about objects representing the network, such as social, data visualization concept is closely related to the concept graphs. An important task is to present links in social networks to identify different kinds of dependencies.</p><p>The graph is a collection of non-empty set of vertices and the set of edges: (-set of vertices, -set of edges). The vertices in a graph, which describes the social network, are user accounts, and edges -the connections between them. For example, a subscription in the network such as Twitter, and the attitude of the "friendship" in social networks like The Facebook (figure <ref type="figure" target="#fig_0">1</ref>).  </p><formula xml:id="formula_0">     Ck k p ,   z e k k z k p   ! .</formula><p>Coefficients and are found for a particular segment of the social network.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Clustering and communities finding algorithms based on the modularity</head><p>To simplify the graph, and also for finding the so-called "communities" in a social network, which is described by graph, the clustering is applied.</p><p>There are a number of algorithms and approaches for clustering, one of which is the modularity <ref type="bibr" target="#b7">[8]</ref><ref type="bibr" target="#b8">[9]</ref>. This functionality was proposed by Newman and Girvan in the process of developing clustering algorithm of graph vertices <ref type="bibr" target="#b6">[7]</ref>. Under the modularity means a scalar value from the interval [-1, 1], expressed by the following formula:</p><formula xml:id="formula_1">  j i j i j i ij C C m d d A , 2 2m 1 Q ,             ,</formula><p>where A -adjacency matrix, ij A -(i,j) element of matrix A , i d -the degree of i graph vertex, i Cthe label of the vertex (the number of the community, to which the vertex is belongs), m -the total number of edges in the graph,  </p><formula xml:id="formula_2">j i C C ,  -delta-function (one, if j i C C  , zero otherwise).</formula><p>The task of finding isolation of communities in the graph is reduced to search such i C , which will maximize the value of modularity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 3. The value of the modularity factor in the allocation of different clusters</head><p>The advantages of modularity may include the following:  modularity simply interpreted. Its value is equal to the difference between the proportion of edges within the community and expected share of links, if the edges were placed randomly;  it is possible to count modularity effectively with small changes in the clusters. However, there are also some disadvantages:  the functionality is not continuous, and the task of its optimization is discrete. The approximation schemes are used to find the global optimum. Some of them are really optimize the functional value, while others choose the value of the modularity by the best solution found, without warranty of local optimality of solutions;</p><p> there is a resolution problem (the functionality does not "see" small communities). This problem is solved by using the modified functionality, which retains all the advantages and adds scale parameter <ref type="bibr" target="#b6">[7]</ref>. As modularity describes the quality of separation of the graph in the groups, the problem of finding the optimal partition of the graph can be approached by solving the problem of maximizing of modularity. However, using a simple brute force method to solve this problem is almost impossible, since the number of options for separating n nodes into k groups grows exponentially with n To solve this problem the "greedy algorithm" of optimization of modular functions was proposed. This algorithm has its base in an association of two groups giving the highest increase modularity step by step. Let us consider some decomposition of the N nodes into k groups (Na set of nodes with the number of elements n) <ref type="bibr" target="#b7">[8]</ref>. The modularity function will be equal: ).</p><formula xml:id="formula_3">𝑄 1 = 1 𝑚 ∑ (</formula><p>This implies that the greatest growth of modularity occurs by combining groups 𝑁 𝑖 and 𝑁 𝑗 , for which the value:</p><formula xml:id="formula_4">∆(𝑁 𝑖 , 𝑁 𝑗 ) = 𝑚 𝑖,𝑗 − 𝑑(𝑁 𝑖 )𝑑(𝑁 𝑗 ) 2𝑚 is maximal.</formula><p>It is also seen that the combination of groups, between which there are no edges (𝑚 𝑖,𝑗 =0), can not give increasing of the modularity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Classification of vertices, finding the most significant, SCAN algorithm</head><p>Apart from finding a community in a social network which is represented by the graph, the classification of vertices in the graph has a considerable interest.</p><p>In the article <ref type="bibr" target="#b6">[7]</ref> the following classification of the vertices was introduced (see figure <ref type="figure">4</ref>):</p><p>-core -is the vertex which contains in ε-neighborhood at least μ vertices -hubis an isolated vertex which has neighbors belonging to two or more different clusters; -outlieran isolated all its neighbors of which either belong to only one cluster or do not belong to any cluster.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 4. An example of a graph with different types of vertices</head><p>To perform such classification a SCAN algorithm is proposed <ref type="bibr" target="#b6">[7]</ref>. The principle of its work is described as follows.</p><p>The search begins by first visiting each vertex once to find structure-connected clusters, and then visiting the isolated vertices to identify them as either a hub or an outlier (hub or outlier). SCAN performs one pass of a network and finds all structure-connected clusters for a given parameter setting. At the beginning all vertices are labeled as unclassified. SCAN classifies each vertex either a member of a cluster or not a member. For each vertex that is not yet classified, SCAN checks whether this vertex is a core. If the ver-tex is a core, a new cluster is expanded from this vertex. Otherwise, the vertex is labeled as not a member of the cluster.</p><p>To find a new cluster, SCAN starts with an arbitrary core V and search for all vertices that are structure reachable from V. This is sufficient to find the complete cluster containing vertex V. new cluster ID is generated which will be assigned to all found vertices. SCAN begins by inserting all vertices in 𝜀neighborhood of vertex V into a queue.</p><p>For each vertex in the queue it computes all directly reachable vertices and inserts those vertices into the queue which are still unclassified. This is repeated until the queue is empty.</p><p>The non-member vertices can be further classified as hubs or outliers If an isolated vertex has edges to two or more clusters, it is may be classified as a hub. Otherwise, it is an outlier <ref type="bibr" target="#b6">[7]</ref>.</p><p>A distinctive feature is the presence of parameters  and  , which can be set by the user or an expert. At the same time the finding of the optimal values of these parameters can be carried out using a machine learning system, using a certain network segments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiment and results</head><p>In order to test the algorithms mentioned above, the analysis of the community of Samara airsoft group "BPF" if social network "Vkontakte" was produced in order to identify possible communities by breaking into clusters. The adjacency matrix for the graph (in this case, the state of the friendship between members of the community to each other) is shown in Figure <ref type="figure">5</ref> Fig. <ref type="figure">5</ref>. The adjacency matrix</p><p>The graph with edges that describe the state of dependency of "friendship" and vertices -members of the group is shown in Figure <ref type="figure">6</ref>.</p><p>For this graph the modularity had been calculated. With the resolution of 0.5 the value of the modularity parameter was 0.021, while the graph has been divided into five communities. By the way in the communities data includes accounts of people who are:</p><p>-students together in one university; -graduated from the same school;</p><p>-those who have been invited by the participant Raguzin; -those who have been invited by the participant Korobov; -those who have been invited by the participant YuriNagulov. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>Presentation of social networks in the form of a graph and its further analysis, including clustering and finding dependencies, is an urgent task in Big Data. Using the methods described in this article and approaches permits to produce the classification of the social network segments and find the elements of greatest interest, for example, users, affecting several separate communities (in the graph representation -such as the edge of "hub"). When finding the power distribution of the nodes of the graph, which the social network, it is possible to carry out modeling of social networks with a given distribution. The algorithms presented in this article are planned for completion and use in the study of social segments of Samara region. The authors have developed the necessary tools to graph visualization of necessary social networks segments and distributed methods of processing of high-dimensional graphs.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. A fragment of the graph of social network "VKontakte" One of the main related characteristics that should be considered is the metric. The metric of the graph is based on the notion of distance. For example, we call the distance   ij d j x i x d  , between vertices i x and j x of the</figDesc><graphic coords="2,202.22,181.95,192.25,137.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. The distribution of the degrees of vertices</figDesc><graphic coords="2,213.80,522.51,169.10,132.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 6 .Fig. 7 .</head><label>67</label><figDesc>Fig. 6. The graph of the community This division into clusters can be seen in the figure below:</figDesc><graphic coords="7,182.55,359.60,231.60,167.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Now join the group i and j in one, which is denoted as an𝑁 𝑖∪𝑗 = 𝑁 𝑖 ∪ 𝑁 𝑗 . The modularity function for the new graph will be:The number of edges within the group 𝑁 𝑖∪𝑗 is equal to the sum edges within groups 𝑁 𝑖 and 𝑁 𝑗 plus the number of edges between them. In other words: 𝑚 𝑖∪𝑗 = 𝑚 𝑖 + 𝑚 𝑗 + 𝑚 𝑖,𝑗 The degree of the united group 𝑁 𝑖∪𝑗 is equal to the sum of groups of degrees 𝑁 𝑖 and 𝑁 𝑗 : 𝑑(𝑁 𝑖∪𝑗 ) = 𝑑(𝑁 𝑖 ) + 𝑑(𝑁 𝑗 ),</figDesc><table><row><cell></cell><cell></cell><cell>𝑘 𝑙=1,𝑙≠𝑖,𝑙≠𝑗</cell><cell cols="2">𝑚 𝑙 −</cell><cell cols="2">(𝑑(𝑁 𝑙 )) 2 4𝑚</cell><cell>) +</cell><cell>1 𝑚</cell><cell>(𝑚 𝑖 + 𝑚 𝑗 −</cell><cell>(𝑑(𝑁 𝑖 )) 2 + (𝑑(𝑁 𝑗 )) 2 4𝑚</cell><cell>)</cell><cell>.</cell></row><row><cell>𝑄 2 =</cell><cell>1 𝑚</cell><cell cols="3">∑ (𝑚 𝑙 − 𝑘 𝑙=1,𝑙≠𝑖,𝑙≠𝑗</cell><cell cols="2">(𝑑(𝑁 𝑙 )) 2 4𝑚</cell><cell>) +</cell><cell>1 𝑚</cell><cell>(𝑚 𝑖∪𝑗 −</cell><cell>(𝑑(𝑁 𝑖∪𝑗 )) 2 4𝑚</cell><cell>) .</cell></row><row><cell cols="3">Consequently:</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell cols="10">(𝑑(𝑁 𝑖∪𝑗 )) 2 = (𝑑(𝑁 𝑖 )) 2 + (𝑑(𝑁 𝑗 )) 2 + 2𝑑(𝑁 𝑖 )𝑑(𝑁 𝑗 ).</cell></row><row><cell cols="8">Taking this into account, we obtain:</cell><cell></cell></row><row><cell cols="3">∆𝑄 = 𝑄 2 − 𝑄 1 =</cell><cell>1 𝑚</cell><cell cols="2">(𝑚 𝑖,𝑗 −</cell><cell cols="4">2𝑑(𝑁 𝑖 )𝑑(𝑁 𝑗 ) 4𝑚</cell><cell>) =</cell><cell>1 𝑚</cell><cell>(𝑚 𝑖,𝑗 −</cell><cell>𝑑(𝑁 𝑖 )𝑑(𝑁 𝑗 ) 2𝑚</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Social-network-sourced big data analytics</title>
		<author>
			<persName><forename type="first">W</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Blake</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Saleh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dustdar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Internet Computing</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="62" to="69" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">How people describe themselves on Twitter</title>
		<author>
			<persName><forename type="first">K</forename><surname>Semertzidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pitoura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tsaparas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks</title>
				<meeting>the ACM SIGMOD Workshop on Databases and Social Networks</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="25" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Big Data Instruments for Social Media Analysis</title>
		<author>
			<persName><forename type="first">A</forename><surname>Blagov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Rytcarev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Strelkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khotilin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Workshop on Computer Science and Engineering</title>
				<meeting>the 5th International Workshop on Computer Science and Engineering</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="179" to="184" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Real-time analysis of parameters of multiple object detection systems</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Protcenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">L</forename><surname>Kazanskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G</forename><surname>Serafimovich</surname></persName>
		</author>
		<idno type="DOI">10.18287/0134-2452-2015-39-4-582-591</idno>
	</analytic>
	<monogr>
		<title level="j">Computer Optics</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="582" to="591" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note>In Russian</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Big Data technologies and different methods of presenting large data</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Ivanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Lopukhovsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science and innovations</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1" to="10" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note>In Russian</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Optimal design of spatial distribution networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gastner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Michael</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Newman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Physical Review</title>
		<imprint>
			<biblScope unit="volume">74</biblScope>
			<biblScope unit="page" from="16117" to="016126" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Finding and evaluating community structure in networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Newman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Girvan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Physical Review</title>
		<imprint>
			<biblScope unit="volume">69</biblScope>
			<biblScope unit="page" from="26113" to="026115" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Scan: a structural clustering algorithm for networks</title>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 13th ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="824" to="833" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Fast algorithm for detecting community structure in networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Newman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Physical Review</title>
		<imprint>
			<biblScope unit="volume">69</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="66133" to="066135" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
