<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The study of the relationship between publications in social networks communities via formal concept analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kristina Pakhomova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alina Belova</string-name>
          <email>belova94@mail.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Siberian Federal University</institution>
          ,
          <addr-line>Krasnoyarsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, users generate a considerable amount of information on the Internet. As a result, they face the problem of how to retrieve the information they need, which may turn out to be more relevant than they expected. Moreover, the mix of different data sources often confuses users who are searching for instructive information. This paper introduces an approach based on text processing and formal concept analysis that structures information from a variety of sources, in particular social media communities. Additionally, it clarifies the relations between community posts on the same topic, where these relations serve as a recommendation tool for the user's decision making. Finally, the authors build a diagram that serves as a convenient visualization tool for structuring information obtained from various sources.</p>
      </abstract>
      <kwd-group>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>Semantic Analysis</kwd>
        <kwd>Social Network</kwd>
        <kwd>Data Mining</kwd>
        <kwd>Community topic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Due to the rapidly increasing amount of data generated by Internet users,
how to deal with this data has become one of the most important issues. For
instance, in 2019, 4.39 billion people were registered in services provided by social
networks, which is 366 million (9%) more than in January 2018 (information
provided by the 'We Are Social' agency and the 'Hootsuite' service [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). Social
networking services contain a variety of data, for instance,
text and media. Text information consists of publications in social networking
communities, which cover a wide range of topics. In this setting,
users need to quickly filter and analyze a large amount of information.
Currently, filtering may be carried out in two ways: automatically, by using data
mining methods, or manually by the user. The relevance of the result to the
user request is extremely important, but the speed of the request and the
visualization of the result also matter. From the wide range of data mining
methods, the authors explain one mathematical approach, formal concept
analysis (FCA), that satisfies the above criteria.
      </p>
      <p>
        Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).


[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Current research based on the analysis of social network data is
presented in [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5,6,7,8</xref>
        ]. The most
significant of these works is [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], in which the authors analyze social media data via
FCA.
      </p>
      <p>This paper consists of four sections: an introduction, an explanation of the approach,
a computational experiment, and a conclusion. First, the authors briefly explain
the theory of FCA and basic semantic analysis methods. Second, the authors
propose a solution based on methods of semantic data analysis and FCA.
Finally, they describe an experiment carried out on a social
network dataset and propose to visualize the results of the computation
as a concept lattice diagram in which each circle is marked in a
certain color.</p>
    </sec>
    <sec id="sec-2">
      <title>Computation approach</title>
      <sec id="sec-2-1">
        <title>Brief review of semantic analysis</title>
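        <p>
          To study text data, and in particular the meaning of a text, the authors use semantic analysis [
          <xref ref-type="bibr" rid="ref11 ref12 ref13">11,12,13</xref>
          ]. According to its general theory, the text is first subjected to basic text processing methods
such as tokenization and lemmatization. We apply semantic analysis in order to
compute the set of keywords related to posts with a common
topic. Let the set of posts be <inline-formula><tex-math>T = \{t_1, t_2, \ldots, t_n\}</tex-math></inline-formula> and the set of symbols be <inline-formula><tex-math>S = \{s_1, s_2, \ldots, s_m\}</tex-math></inline-formula>. The set of words is denoted by <inline-formula><tex-math>W</tex-math></inline-formula>, where a word is included in a text, <inline-formula><tex-math>w \subseteq t</tex-math></inline-formula>, for <inline-formula><tex-math>w \in W</tex-math></inline-formula> and <inline-formula><tex-math>t \in T</tex-math></inline-formula>. To compute keywords, we apply tokenization, lemmatization, and the removal of stop
words and of parts of speech other than nouns. The result is the final set of keywords.
        </p>
        <p>FCA operates on a formal context <inline-formula><tex-math>\mathbb{K} = (G, M, I)</tex-math></inline-formula>, where <inline-formula><tex-math>G</tex-math></inline-formula> is a set of objects, <inline-formula><tex-math>M</tex-math></inline-formula> is a set of attributes, and the relation
between objects and attributes is <inline-formula><tex-math>I \subseteq G \times M</tex-math></inline-formula>: for <inline-formula><tex-math>g \in G</tex-math></inline-formula> and <inline-formula><tex-math>m \in M</tex-math></inline-formula> we write <inline-formula><tex-math>g I m</tex-math></inline-formula>
if object <inline-formula><tex-math>g</tex-math></inline-formula> has attribute <inline-formula><tex-math>m</tex-math></inline-formula>.</p>
        <p>
          A formal concept is a pair <inline-formula><tex-math>(A, B)</tex-math></inline-formula> with <inline-formula><tex-math>A \subseteq G</tex-math></inline-formula> and <inline-formula><tex-math>B \subseteq M</tex-math></inline-formula> that satisfies the Galois connection
between <inline-formula><tex-math>(2^{G}, \subseteq)</tex-math></inline-formula> and <inline-formula><tex-math>(2^{M}, \subseteq)</tex-math></inline-formula>, that is, <inline-formula><tex-math>A' = B</tex-math></inline-formula>
and <inline-formula><tex-math>B' = A</tex-math></inline-formula>, where <inline-formula><tex-math>A</tex-math></inline-formula> and <inline-formula><tex-math>B</tex-math></inline-formula> are the extent and
intent of the formal concept <inline-formula><tex-math>(A, B)</tex-math></inline-formula>. The ordered set of all formal concepts is
called the concept lattice <inline-formula><tex-math>\mathfrak{B}(G, M, I)</tex-math></inline-formula> [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>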
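        <p>The derivation operators and the definition of a formal concept can be illustrated with a minimal sketch. The toy context, the post identifiers, and the function names below are hypothetical and are not taken from the dataset; the enumeration closes every subset of objects, which is exponential and suitable only for very small contexts, not for the algorithm used in the experiment.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations


def intent_of(objects, context, attributes):
    """Derivation A': the attributes shared by every object in `objects`."""
    shared = set(attributes)
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def extent_of(attrs, context):
    """Derivation B': the objects that have every attribute in `attrs`."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing each subset of objects.

    Brute force: exponential in the number of objects (toy contexts only).
    """
    attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent_of(subset, context, attributes)
            a = extent_of(b, context)
            concepts.add((a, b))
    return concepts


# Hypothetical toy context: post ids (objects) mapped to keywords (attributes).
context = {
    "p1": {"job", "career"},
    "p2": {"job", "python"},
    "p3": {"python"},
}
concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```
]]></preformat>
        <p>Each printed pair is closed in both directions: deriving the extent yields the intent and deriving the intent yields the extent again.</p>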
      </sec>
      <sec id="sec-2-2">
        <title>Concept lattice building</title>
        <p>
          In the previous subsection, the authors defined the main mathematical method,
which is founded on the computation of formal concepts. The ordered concepts form
the concept lattice, which is visualised by a Hasse diagram. Every set of formal
concepts has a greatest common subconcept, the infimum. Its extent consists of
those objects that are common to all extents of the set. Every set of formal
concepts has a least common superconcept, the supremum, the intent of which comprises all
attributes which all objects of that set of concepts have. Additionally, a set ordered
this way satisfies the axioms defining a lattice, namely the
commutative, associative, and absorption laws. Thus, the concept lattice is a complete lattice, that is, a
partially ordered set in which all subsets have both a supremum and an infimum [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The
diagram consists of the following main elements: circles, which represent formal concepts; lines,
which express the relation between formal concepts; and labels. Notably, an attribute
can be reached from an object via an ascending path in the
subconcept-superconcept hierarchy if and only if the object has the
attribute.
        </p>
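        <p>The ordering of concepts and the lines of the diagram can be sketched as follows. The six concepts are a hypothetical toy example written out by hand, and the function names are illustrative assumptions; the quadratic search for cover pairs is meant for small lattices only.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def is_subconcept(c1, c2):
    """(A1, B1) is below (A2, B2) iff the extent A1 is a subset of A2."""
    return c1[0] <= c2[0]


def hasse_edges(concepts):
    """Cover pairs (lower, upper): lower is below upper with nothing in between."""
    edges = []
    for lo in concepts:
        for hi in concepts:
            if lo == hi or not is_subconcept(lo, hi):
                continue
            between = any(
                mid != lo and mid != hi
                and is_subconcept(lo, mid) and is_subconcept(mid, hi)
                for mid in concepts
            )
            if not between:
                edges.append((lo, hi))
    return edges


def greatest_common_subconcept(c1, c2, concepts):
    """The concept whose extent is the intersection of the two extents."""
    target = c1[0] & c2[0]
    return next(c for c in concepts if c[0] == target)


fs = frozenset
# Hypothetical toy lattice given as (extent, intent) pairs.
top = (fs({"p1", "p2", "p3"}), fs())
c_job = (fs({"p1", "p2"}), fs({"job"}))
c_python = (fs({"p2", "p3"}), fs({"python"}))
c_p1 = (fs({"p1"}), fs({"job", "career"}))
c_p2 = (fs({"p2"}), fs({"job", "python"}))
bottom = (fs(), fs({"job", "career", "python"}))
concepts = [top, c_job, c_python, c_p1, c_p2, bottom]

edges = hasse_edges(concepts)
meet = greatest_common_subconcept(c_job, c_python, concepts)
```
]]></preformat>
        <p>In this toy lattice the object p2 reaches the attributes job and python along ascending lines, which matches the fact that p2 has exactly these two attributes.</p>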
        <p>
          Each post includes specific metadata that describe users' personal interest in
the post, namely likes and reposts. Using these measures, we
compute the average numbers of likes and reposts for every concept and
then cluster the ordered set of concepts by using k-means
(KMeans) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The clusters also determine the color of each circle of the diagram,
which makes the clusters of formal concepts visible.
        </p>
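        <p>A minimal sketch of this step is given below, assuming that each measure is clustered separately as a one-dimensional quantity. The numbers are hypothetical, the function names are illustrative, and the deterministic choice of initial centroids is an assumption made to keep the example reproducible; it is a plain Lloyd iteration rather than the specific implementation used in the experiment.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def concept_averages(extent, likes, reposts):
    """Mean likes and mean reposts over the posts in a concept's extent."""
    n = len(extent)
    if n == 0:
        return 0.0, 0.0
    return (sum(likes[p] for p in extent) / n,
            sum(reposts[p] for p in extent) / n)


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars.

    Initial centroids are spread over the sorted values (an assumption
    made for reproducibility, not part of the original method).
    """
    ordered = sorted(values)
    step = max(k - 1, 1)
    centroids = [ordered[(i * (len(ordered) - 1)) // step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster received no values.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
              for v in values]
    return centroids, labels


# Hypothetical per-post measures and one concept extent.
likes = {"p1": 10, "p2": 30, "p3": 5}
reposts = {"p1": 2, "p2": 0, "p3": 1}
avg_likes, avg_reposts = concept_averages({"p1", "p2"}, likes, reposts)

# Hypothetical average likes of six concepts, clustered into three groups.
concept_likes = [90.0, 120.0, 40.0, 60.0, 10.0, 30.0]
centroids, labels = kmeans_1d(concept_likes, k=3)
```
]]></preformat>
        <p>The resulting label of each concept can then be mapped to a color of the corresponding circle in the diagram.</p>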
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Computation experiment</title>
      <p>
        The authors investigated a social network dataset concerning a common
topic discussed in social network communities. The dataset
includes the post Id, the text of the post, the value of users' attitudes (likes),
and the number of reposts (see Tab. 1). It was obtained from an IT
community of the 'VK' social network, whose posts are written in Russian and English [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>In order to compute keywords, the authors used the semantic analysis
explained above. We computed the formal context in which the set of keywords
forms the attributes and the set of posts forms the objects (see Tab. 2).</p>
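      <p>The construction of a formal context from post texts can be sketched as follows. The two posts, their identifiers, and the short stop-word list are hypothetical; lemmatization and the filtering of parts of speech other than nouns are deliberately omitted here, so this is a simplified stand-in for the semantic analysis described above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Hypothetical stop-word list. A fuller pipeline would use a complete list,
# a lemmatizer, and a part-of-speech tagger that keeps nouns only.
STOP_WORDS = {"a", "an", "the", "in", "for", "and", "of", "to", "is", "on"}


def keywords(text):
    """Lower-case the text, tokenize on runs of letters, drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def build_context(posts):
    """Map each post id (object) to its set of keywords (attributes)."""
    return {post_id: keywords(text) for post_id, text in posts.items()}


def cross_table(context):
    """Return the attribute list and one row of 'X' / '.' marks per post."""
    attributes = sorted(set().union(*context.values()))
    rows = []
    for post_id in sorted(context):
        marks = ["X" if a in context[post_id] else "." for a in attributes]
        rows.append((post_id, marks))
    return attributes, rows


# Hypothetical posts, not taken from the dataset.
posts = {
    101: "Career advice for a Python developer",
    102: "Job openings in Python",
}
context = build_context(posts)
attributes, rows = cross_table(context)
print(" " * 4, " ".join(attributes))
for post_id, marks in rows:
    print(post_id, " ".join(marks))
```
]]></preformat>
      <p>The letter-based tokenizer also accepts Cyrillic text, which matters for a dataset with posts in both Russian and English.</p>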
      <p>The set of formal concepts was computed according to FCA theory. Although
the set of posts includes no more than 100 items, the number of obtained formal
concepts is large, approximately 800. Moreover, for each formal
concept the average number of likes and the average number of reposts were computed (see Tab.
3).</p>
      <p>Because the large number of ordered formal concepts produces a huge diagram,
the authors present only the set of concepts and their
clustering. Consider the formal concepts in Tab. 3 in which the object ′ ′ occurs.
The diagram helps to concentrate on user preferences: for instance, a user who is interested in
finding a job or in career development and who is also interested in ′ ′
may be recommended the following posts: 1297419, 1297266, 1296779,
etc. This recommendation follows from the lines of the lattice, which express the
relation between formal concepts.</p>
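      <p>One possible way to read such recommendations off a set of formal concepts is sketched below. The concepts and post identifiers are a hypothetical toy example, and the selection and ordering rule (concepts whose intent contains an interest, most specific concepts first) is an assumption made for illustration, not the exact procedure used in the experiment.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def recommend(interests, concepts):
    """Collect post ids from concepts whose intent shares a keyword with
    `interests`, visiting the smallest non-empty extents first."""
    matching = [c for c in concepts if c[0] and interests & c[1]]
    matching.sort(key=lambda c: (len(c[0]), sorted(c[0])))
    ordered = []
    for extent, _intent in matching:
        for post in sorted(extent):
            if post not in ordered:
                ordered.append(post)
    return ordered


fs = frozenset
# Hypothetical (extent, intent) pairs of a small concept lattice.
concepts = [
    (fs({"p1", "p2", "p3"}), fs()),
    (fs({"p1", "p2"}), fs({"job"})),
    (fs({"p2", "p3"}), fs({"python"})),
    (fs({"p1"}), fs({"job", "career"})),
    (fs({"p2"}), fs({"job", "python"})),
    (fs(), fs({"job", "career", "python"})),
]

for_job = recommend({"job"}, concepts)
for_python = recommend({"python"}, concepts)
```
]]></preformat>
      <p>Visiting the most specific concepts first places the posts that share the most keywords with the interest at the front of the list; ranking by likes or reposts could be applied afterwards.</p>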
      <p>By using the measures (likes and reposts) we computed the concept
clusters. The three centroids according to likes are 108.333, 55.3985, and 24.227, and
the centroids according to reposts are 9.1345 and
3.254. Additionally, each circle of the diagram has its own color that indicates
the cluster number. This allows users to visualize their search and
to rank posts according to the measures. For instance, if the user concentrates on
′ ′, post 1296779 has high priority, followed by 1296779, 1297419,
according to post likes. However, only a few users repost, so the repost values
form only two clusters; on the other hand, this measure is more meaningful than
the likes measure for ranking.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and future work</title>
      <p>This paper has discussed an approach that processes a social network
dataset by using FCA; in particular, it helps social media users deal with
community posts. The authors take into account a dataset structure
that is common to a variety of social services. Moreover, the authors
concentrated on such criteria as accessible, high-quality, immediate, and relevant
information that can be provided to the user on request. The
approach satisfies these criteria only partially, so future work will try to improve it
by using the advantages of FCA.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ferré</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huchard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing</article-title>
          .
          <source>A Guided Tour of Artificial Intelligence Research</source>
          , pp.
          <fpage>411</fpage>
          -
          <lpage>445</lpage>
          , Springer Nature Switzerland (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Restructuring Lattice Theory: An Approach Based on Hierarchies of Concepts</source>
          .
          <source>Ordered Sets</source>
          , pp.
          <fpage>445</fpage>
          -
          <lpage>470</lpage>
          , Springer Netherlands (
          <year>1982</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis</source>
          . Berlin, Heidelberg: Springer-Verlag (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Davey</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priestley</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          :
          <article-title>Introduction to Lattices and Order</article-title>
          . Cambridge: Cambridge University Press (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Concept-based Recommendations for Internet Advertisement</article-title>
          . Palacky University, vol.
          <volume>433</volume>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>166</lpage>
          , Olomouc (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Medina</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pakhomova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramirez-Poussa</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Recommendation Solution for a Locate-Based Social Network via Formal Concept Analysis</article-title>
          .
          <source>Trends in Mathematics and Computational Intelligence. Studies in Computational Intelligence</source>
          , vol.
          <volume>796</volume>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>138</lpage>
          , Springer, Cham (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cordero</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          et al.:
          <article-title>Knowledge discovery in social networks by using a logic-based treatment of implications</article-title>
          .
          <source>Knowledge-Based Syst.</source>
          , Elsevier B.V., vol.
          <volume>87</volume>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>25</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <source>Formal Concept Analysis of Social Networks</source>
          . Springer International Publishing (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. LNCS Homepage, https://www.web-canape.ru/business/vsya-statistika-internetana-2019-god-v-mire-i-v-rossii/.
          <source>Last accessed 19 Jun 2020</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. LNCS Homepage, https://vk.com/habr.
          <source>Last accessed 19 Jun 2020</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.: Natural</given-names>
          </string-name>
          <string-name>
            <surname>Language Processing with Python. O'Reilly Media</surname>
          </string-name>
          , Inc.,Gravenstein Highway North, Sebastopol (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Foundations of Statistical Natural Language Processing</article-title>
          . MIT Press. Cambridge (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norvig</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Artificial Intelligence: A Modern Approach, 3rd Edition</source>
          . Prentice Hall (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          .
          <source>In Proc. 5th Berkeley Symp. on Math. Statistics and Probability</source>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          , (
          <year>1967</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>