<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The analysis method of primary data for monitoring social processes using Big Data and fuzzy criteria</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Alexander M. Bershadsky, Doctor of Technical Sciences, Professor, Penza State University</institution>
          ,
          <addr-line>40 Krasnaya Street, Penza</addr-line>
          ,
          <country country="RU">Russia 440026</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Alexander S. Bozhday, Doctor of Technical Sciences, Professor, Penza State University</institution>
          ,
          <addr-line>40 Krasnaya Street, Penza</addr-line>
          ,
          <country country="RU">Russia 440026</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Alexey Y. Timonin, Postgraduate, Penza State University</institution>
          ,
          <addr-line>40 Krasnaya Street, Penza</addr-line>
          ,
          <country country="RU">Russia 440026</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>329</fpage>
      <lpage>335</lpage>
      <abstract>
        <p>This work develops mathematical models for determining the degree of significance of social information contained in unstructured raw data from public web sources. The significance criteria for descriptions of the social environment are based on fuzzy logic and probability theory methods, and set theory is used to represent the extracted social objects. Applying the results improves the determination of data relevance and reliability in a social profiling system based on Big Data.</p>
      </abstract>
      <kwd-group>
        <kwd>Big Data</kwd>
        <kwd>data analysis</kwd>
        <kwd>evaluation</kwd>
        <kwd>fuzzy logic</kwd>
        <kwd>membership function</kwd>
        <kwd>probability theory</kwd>
        <kwd>set theory</kwd>
        <kwd>social management</kwd>
        <kwd>social profiling</kwd>
        <kwd>unstructured data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Modeling and analysis of social processes currently rely both on the classical techniques of sociology and on various
theories of computer science [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Graph theory, optimization methods, and processing methods for
scale-invariant [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and “small-world” [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] networks are used to describe the structure of society. In various studies, analysts
increasingly collect initial information from electronic social media sources, because this data is highly available
and complete. Big Data, Data Mining, Machine Learning and visual analytics techniques are used to process such
content. The analysis of society involves several related difficulties: introducing
information technologies and developing new associated scientific approaches, ensuring the confidentiality of the processed
information, and obtaining timely and reliable results. The task of social profiling is no exception.
      </p>
      <p>
        The social profile (SP) [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] is a heterogeneous semantic network consisting of socialized data with varying degrees of
structure. This data describes a particular person or social group. The static part of the social profile is a structured set that
uniquely identifies the selected subject within the social environment. This static part is called the information card [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Its
elements are verified definitions presented in verbal form. The dynamic part of the SP consists of heterogeneous information
(such as text and multimedia) about personalities, social groups and phenomena. Dynamic content [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is used to determine
implicit dependencies and to expand the information card. Social profiles can be applied [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to assessing the
level of social tension and the response to socially significant events. Other application areas [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] include designing a customer
reliability rating for insurance and banking companies, compiling an employee’s portfolio for human resources
departments, and issuing medical recommendations based on both lifestyle data and the state of the environment.
      </p>
      <p>Therefore, it is critical to identify discrepancies in the collected information and to determine the degree of data relevance.
Results should be ranked by their importance for a particular research task. The initial SP information is taken from various
sources and is large (up to several terabytes), so determining its reliability in a finite time is a problem.
It is also impossible to exclude the influence of the human factor from the manual assessment of the social value of data. Automated
tools are needed to solve these problems, and they should be based on formalized mathematical criteria for the
importance of social data.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Background</title>
      <p>The scientific community has long studied textual data in order to determine its relevance and consistency
automatically. Currently, Text Mining methods are used for this purpose. Over the past five years, a
number of scientific papers have been published that served as sources of ideas for developing a significance criterion for
the initial social data. We describe some of the most interesting works.</p>
      <p>
        Abdul Ghaffar Shoro and Tariq Rahim Soomro note the possibilities of Big Data analysis for recognizing meaningful
information from popular social networks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] while exploring the prospects of the Apache Spark framework. The authors of the paper
“Learning to Match using Local and Distributed Representations of Text for Web Search” developed [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] a new
neural-network-based document ranking model that outperforms traditional baselines. Results on knowledge
extraction from large clinical narratives in the Bulgarian language are presented in “Text Mining and Big Data Analytics for
Retrospective Analysis of Clinical Texts from Outpatient Care” [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        A number of articles are devoted to the use of fuzzy logic methods in the semantic analysis of unstructured text and natural
language. Farman Ali et al. provide [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] a fuzzy decision-making solution for monitoring transportation activities and building a city-feature
polarity map for travelers in “Fuzzy ontology-based sentiment analysis of transportation and city feature reviews for safe
traveling”. “Public Sentiments Analysis Based on Fuzzy Logic for Text” is devoted to a technique [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] defined as a public
sentiments discriminator that considers both fuzzy logic and sentiment complexity. Amir Karami and others addressed the problem
of increasing productivity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in processing large-scale medical documents by using fuzzy clustering in topic modeling.
      </p>
      <p>
        There are works on determining the tonality of records from Internet social media. Han Liu and Mihaela Cocea review
the concepts and techniques of granular computing in general and focus on [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the characteristics of fuzzy information
granulation in their investigation. The paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] by James M. Heilman et al. quantifies the production and consumption of
Wikipedia’s medical content in order to determine the demographic characteristics of authors and readers and to assess the
completeness of the medical information it contains.
      </p>
      <p>
        There is also research on measuring the quantitative and bibliometric characteristics of information sources and
their content. Mehdi Jafari and others simulate human-like methods by integrating fuzzy logic with traditional
statistical approaches to improve [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] text summarization accuracy. Dejian Yu made a retrospective analysis of
scientific publications over 50 years using text mining and bibliometrics [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and provides a comprehensive list of bibliometric
characteristics.
      </p>
      <p>
        The works “Where the Truth Lies: Explaining the Credibility of Emerging Claims on the Web and Social Media” by
Kashyap Popat [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and “Evaluating the credibility of English web sources as a foreign-language searcher” by Alyson L.
Young [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] are devoted to methods for determining the reliability of unstructured texts.
      </p>
      <p>
        We take the scientific principles and analytical technologies from the above works as a basis. The review showed that
fuzzy logic methods are popular in the analysis of natural language texts. We also reviewed methods for assessing
the accuracy of textual data from public sources on the World Wide Web and identified the most interpretable bibliometric parameters for
digital sources of textual data. However, we note that the developed significance criterion must be adaptable
for use in the social profile building system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It will also be applied to each individual object, connection or property
of the social environment within the proposed [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] social profile and social environment models.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 The problem of determining information importance in the social profiling process</title>
      <p>
        The developed mathematical model [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] of the social environment provides information relating to the information cards of
individuals, groups and social phenomena. Its elements are divided into actants (the definitions of real-world objects and
phenomena) and predicates (the relations between actants). Each individual element of the social environment has its own
unique list of attributes, presented in tabular form. Let us consider them in more detail.
      </p>
      <p>A personal SP contains the names and pseudonyms of the person concerned. It also contains key dates of life
and related locations, personal characteristics and the most significant social phenomena, as well as links to the profiles of other
people and social groups, including a list of the person's social roles and types of group relationships. All of these have links to
sources in the dynamic content. A conclusion is made about the mood of the person, both as a whole and in relation to the
surrounding objects of the social environment. It is based on an analysis of the context and tonality of the source data containing the
elements listed above. Such content may include fairly subjective evaluative information, which is nonetheless
important for drawing up a complete picture of the social profile.</p>
      <p>In a group SP, the group's own unique parameters are added to the information from the profiles of its members. They can change
drastically over time. Therefore, the included personal profiles carry timestamps in the group relation descriptions to
organize the analysis results. Defining the boundaries of social groups is further complicated by the fact that they
may be informal, conditional and implicit. In view of this, we restrict ourselves to describing the personal social
profile model only.</p>
      <p>
        Social phenomena are information about the objects and events of the social environment. They are connected with a
person or group in the task of building a social profile. Phenomena can be described directly or indirectly, when the existence
of one is indicated by other phenomena. They also have a number of properties [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: the time and place of occurrence,
the sets of involved participants and accompanying real-world objects, a list of sources confirming the existence of the
phenomenon, relationships between phenomena and other social objects, and the related concepts that give a descriptive
characteristic of the phenomenon.
      </p>
      <p>To achieve the most truthful and complete results, all of the above information should have the following qualities:</p>
      <p>- Reliability - the analyst must know whether the collected initial information is true or false, and to what extent. This
directly affects the veracity of the social profiling results.</p>
      <p>- Relevance - social profile data from public sources can be updated at different intervals, ranging from a few changes
per hour to just one change per week or even longer. The less time passes from the origin of a certain tangible event to its
recording into the sources of the initial SP data and from its gathering to the completion of the analytical processing, the
more accurately the results of the analysis will reflect reality.</p>
      <p>- Detail - also affects the final results of social profiling. This quality allows for a more in-depth analysis of the social
environment. The more properties an object has, the easier it is to compare it with other social environment elements.</p>
      <p>The initial data for constructing a social profile is taken from a variety of public Internet sources. It is impossible to state
unequivocally that the sample will contain information that fully satisfies the indicated qualities. The samples may contain
contradictory or completely false information. Moreover, even if the SP consists only of true information, it is
not always fully used in further research because of the insignificant value of its individual components. Semantic value is
determined primarily by the number of links between the selected element and the surrounding social environment objects.
Therefore, additional criteria related to information sources must be introduced. Reliability and moderation indicate
resources that have a high reputation. Attendance shows how well known the resource is, and readability shows
how popular the published content is. The speed of appearance and the number of new unique publications give an idea of how
active the content authors are.</p>
      <p>Accounting for these qualities requires studying the significance of the varied collected data used in the
social profiling process. As a result, each element of the social environment description must be assigned a
weighting factor that shows how well the element conforms to these qualities. The quality indicators
are assessed not only by computer means but also by experts, who can assign arbitrary values from Absolute
Truth to Absolute Falsehood. Therefore, the apparatus of fuzzy logic theory is the most suitable for modeling the
significance criterion of the initial social data.</p>
      <p>
        Then the mathematical model of personal social profile will take the form [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]:
      </p>
      <p>
        <disp-formula><label>(1)</label><tex-math><![CDATA[
PSP = \langle P(X, v),\; R(Y, u),\; Q(m + n) \rangle,
]]></tex-math></disp-formula>
      </p>
      <p>where P = {P<sub>1</sub>, …, P<sub>m</sub>} – the array of social objects of the person in question (themes, events, persons, etc.);
R = {R<sub>1</sub>, …, R<sub>n</sub>} – the array of social connections between the social objects of the person in question;
m, n – the number of social objects and connections in the personal SP, respectively, with m − 1 ≤ n;
Q(m + n) = {Q<sub>1</sub>, …, Q<sub>m+n</sub>} – the fuzzy set of “intonations” of social characteristics, which is fundamental for the complex
assessment of a person's mood and depends on the number of social characteristics; it takes values in the interval Q ∈ [−1; 1];
X, Y – property sets of social objects and relationships, respectively;
v, u – weights of social objects and connections, respectively; they are fuzzy values.</p>
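      <p>The structure in (1) can be pictured as a small data container. The Python sketch below is illustrative only: the class and field names (SocialObject, SocialConnection, PersonalSocialProfile, properties, weight, intonation) are hypothetical and are not taken from the model itself, and storing one intonation value per object and per connection, as well as restricting weights to [0; 1], are our assumptions.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class SocialObject:
    """Actant P_i with its property set X and fuzzy weight v."""
    name: str
    properties: dict = field(default_factory=dict)  # X
    weight: float = 0.0                             # v, assumed to lie in [0, 1]
    intonation: float = 0.0                         # Q value, in [-1, 1]


@dataclass
class SocialConnection:
    """Predicate R_j linking two objects, with property set Y and fuzzy weight u."""
    source: str
    target: str
    properties: dict = field(default_factory=dict)  # Y
    weight: float = 0.0                             # u, assumed to lie in [0, 1]
    intonation: float = 0.0                         # Q value, in [-1, 1]


@dataclass
class PersonalSocialProfile:
    objects: list = field(default_factory=list)      # P, length m
    connections: list = field(default_factory=list)  # R, length n

    def intonations(self):
        """Collect the intonation values of all objects, then all connections."""
        values = [o.intonation for o in self.objects]
        values += [c.intonation for c in self.connections]
        return values

    def is_consistent(self):
        """Check the assumed ranges: intonation in [-1, 1], weight in [0, 1]."""
        items = self.objects + self.connections
        return all(-1.0 <= i.intonation <= 1.0 and 0.0 <= i.weight <= 1.0
                   for i in items)
```
]]></preformat>
      <p>A profile built this way keeps the weights v and u next to the elements they describe, so a significance criterion can later update them element by element.</p>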
    </sec>
    <sec id="sec-4">
      <title>4 Development of the model for the significance criterion of SP objects</title>
      <p>
        Approaches to processing [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] text and multimedia data differ, so we describe them separately. The algorithm for
analyzing an unstructured source text of a social profile consists of the following steps:
1. Determining the language of the text;
2. Detecting the information card elements in the initial textual data;
3. Primary lexical analysis, which compiles a linguistic network that is the basis for representing a social
profile as a graph:
3.1. Dividing the text into logical parts;
3.2. Searching for grammatical bases;
3.3. Identifying the text tonality;
3.4. Determining the predicates;
3.5. Highlighting definitions;
4. Secondary lexical analysis, which groups the elements of the resulting linguistic network:
4.1. Searching for sentences with verbal constructions (keywords) from the information card;
4.2. Selecting, from the found sentences, words that are candidates for the role of social objects and social connections;
4.3. Recursively processing the source text to a manually specified depth by repeating steps 4.1 - 4.2;
4.4. Determining the mood of social objects by summing the moods of their attributes. This includes searching for emotionally
colored phrases in the information card descriptions and adding conclusions from the expert assessment of the tonality of the associated
multimedia content. Finally, the object's mood is adjusted to account for the emotional
background of the social environment.
      </p>
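      <p>Steps 4.1 - 4.3 of the text algorithm can be sketched as a keyword-driven search that is repeated to a fixed depth. The Python code below is a minimal sketch under strong assumptions and not the implementation of the profiling system: the function names are hypothetical, sentences are split naively on punctuation, and treating capitalized words as candidates for social objects is a stand-in heuristic for the real lexical analysis.</p>
      <preformat><![CDATA[
```python
import re


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_words(sentence, known):
    """Assumed heuristic: capitalized words that are not yet known are candidates."""
    words = re.findall(r"\b[A-Z][a-z]+\b", sentence)
    return {w for w in words if w not in known}


def secondary_analysis(text, keywords, depth):
    """Steps 4.1-4.3: find keyword sentences, pick candidates, repeat up to `depth` times."""
    sentences = split_sentences(text)
    known = set(keywords)
    frontier = set(keywords)
    for _ in range(depth):
        found = set()
        for sentence in sentences:
            if any(k in sentence for k in frontier):        # step 4.1
                found |= candidate_words(sentence, known)   # step 4.2
        if not found:
            break
        known |= found
        frontier = found                                    # step 4.3: next pass
    return known - set(keywords)
```
]]></preformat>
      <p>Each pass uses only the candidates found in the previous pass as new keywords, so the depth parameter bounds how far the search moves away from the information card.</p>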
      <p>The multimedia content of an SP is analyzed as follows:
1. Content is divided by type (audio, images);
2. Content is divided into the author's own content and information about the selected person or group;
3. For the author's content, multimedia objects are compared with existing samples from the Internet and from social profiles;
service information is also extracted to learn about the studied person's activities and preferences;
4. For information about the studied person, a detailed content analysis is carried out using the available tools of
multimedia analytics (such as Machine Learning) in order to isolate essential information about the social environment
from the multimedia object itself;
5. For all types of data, tonality analysis is carried out on the basis of an expert assessment of the content itself and of the phrases
extracted from it;
6. The results of the analysis are recorded in the information card with links to the processed content.</p>
      <p>In this way, textualized information from multimedia content can also be evaluated. The most efficient approach is to check
data significance at the stage of secondary lexical analysis. The SP information card is filled in at this step, which also reveals
the semantic properties of the future social profile elements.</p>
      <p>
        In drawing up the data significance criterion, we rely on the definitions of belief functions and plausible reasoning from the Dempster-Shafer theory of evidence [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Shafer's approach allows us to interpret belief and plausibility as the interval bounds
of the possible truth of a hypothesis. This is useful in assessing indicators whose probabilistic values are unknown, for example,
when a previously unconsidered information source is included in the raw data sample, when the data is fragmentary
and incomplete for a comprehensive assessment, or when an expert has doubts because of a lack of knowledge in a
certain area of interest.
      </p>
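      <p>The interval interpretation can be illustrated with the standard definitions of belief and plausibility over a mass function. The Python sketch below uses a hypothetical mass assignment for a single statement from an unrated source; the numbers and names are invented for illustration and are not results of this work.</p>
      <preformat><![CDATA[
```python
def belief(mass, hypothesis):
    """Bel(A): total mass of focal sets fully contained in A."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


# Hypothetical mass assignment: part of the evidence supports "true",
# part supports "false", and the rest is uncommitted (either outcome).
TRUE = frozenset({"true"})
FALSE = frozenset({"false"})
EITHER = TRUE | FALSE
mass = {TRUE: 0.5, FALSE: 0.2, EITHER: 0.3}

# Lower and upper bound for the truth of the statement.
interval = (belief(mass, TRUE), plausibility(mass, TRUE))
```
]]></preformat>
      <p>The uncommitted mass widens the interval between the two bounds, which is how missing knowledge about a new source or an incomplete record shows up in the assessment.</p>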
      <p>The initial data characteristics were selected empirically, and the weight coefficients u, v are calculated on their basis. Verification
is difficult, expert assessments are subjective, and the human factor prevails in data interpretation, so it is
not possible to give an exact assessment of the importance of an object or phenomenon. The formalized criteria can only be
regarded as ratings Vdeg, presented in numerical or percentage form. Fuzzy logic methods are proposed to solve this
problem. We defined the fuzzy variables that affect the value of the significance coefficient for SP entities: relevance (Vvol),
meaning (Vval), originality (Vunq), validity (Varg) and credibility (Vver) of data, and source dependability (Vauth). Their
characteristic functions µ are as follows:</p>
      <p>– Relevance - the freshness of the initial information. It represents the time at which the information entered the raw
sample. We take the update interval of SP information to be one day, which is convenient for
relevance calculations. A shorter period is not advisable, since changes during it are mostly
insignificant. With a longer period, some gathered information may already be irrelevant at collection
time.</p>
      <p>The fewer days have passed between publication and the moment of data collection or processing, the more
relevant the data is. We assume that published social information loses half of its relevance a month later, and that the further reduction of
relevance follows a power law. The 31-day interval was chosen because people's activities are associated with calendar
cycles; for example, payouts and vacations are tied to months. The choice of the relevance decay degree =
2 is justified by the fact that a person is usually socially active in the long term, but in the short term may have frequent intervals of
'silence'. We also assume that the information is most relevant during the day of publication. The author may make changes to
the published message during this period, and readers may not be able to read it immediately (e.g. when they
are in different time zones). This period will decrease in the future because of the accelerating pace of life and the spread of
mobile telecommunications.</p>
      <p>However, sometimes information is not yet relevant at collection time. This usually applies to forecasts and
conditional assumptions. Information about past events is always relevant at the time of publication, as are facts that are not tied to
time.</p>
      <p>Vvol = {not actual yet, actual, not actual anymore} – possible values of the relevance criterion;
vol = {no, yes} – shows the availability of more recent information;
time = {past, present, future} – shows the tense of the statements that contain the social essence, used to determine their
relevance;
xvol = [0; ∞) – time elapsed since publication, in days;</p>
      <p>
        <disp-formula><label>(2)</label><tex-math><![CDATA[
V_{vol} =
\begin{cases}
0, & vol = 1 \lor time = \text{``future''}; \\
1, & x_{vol} \le 1 \lor time = \text{``past''}; \\
\dfrac{1}{1 + \left( \dfrac{x_{vol} - 1}{30} \right)^{2}}, & x_{vol} > 1,
\end{cases}
]]></tex-math></disp-formula>
      </p>
      <p>– The meaning of information - shows the strength of its connectedness with other SP entities. This indicator depends on their
weights. We modify the class S membership function so that information is considered secondary if its qualifying
connections make up no more than half of all links of the considered entity, and primary otherwise.</p>
      <p>Vval = {meaningless, minor, major, critical} – information may be unrelated to the actual problem, have little or
great importance in solving it, or be central, providing a solution to the task or opening previously unknown areas for
research;
xval = [0; nlink] – the integer number of connections between social entities whose weight is greater than a certain
threshold value umin;
nlink – the total number of unique connections with the entity under investigation;</p>
      <p>
        <disp-formula><label>(3)</label><tex-math><![CDATA[
V_{val} =
\begin{cases}
0, & x_{val} = 0; \\
2 \left( \dfrac{x_{val}}{n_{link}} \right)^{2}, & 0 < x_{val} \le \dfrac{n_{link}}{2}; \\
1 - 2 \left( \dfrac{x_{val} - n_{link}}{n_{link}} \right)^{2}, & \dfrac{n_{link}}{2} < x_{val} < n_{link}; \\
1, & x_{val} = n_{link},
\end{cases}
]]></tex-math></disp-formula>
      </p>
      <p>– The source dependability is determined empirically by an expert. There are no exact criteria. A zero value of the
indicator means that the source authority cannot be determined.</p>
      <p>Vauth = {not defined, low, high} – values showing how dependable the source is;
xauth = [0; 1] – average expert score of authority;</p>
      <p>
        <disp-formula><label>(4)</label><tex-math><![CDATA[
V_{auth} =
\begin{cases}
0, & x_{auth} = 0; \\
2 x_{auth}^{2}, & 0 < x_{auth} \le 0.5; \\
1 - 2 (x_{auth} - 1)^{2}, & 0.5 < x_{auth} \le 1,
\end{cases}
]]></tex-math></disp-formula>
      </p>
      <p>– Data validity - defined by the existence of arguments, evidence and references for the considered social element. The resulting
assessment is influenced by whether the sample contains the most reliable sources of information and by the average level
of source dependability. This is done in order to identify information noise: some social media resources hastily copy facts from
outside. A value of this indicator above 0.5 means that the considered fact has weighty evidence of
truth.</p>
      <p>Varg = {not enough, enough} – states of validity criteria;
xarg = [0; ∞) – number of possible arguments;
0, xarg  0;

 1
Varg   xarg , xarg  0, (5)</p>
      <p>1  e2(2maxxarg Vauth  xarVgauth )
– Data originality - the number of mentions of the information. This index is used to identify less common facts and
points to the primary sources of information. Like validity, this indicator depends on source dependability. It allows
facts to be separated from unsubstantiated rumors and assumptions, which simplifies the process of identifying verified
primary sources.</p>
      <p>Vunq = {no mention, few mentions, widely known} – list of uniqueness values;
xunq = [0; ns] – the number of sources containing the analyzed statement with a publication date earlier than that of the
original;
ns = [1; ∞) – the total number of sources containing the subject statement from the SP;

1, xunq  0;

Vunq  1  2( xnunqs2Vauth )2 , 0  xunq  n2s ; (6)
 (ns 1    )2
 ns2xunq Vauth , n2s  xunq  ns,
– Data credibility - shows the prevalence of different points of view (PoV). It depends on validity and originality. If
only a single opinion is present, we consider it true until alternatives appear. Otherwise, the credibility of this PoV
depends on the confidence indicators of the other points of view.</p>
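      <p>Vver = {false, undefined, true} – possible states of the credibility criterion;
xver = [0; ∞) – number of different points of view in the initial dataset;</p>
      <p>
        <disp-formula><label>(7)</label><tex-math><![CDATA[
V_{ver} =
\begin{cases}
1 - \max\limits_{i = 1, \dots, x_{ver}} (ver_i), & x_{ver} > 1; \\
1, & x_{ver} \le 1,
\end{cases}
]]></tex-math></disp-formula>
      </p>
      <p>where ver – the confidence indicator of the processed data. It is defined by the most authoritative opinion, which must either
be supported by many facts or be novel:</p>
      <p>
        <disp-formula><label>(8)</label><tex-math><![CDATA[
ver = \max\left( V_{arg}(V_{auth}),\; V_{unq}(V_{auth}, n_{s}^{\min}) \right),
]]></tex-math></disp-formula>
      </p>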
      <p>Thus, the summary mathematical representation of the weighting coefficient for data significance definition will be as
follows:</p>
      <p>u  Vvol  Vval umin   Vver  ver ,
(9)
The coefficient v is defined similarly. The difference is that it focuses directly on the definition of actants.</p>
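      <p>Of the characteristic functions above, the relevance decay in (2) and the S-shaped functions in (3) and (4) are simple enough to sketch in code. The Python functions below follow our reading of those formulas; the function and argument names are hypothetical, the conditions on vol and time in (2) are deliberately left out, and the functions (5) - (9) are not covered.</p>
      <preformat><![CDATA[
```python
def relevance_decay(days):
    """Numeric branch of V_vol: 1 during the first day, 0.5 after 31 days."""
    if days <= 1:
        return 1.0
    return 1.0 / (1.0 + ((days - 1) / 30.0) ** 2)


def s_curve(x, upper):
    """S-shaped membership on [0, upper] that crosses 0.5 at upper / 2."""
    if x <= 0:
        return 0.0
    if x >= upper:
        return 1.0
    if x <= upper / 2:
        return 2.0 * (x / upper) ** 2
    return 1.0 - 2.0 * ((x - upper) / upper) ** 2


def meaning(x_val, n_link):
    """V_val as read from (3): share of sufficiently weighted links of an entity."""
    return s_curve(x_val, n_link)


def dependability(x_auth):
    """V_auth as read from (4): expert authority score on [0, 1]."""
    return s_curve(x_auth, 1.0)
```
]]></preformat>
      <p>For example, relevance_decay(31) gives 0.5, which matches the assumption that information loses half of its relevance after a month, and meaning(5, 10) gives 0.5, the boundary between secondary and primary information.</p>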
    </sec>
    <sec id="sec-5">
      <title>5 Discussion</title>
      <p>Introducing the developed significance criterion for the initial text data made it possible to detect extraneous and fake
information sources and exclude them from consideration. Testing was conducted on a sample of unstructured social texts
from 30 public sources returned by the Google search engine for the query “famous blockchain developers
biography”. Some resources in the sample contained a meaningless compilation of sentences and phrases borrowed
from trusted sources, thereby mimicking popular information resources. Some data sources entered the sample at random
and did not contain the required information.</p>
      <p>In addition, the use of the fuzzy significance criterion (the relevance indicator) simplified sorting the links between social
objects in chronological order. This is part of ongoing research on methods for the social profiling task.</p>
      <p>In the future, we plan to introduce the developed criteria into the text-analytical module of the social profile building
system based on Big Data technologies. Detailed performance testing of the weight criterion will be carried out on a sample
of 1500 unique public web pages with socialized data. If necessary, the characteristic functions of the existing quantitative
indicators will be corrected to process bibliometric attributes.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>The result of this work is a fuzzy criterion model that determines the significance coefficient of social profile elements in a
comprehensive analysis. The criterion is based on the following qualitative characteristics of information: connectivity, reasoning,
relevance, reliability and uniqueness of data, and expert assessment of the source verification level. These parameters are
primary and necessary for identifying outdated and unreliable information in the SP, as well as for finding unreliable
social media sources. The resulting fuzzy model can be implemented within the developed concept for describing social environment
states. It increases the degree of consistency of the aggregated social profile elements. The developed
significance criterion also simplifies the tonality analysis of social media data by identifying the semantic content of
particular textual elements. The next stage of our research is devoted to developing a modular implementation of
the software and instrumental complex for social profiling.</p>
      <p>Acknowledgments
This work was carried out with the support of the Russian Foundation for Basic Research, grant No. 18-07-00408, within the research
project “Fundamental theoretical bases development for self-adaptation of applied software systems”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ali</surname>
            <given-names>F.</given-names>
          </string-name>
          , et al.
          <article-title>Fuzzy ontology-based sentiment analysis of transportation and city feature reviews for safe traveling</article-title>
          .
          <source>Transportation Research Part C: Emerging Technologies</source>
          <volume>77</volume>
          : pp.
          <fpage>33</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Barabási</surname>
            <given-names>A. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albert</surname>
            <given-names>R</given-names>
          </string-name>
          . Emergence of scaling in random networks // Science,
          <year>1999</year>
          . - V.
          <volume>286</volume>
          , №. 5439. pp.
          <fpage>509</fpage>
          -
          <lpage>512</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bershadsky</surname>
            <given-names>A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bozhday</surname>
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koshevoy</surname>
            <given-names>O. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Timonin</surname>
            <given-names>A. Y.</given-names>
          </string-name>
          <article-title>Social profiles - Methods of solving actual socioeconomic problems using digital technologies and Big Data</article-title>
          . Third International Conference «Digital Transformation and Global Society (DTGS)», St. Petersburg, Russia, May 30 - June 2,
          <year>2018</year>
          .
          <source>Digital Transformation and Global Society. DTGS 2018. Communications in Computer and Information Science</source>
          , vol.
          <volume>858</volume>
          , pp.
          <fpage>436</fpage>
          -
          <lpage>445</lpage>
          . doi: 10.1007/978-3-030-02843-5_35
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bershadsky</surname>
            <given-names>A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bozhday</surname>
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Timonin</surname>
            <given-names>A. Y.</given-names>
          </string-name>
          <article-title>The Process of Personal Identification and Data Gathering Based on Big Data Technologies for Social Profiles</article-title>
          . First International Conference «Digital Transformation and Global Society (DTGS)», St. Petersburg, Russia, June 22-24,
          <year>2016</year>
          .
          <source>Communications in Computer and Information Science</source>
          , vol.
          <volume>674</volume>
          , Springer International Publishing Switzerland, pp.
          <fpage>576</fpage>
          -
          <lpage>584</lpage>
          . ISBN: 978-3-319-49699-3, doi: 10.1007/978-3-319-49700-6_57
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Boytcheva</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angelova</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angelov</surname>
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tcharaktchiev</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>Text mining and big data analytics for retrospective analysis of clinical texts from outpatient care</article-title>
          .
          <source>Cybernetics and Information Technologies</source>
          ,
          <volume>15</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>58</fpage>
          -
          <lpage>77</lpage>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Heilman</surname>
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>West</surname>
            <given-names>A.G.</given-names>
          </string-name>
          <article-title>Wikipedia and medicine: quantifying readership, editors, and the significance of natural language</article-title>
          .
          <source>Journal of medical Internet research</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ),
          <elocation-id>e62</elocation-id>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Jafari</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gheisari</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shahabi</surname>
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tao</surname>
            <given-names>X</given-names>
          </string-name>
          .
          <article-title>Automatic text summarization using fuzzy inference</article-title>
          .
          <source>In 2016 22nd International Conference on Automation and Computing (ICAC)</source>
          , pp.
          <fpage>256</fpage>
          -
          <lpage>260</lpage>
          . IEEE.
          <year>2016</year>
          , September.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Karami</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangopadhyay</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharrazi</surname>
            <given-names>H.</given-names>
          </string-name>
          .
          <article-title>FLATM: A fuzzy logic approach topic model for medical documents</article-title>
          .
          <source>In 2015 Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th World Conference on Soft Computing (WConSC)</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ). IEEE.
          <year>2015</year>
          ,
          August
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Liu</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cocea</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Fuzzy information granulation towards interpretable sentiment analysis</article-title>
          .
          <source>Granular Computing</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>289</fpage>
          -
          <lpage>302</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Mitra</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craswell</surname>
            <given-names>N</given-names>
          </string-name>
          .
          <article-title>Learning to match using local and distributed representations of text for web search</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web</source>
          (pp.
          <fpage>1291</fpage>
          -
          <lpage>1299</lpage>
          ).
          <publisher-name>International World Wide Web Conferences Steering Committee</publisher-name>
          .
          <year>2017</year>
          , April.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Popat</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strötgen</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2017</year>
          , April).
          <article-title>Where the truth lies: Explaining the credibility of emerging claims on the web and social media</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web Companion</source>
          (pp.
          <fpage>1003</fpage>
          -
          <lpage>1012</lpage>
          ).
          <publisher-name>International World Wide Web Conferences Steering Committee</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Shafer</surname>
            <given-names>G.</given-names>
          </string-name>
          <article-title>Dempster-Shafer theory</article-title>
          .
          <source>Encyclopedia of Artificial Intelligence</source>
          ,
          <volume>1</volume>
          , pp.
          <fpage>330</fpage>
          -
          <lpage>331</lpage>
          .
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Shoro</surname>
            <given-names>A. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soomro</surname>
            <given-names>T. R.</given-names>
          </string-name>
          <article-title>Big data analysis: Apache spark perspective</article-title>
          .
          <source>Global Journal of Computer Science and Technology</source>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Sowa</surname>
            <given-names>J. F.</given-names>
          </string-name>
          <source>Conceptual Structures: Information Processing in Mind and Machine</source>
          . Reading, MA: Addison-Wesley,
          <year>1984</year>
          . 481 p.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Timonin</surname>
            <given-names>A. Y.</given-names>
          </string-name>
          <article-title>Set-theoretic and mathematical modeling of social environment states</article-title>
          .
          <source>Izvestiya vysshikh uchebnykh zavedenii. Volga region. Engineering Sciences</source>
          , no. 1 (
          <issue>49</issue>
          ). Penza: PSU Publishing House,
          <year>2019</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>33</lpage>
          . doi: 10.21685/2072-3059-2019-1-2
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Timonin</surname>
            <given-names>A. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bozhday</surname>
            <given-names>A. S.</given-names>
          </string-name>
          <article-title>Investigation of the analyzing process textual and multimedia social profile data from open information sources</article-title>
          .
          <source>Izvestiya vysshikh uchebnykh zavedenii. Volga region. Engineering Sciences</source>
          , no. 2 (
          <issue>42</issue>
          ). Penza: PSU Publishing House,
          <year>2017</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>28</lpage>
          . doi: 10.21685/2072-3059-2017-2-2
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Wang</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>Z.</given-names>
          </string-name>
          <article-title>Public sentiments analysis based on fuzzy logic for text</article-title>
          .
          <source>International Journal of Software Engineering and Knowledge Engineering</source>
          ,
          <volume>26</volume>
          (
          <issue>09n10</issue>
          ), pp.
          <fpage>1341</fpage>
          -
          <lpage>1360</lpage>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Watts</surname>
            <given-names>D. J.</given-names>
          </string-name>
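          ,
          <string-name>
            <surname>Strogatz</surname>
            <given-names>S. H.</given-names>
          </string-name>
          <article-title>Collective dynamics of 'small-world' networks</article-title>
          .
          <source>Nature</source>
          ,
          <year>1998</year>
          , vol.
          <volume>393</volume>
          (
          <issue>6684</issue>
          ), pp.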
          <fpage>440</fpage>
          -
          <lpage>442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Young</surname>
            <given-names>A. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komlodi</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rózsa</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chu</surname>
            <given-names>P.</given-names>
          </string-name>
          <article-title>Evaluating the credibility of English web sources as a foreign-language searcher</article-title>
          .
          <source>In Proceedings of the 79th ASIS&amp;T Annual Meeting: Creating Knowledge, Enhancing Lives through Information &amp; Technology</source>
          (p.
          <fpage>42</fpage>
          ).
          <publisher-name>American Society for Information Science</publisher-name>
          .
          <year>2016</year>
          , October.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Yu</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedrycz</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>W.</given-names>
          </string-name>
          <article-title>Information Sciences 1968-2016: A retrospective analysis with text mining and bibliometric</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>418</volume>
          , pp.
          <fpage>619</fpage>
          -
          <lpage>634</lpage>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>