Approaches To Merging Linguistic Values — Users
Relationships
Anastasiia O. Khlobystovaa,b , Alexander L. Tulupyeva,b
a
    St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
b
    St. Petersburg State University, St. Petersburg, Russia


                                         Abstract
                                         Social engineering attacks based on the human factor have long been the most frequently used in vi-
                                         olation of the information security policies. One of the ways to increase the organization’s level of
                                         protection against social engineering attacks is building a social graph of the organization’s employ-
                                         ees and its analysis. The nodes of such graph associated with users of the information system, and
                                         edge designate the relationships between them. Moreover, this kind of information can be obtained by
                                         analyzing social networks. However, often users have accounts in different social networks, and the
                                         information presented in them is different. The purpose of this article became to propose approaches to
                                         merging probabilistic estimates of the relationship between users, which are linguistic values of linguis-
                                         tic variable "type of relationship". The theoretical significance of the results lies in the proposal of new
                                         approaches to the merging of probabilistic estimates of linguistic variables, the practical significance
                                         consist in creating the basis for further analysis of the social graph of the organization’s employees,
                                         in particular, for detecting the most critical trajectories of attack development or solving backtrack-
                                         ing tasks of social engineering attacks, e.i. the investigation of cyber crime committed by using social
                                         engineering techniques.

                                         Keywords
                                         social engineering attacks, interaction intensity estimates, linguistic variable values, soft computing,
                                         merging social networks


1. Introduction
For a long time in the field of information security, one of the least developed sections remains
the issue of user protection from cyberattack [1]. Over the years, information security spe-
cialists have been developing technical means of protection against hack, for example, against
DoS attacks [2], packet sniffing [3], special programs (viruses, worms, trojans) [4] etc. At the
same time, organizations are still being unprotected due to the lack of employees’ awareness
about cybersecurity and, in particular, about social engineering attacks. We consider social
engineering attack as a set of applied psychological and analytical methods which malefactors
use for users’ motivation in terms of public or corporate network in relation to violations of
the settled rules and politics in the field of information security [5].


Russian Advances in Artificial Intelligence: selected contributions to the Russian Conference on Artificial intelligence
(RCAI 2020), October 10-16, 2020, Moscow, Russia
" aok@dscs.pro (A.O. Khlobystova); alt@dscs.pro (A.L. Tulupyev)

                                       © 2020 Copyright for this paper by its authors.
                                       Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073       CEUR Workshop Proceedings (CEUR-WS.org)
   This information is confirmed by the statistics of cybersecurity incidents both in Russia
[6] and abroad [7]. Moreover, even a small group of malefactors can commit more than 2
thousand cybercrimes in a short period of time, such information is proved in source [8], which
describes about the detention of three unemployed young people from St. Petersburg who stole
more than 4 million rubles within six months, by creating phishing “clones” of famous brands.
Also Ria News have reported the number of cybercrimes in Russia over 5 years has grown by
more than 25 times [9], and one of the most common types of cybercrime in Russia are social
engineering [10]. In addition, one of the largest banks in Russia expects that in 2021 the Russian
economy will lose up to 7 trillion rubles from cybercrime [11].
   This requires the development of effective and robust methods, models, methodologies and
automated tools against hack with applied social engineering. And since social engineering
attacks are directed at people, the overall research direction consists in increasing the level of
protection of users of information systems from social engineering attacks.

1.1. Prerequisites for research
One of the important steps to achieve this goal is to analyze the security of users from such
threats. Automation methods for building estimates of the security of users of information
systems from social engineering attacks were proposed in [5, 12]. In particular, the authors
developed a set of models “critical documents — information system — user — malefactor”
used to analyze user security, indirectly, critical documents and simulate scenarios of the social
engineering attack.
   One of the most used sources of information in social engineering attacks is social networks,
so according to [13] attacks on accounts on social networks are considered very effective. How-
ever, malefactor is not confined to an account on only one social networks. Often users use
different social networks for different purposes. In this regard, for the analysis of user secu-
rity, it is important to consider information from different social networks. This in turn raised
questions about aggregation and merging of data. The tasks of merging user accounts from
different social networks were discussed in [14, 15, 16]. But beyond merging user profiles, it is
also necessary to compare estimates of the probability of attack propagation for social graphs
built on different social networks [17]. That is, in other words, it is necessary to compare the
estimates of the probability of attack propagation. Thus, the purpose of this study is to pro-
pose approaches to the partial merging of social graphs, that is to compare estimates of the
probability of attack propagation for different social graphs obtained on the basis of data from
different social networks.

1.2. Related Work
The "Friend-of-a-Friend" technology, merging social graphs from various social networks into
one database, is described in [18]. However, the authors do not consider the problem of merg-
ing inconsistent data that belong to the same category. The analysis of social graphs in the
context of rumour spreading in social networks is also considered in [19]. Its authors pro-
pose an approach to identifying and blocking nodes that are most likely to disseminate a large
amount of false information. The work may be useful in developing approaches to the analysis
of a social graph in the context of identifying the most probability paths for the spread of the
social engineering attack, but in it only the values “friendship”, “follows” and “subscriptions”
are considered as edges of a social graph, which complicates the applicability to the present
study. The purpose of the study [20] is to propose a scheme for matching user content iden-
tifiers that are publicly available on social networks. Namely, the authors propose a method
based on natural language processing and text mining. The study is relevant and useful when
aggregating data from social networks, but it does not address the issues of combining infor-
mation about the interaction between users. A basis for this research was the work of [5, 21] in
which approaches to the construction and analysis of the social graph of employees were pro-
posed, methods for quantifying linguistic variables (the relationships between users associated
with edges of social graph employees of the organization).


2. Problem statement
                                                                                            {            }
Let the user’s profile 𝑈𝑖 and list of friends be known in the first social network 𝐹 = 𝑈𝑗1 , ..., 𝑈𝑗𝑛 (𝑗𝑘 ≠
𝑖, 1 ≤ 𝑘 ≤ 𝑛),  { also in another
                              }     social network, this user corresponds to profile 𝑈𝑖′ and list of
friends 𝐹 ′ = 𝑈𝑗′1 , ..., 𝑈𝑗′𝑛 (𝑗𝑘 ≠ 𝑖, 1 ≤ 𝑘 ≤ 𝑛). In this case, the user profile 𝑈𝑗𝑘 ∈ 𝐹 (𝑗𝑘 ≠ 𝑖, 1 ≤ 𝑘 ≤
𝑛) corresponds to the user profile 𝑈𝑗′𝑘 ∈ 𝐹 ′ (𝑗𝑘 ≠ 𝑖, 1 ≤ 𝑘 ≤ 𝑛). Obtaining this information relates
to the task of merging user profiles in different social networks and has successful solutions
[14, 22].
    It is known that 𝐸𝑖𝑗𝑘 corresponds to relationship between users 𝑈𝑖 and 𝑈𝑗𝑘 (𝑗𝑘 ≠ 𝑖, 1 ≤ 𝑘 ≤ 𝑛),
𝐸𝑖𝑗𝑘 relationship between 𝑈𝑖′ and 𝑈𝑗′𝑘 (𝑗𝑘 ≠ 𝑖, 1 ≤ 𝑘 ≤ 𝑛). In this case 𝐸𝑖𝑗𝑘 and 𝐸𝑖𝑗′ 𝑘 can be different
  ′

from each other. An example of this is social graphs on the “VKontakte” and “Instagram”:
in “VKontakte” users may be relatives (they are indicated at each other in the corresponding
public lists of friends), and in “Instagram” one of them may be follows for other. Figure 1
illustrates an example of merging user profiles and their relationships in two different social
networks. At the moment, we do not consider the comparison of relations between the friends
of the user, since for them everything will be the same.
    According to [5] when constructing a social graph of the organization’s employees for the
purpose of subsequent analysis to identify the most vulnerable places to social engineering
attacks, each relationship between users (𝐸𝑖𝑗𝑘 ) is associated with a probabilistic estimate (𝑝𝑖𝑗𝑘 ),
obtained by analyzing user interactions in social networks (assignment to any category from
the list of friends, as well as the availability of shared photos, information about likes, reposts,
comments, etc.). Based on the study [21], 𝑝𝑖𝑗𝑘 can be obtained by quantifying the types of user
relationships. For example, on a social network, information about the relationship between
users can be obtained by looking at the public list of friends of the user or the “personal infor-
mation” section on the profile’s main page. Similar actions can be performed with other social
networks.
    So with this approach, 𝐸𝑖𝑗𝑘 is a linguistic variable “type of relationship”, which characterize
1-to-1 relation between two users of this network.
    Let us remember that by a linguistic variable is meant a variable whose values are words
or sentences in a natural or artificial language [23, 24]. Linguistic variable is a quintuple
(𝐿, 𝑇 (𝐿), 𝑈 , 𝐺, 𝑀) in which
Figure 1: Merging example of communication between users.


    • 𝐿 is the name of the variable;
    • 𝑇 (𝐿) is the term-set of 𝐿, that is, the collection of its linguistic values;
    • 𝑈 is a universe of discourse;
    • 𝐺 is a syntactic rule which generates the terms in 𝑇 (𝐿);
    • 𝑀 is a semantic rule which associates with each linguistic value 𝑋 its meaning, 𝑀(𝑋 ) ,
      where 𝑀(𝑋 ) denotes a fuzzy subset of 𝑈 .
  Thus, for the case under consideration:
    • 𝐿 is “type of relationship”;
    • 𝑇 (𝐿) depends on the social network in question, for example, if the social network is
      “VKontakte” in our study 𝑇 (𝐿) = “Friends”+ “Best friends”+ “Colleagues”+ “School friends”+
      “University friends”+“Family“+ “Grandparents“+ “Parents“+ “Siblings“+ Children“+ “Grandchildren“+
      “In a relationship“+ “Engaged“+ “Married“+ “In a civil union“+ “In love“+ “It’s compli-
      cated“;
    • 𝑈 is set of values from [0, 1], denote the strength of the relationship between users;
    • 𝐺 is determined depending on the social network in question;
    • 𝑀 is a modified method by Khovanov described in [22, 21].
   Thus, the purpose of this article became to propose approaches to merging probabilistic
estimates of the relationship between users, which are linguistic values of linguistic variable
"type of relationship".
3. Approaches to merging probabilistic estimates of the
   relationship between users
This section provides approaches to merging probabilistic estimates of user relationships.

3.1. Highlighting the strongest communication
This approach is based on the assumption that users can subconsciously select one of the social
networks and be more active in it, including posting more detailed information about them-
selves and their social environment. In addition, often different social networks are used for
different purposes, in this regard, in one of them, for example, there will be more information
related to the work of the user, and in the other with family ties.
   Based on the approach to quantification of numerical estimates proposed in [22, 21], we will
consider the numerical probabilistic estimates characterizing the relationship between users to
be known (Table 1 and Table 1).

Table 1
Probabilistic estimates of the the linguistic values variable “type of relationship" from social network
"VKontakte"

                       Types of relationships   Mapped estimate of probability
                       Friends                             0.2938
                       Best friends                        0.7838
                       Colleagues                          0.4074
                       School friends                      0.4443
                       University friends                  0.3686
                       Family                              0.3641
                       Grandparents                        0.2507
                       Parents                             0.3421
                       Siblings                            0.4398
                       Children                             0.41
                       Grandchildren                       0.3474
                       In a relationship                   0.3075
                       Engaged                             0.3107
                       Married                             0.3793
                       In a civil union                    0.3223
                       In love                             0.4189
                       It’s complicated                    0.1922


   In this case, we can compare the linguistic values of the relationship variable, revealing a
connection with a larger rating and designating it as a stronger one. Then match this connec-
tion to the social graph edge. For example, let us want to compare the relationship between the
profiles of “VKontakte” and “Instagram” users. It is noted that “VKontakte” users are connected
by relationships 𝐸𝑖𝑗 — “school friends” (probabilistic assessment corresponds to this type of re-
lationship is 𝑝𝑖 𝑗 = 0.4443), in “Instagram” 𝐸𝑖𝑗′ marked as “𝑖 like 𝑗” (𝑝𝑖′ 𝑗 = 0.34). Then the “school
Table 2
Probabilistic estimates of the the linguistic values variable “type of relationship" from social media
"Instagram"

                     Types of relationships       Mapped estimate of probability
                     I follow                                 0.48
                     I like                                   0.34
                     I comment                                 0.4
                     I have photo with tag X                  0.62
                     X follows                                0.43
                     X likes                                  0.38
                     X comments                               0.43
                     X have photo with tag me                 0.54
                     Followers                                0.51
                     Common geotag                            0.42
                     Common hashtag                           0.36
                     X is a selebrity                         0.57


friends” connection is stronger than the connection “𝑖 like 𝑗” (𝐸𝑖𝑗 ≻ 𝐸𝑖𝑗′ ), therefore, in a further
analysis of the relationship between 𝑖 and 𝑗 will be considered “school friends” and estimated
in 𝑝𝑖 𝑗 = 0.4443.

3.2. Weighted average
With knowledge of the reliability of sources, a combination of estimates can be obtained using
a weighted average [25, 26]. Thus, the merging probabilistic relationship assessment will have
          ⌢
the form 𝑝 𝑖𝑗 = 𝑝𝑖𝑗 ⋅ 𝑤𝑖𝑗 + 𝑝𝑖𝑗′ ⋅ 𝑤𝑖𝑗′ , where 𝑝𝑖𝑗 and 𝑝𝑖𝑗′ is probabilistic estimates obtained using
relationship information about𝐸𝑖𝑗 and 𝐸𝑖𝑗′ respectively, 𝑤𝑖𝑗 , 𝑤𝑖𝑗′ is weights such that 𝑤𝑖𝑗 + 𝑤𝑖𝑗′ = 1.
   At the same time, knowledge about the reliability of sources can be obtained in different
ways, for example, by quantifying expert opinions or by obtaining estimates of the communi-
cation activity of users in social networks [27]. This approach will be explored further as part
of further research. In addition, in the future it is planned to consider the possibility of apply-
ing the conjunctive combination rule [28], or use in a weighted average estimate of weights
based on inverse dispersion.


4. Result
The approach based on highlighting the strongest connection is useful and convenient to use
in case of differences in the probabilistic estimates of relationships. It is expected that this
approach will give an optimal result when applied in the analysis of the social graph of the or-
ganization’s employees in the context of protection from social engineering attacks. However,
the question of the applicability of this approach to other tasks in the analysis of social net-
works has not been studied. In addition, it may not be applicable in the case of approximately
equal estimates. Also, in [29] it is noted that the propagation of information in a social graph
will be more difficult by the removal of a weak connection, which also confirms the need to
verify proposed approach.
   While the approach based on the weighted average value can be applied regardless of the
values of probabilistic estimates. Nevertheless, it requires additional, more in-depth studies to
find the optimal weights. It is also assumed that in the future these two approaches can be
combined.


5. Conclusions
Thus, the article proposed approaches to the merging of probabilistic estimates of the relation-
ship between users, based on the assumption that these probabilistic estimates are obtained by
quantification. The theoretical significance of the results lies in the proposal of new approaches
to the merging of probabilistic estimates of linguistic variables, the practical significance is seen
in creating the basis for further analysis of the social graph of the organization’s employees,
in particular, for detecting the most critical trajectories of attack development or solving back-
tracking tasks of social engineering attacks.


Acknowledgments
The work was carried out as part of the project according to the state task SPIIRAS No. 0073-
2019-0003, with financial support from the Russian Foundation for Basic Research, project No.
18-01-00626 — Methods for representation, truth estimates synthesis, and machine learning in
algebraic Bayesian networks and related models of knowledge with uncertainty: probabilistic-
logic approach and graph systems; Project No. 20-07-0083 — Digital twins and soft computing
in social engineering attacks modelling and associated risks assessment.


References
 [1] Sophos Whitepaper 2020, The impossible puzzle of cybersecurity: Results of
     an independent survey of 3,100 it managers commissioned by sophos, Sophos
     Whitepaper.        URL:       https://secure2.sophos.com/en-us/medialibrary/Gated-Assets/
     white-papers/sophos-impossible-puzzle-of-cybersecurity-wp.pdf,            2020.      URL:
     https://secure2.sophos.com/en-us/medialibrary/Gated-Assets/white-papers/
     sophos-impossible-puzzle-of-cybersecurity-wp.pdf.
 [2] Q. G. J. L. Q. Yan, F.R. Yu, Software-defined networking (sdn) and distrib-uted de-
     nial of service (ddos) attacks in cloud computing environments: A survey, some re-
     search issues, and challenges., IEEE communications surveys & tutorials 18 (2016, doi:
     10.1109/COMST.2015.2487361) 602–622.
 [3] G. B. V. B. J. G. C. F. G. R.-G. V. Elamaran, N. Arunkumar, Exploring dns, http, and icmp
     response time computations on brain signal/image databases using a packet sniffer tool.,
     IEEE Access, (992018) 6 (2018, doi: 10.1109/ACCESS.2018.2870557) 59672–59678.
 [4] M. R. V. Chang, Y. Kuo, Cloud computing adoption framework: A security frame-
     work for business clouds.,         Future Generation Computer Systems 57 (2016, doi:
     10.1016/j.future.2015.09.031) 24–41.
 [5] A. T. M.V. Abramov, T.V. Tulupyeva, Social Engineering Attacks: social networks and user
     security estimates, SUAI, St. Petersburg, 2018.
 [6] Lenta2020, Telephone fraud affects one third of russians, Lenta.ru. URL: https:
     //news.mail.ru/society/38435225/?frommail=1, 2020. URL: https://news.mail.ru/society/
     38435225/?frommail=1.
 [7] Sputnik2020, Almost half of the inhabitants of lithuania were deceived
     by telephone and internet scammers, Sputnik, URL: https://www.kurier.lt/
     polovina-zhitelej-stalkivalas-s-elektronnymi-ili-telefonnymi-moshennikami/,          2020.
     URL: https://www.kurier.lt/polovina-zhitelej-stalkivalas-s-elektronnymi-ili-telefonnymi-moshennikami/.
 [8] SecurityLab 2020, St. petersburg police caught three phishers, SecurityLab.Ru. News.
     URL: https://www.securitylab.ru/news/509364.php, 2020. URL: https://www.securitylab.
     ru/news/509364.php.
 [9] RIA News 2020, The head of group-ib called the most common types of cybercrime in
     russia, Ria News. URL: https://ria.ru/20200617/1573066952.html, 2020. URL: https://ria.ru/
     20200617/1573066952.html.
[10] RIA 2020, Sberbank warned of new coronavirus schemes of fraudsters, Ria
     News.        URL:       https://yandex.ru/turbo/s/forbes.ru/newsroom/finansy-i-investicii/
     397727-sberbank-predupredil-o-novyh-shemah-moshennichestva-na-fone,
     2020.       URL:        https://yandex.ru/turbo/s/forbes.ru/newsroom/finansy-i-investicii/
     397727-sberbank-predupredil-o-novyh-shemah-moshennichestva-na-fone.
[11] Kommersant 2020, Sberbank predicts economic losses from cybercrime in 2021 to 7 trillion
     rubles, Kommersant. URL: https://www.kommersant.ru/doc/4381088, 2020. URL: https:
     //www.kommersant.ru/doc/4381088.
[12] A. S. A. T. M. A. R. U. A.A. Azarov, T.V. Tulupyeva, Social Engineering Attacks: The prob-
     lems of analysis, Nauka, St. Petersburg, 2016.
[13] D. Katalkov, How social engineering opens the door to your organization for a hacker.,
     Positive Research 2018. Practical Safety Research Digest., URL: https://www.ptsecurity.
     com/upload/corporate/ru-ru/analytics/Positive-Research-2018-rus.pdf, 2018. URL: https:
     //www.ptsecurity.com/upload/corporate/ru-ru/analytics/Positive-Research-2018-rus.
     pdf.
[14] M. A. A. T. A.A. Korepanova, V.D. Oliseenko, Application of machine learning methods
     to the user accounts identification in two social networks, Computer tools in education
     3 (2019) 38–43.
[15] T. T. A.A. Korepanova, User identification across different social networks through social
     circles, in: Information Security of Russian regions (ISRR-2019). XI St. Petersburg inter-
     regional conference., volume 3 of Proceedings of the conference., SPOISY, St. Petersburg.,
     2019, pp. 442–443.
[16] R. H. T. M. Y. Yang, H. Yu, A fusion information embedding method for user iden-
     tity matching across social networks., 2018 IEEE SmartWorld, Ubiqui-tous Intelli-
     gence & Computing, Advanced & Trusted Computing, Scalable Computing & Commu-
     nications, Cloud & Big Data Computing, Internet of People and Smart City Innova-
     tion (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) 18 (2018, doi: 10.1109/Smart-
     World.2018.00340) 2030–2035.
[17] A. T. A.O. Khlobystova, M.V. Abramov, Identifying the most critical trajectory of the
     spread of a social engineering attack between two users., in: The Second International Sci-
     entific and Practical Conference “Fuzzy Technologies in the Industry – FTI 2018”., CEUR
     Workshop Proceedings, 2018, pp. 38–43.
[18] B. K. A.Y. Denzhakov, Methods and means of formalizing data in social networks., Actual
     problems of the humanities and natural sciences 12 (2010) 44–48.
[19] K. L. A.I.E. Hosni,        Minimizing the influence of rumors during breaking news
     events in online social networks.,              Knowledge-Based Systems 18 (2019, doi:
     10.1016/j.knosys.2019.105452).
[20] B. R. D.K. Srivastava, Words are important: A textual content-based identity resolution
     scheme across multiple online social networks., Knowledge-Based Systems 195 (2020, doi:
     10.1016/j.knosys.2020.105624) 105624.
[21] A. T. A.O. Khlobystova, M.V. Abramov, Soft estimates for social engineering attack prop-
     agation probabilities depending on interaction rates among instagram users., in: D. V.
     E. B. D. I. M. Kotenko I., Badica C. (Ed.), Intelligent Distributed Computing XIII. IDC
     2019., Studies in Computational Intelligence, Springer, Cham, 2019, doi: 10.1007/978-3-
     030-32258-8_32, pp. 272–277. doi:10.1007/978-3-030-32258-8_32.
[22] T. T. A. K. A.O. Khlobystova, A.G. Maksimov, An approach to quantification of relation-
     ship types between users based on the frequency of combinations of non-numeric evalua-
     tions., in: S. V. S. A. Kovalev S., Tarassov V. (Ed.), Proceedings of the Fourth International
     Scientific Conference “Intelligent Information Technologies for Industry” (IITI’19)., vol-
     ume 1156 of IITI 2019. Advances in Intelligent Systems and Computing., Springer, Cham.
     doi: 10.1007/978-3-030-50097-9_21, 2020, pp. 206–213.
[23] L. Zadeh, The concept of a linguistic variable and its application to approximate rea-
     soning., Learning systems and intelligent robots. Springer, Boston, MA (1974. doi:
     10.1007/978-1-4684-2106-4_1) 1–10.
[24] L. Zadeh, Linguistic variables, approximate reasoning and dispositions., Medical Infor-
     matics 8 (1983) 173–186.
[25] K. W. J. Smith, A simple explanation of the forecast combination puzzle., Oxford Bulletin
     of Economics and Statistics 71 (2009, doi: 10.1111/j.1468-0084.2008.00541.x) 331–355.
[26] J. L. D. Li, W. Zeng, Fuzzy group decision-making based on variable weighted averaging
     operators., in: 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE,
     Beijing. doi: 10.1109/FUZZ-IEEE.2014.6891632, 2014, pp. 1416–1421.
[27] S. Z. R.R. Tolstyakov, N.V. Zlobina, Research on social media use: theoretical and practical
     approaches., Bulletin Michurinsk State Agrarian University 4 (2016) 85–95.
[28] T. Denoeux,          Conjunctive and disjunctive combination of belief functions in-
     duced by nondistinct bodies of evidence.,              Artificial Intelligence 172 (2008, doi:
     10.1016/j.artint.2007.05.008) 234–264.
[29] O. Kuznetsov, Models of activity propagation processes in network structures, in: XII
     All-Russian Meeting on Management, volume 3, Moscow, 2014, pp. 16–19.