System for Definition of Indicator Characteristics of Social Networks Participants Profiles

Oleg Bisikalo [0000-0002-7607-1943], Anton Kontsevoi

Department of Automation and Information-Measuring Technique of Vinnytsia National Technical University, Vinnytsia, Ukraine
obisikalo@gmail.com, anton.96k@gmail.com

Abstract. A system has been developed that automates the determination of indicator characteristics of social network profiles and of the response of social network participants to certain events; it has been tested on real text examples and data. Testing showed that the proposed approach, which combines several natural language processing models and technologies, substantially improved the accuracy and the number of characteristics obtained for user profiles in the Twitter social network.

Keywords: model, associative imaginative thinking, social network, participants profiles, indicators, users classification, android, mobile application.

1 Introduction

Social networks are a highly relevant research field in computer science today, because information that is vital to most people on the planet already exists in digital form. Consider an approach to determining the indicator characteristics of the profiles of social network (SN) participants that is based on modeling a generalized image. Typically, data for such problems are collected by 1) determining the set of significant (indicator) characteristics of the related objects (profiles of SN participants), 2) obtaining the values of these characteristics for one object, and 3) accumulating statistics on the values for all objects under study. The fourth stage extracts useful knowledge from these statistics in one way or another, for example by statistical analysis, machine learning, or deep learning. The result is a partition of the primary set of objects into specific classes or categories [1].
The choice and effectiveness of a method depend, as a rule, on the composition and types of the data being analyzed. We take advantage of the fact that a large part of the indicator characteristics of SN participant profiles is specified verbally, that is, belongs to the set of words of some natural language. The idea behind the proposed approach is to apply linguistic methods to the set of words that characterize a particular SN participant profile in order to obtain a concise natural-language description of that participant that is understandable to humans.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

We formalize the problem using the model of human associative imaginative thinking first proposed in [2], in particular by simulating the way a person solves a problem. The subject of formalization is a set of mnemonic processes inherent in humans. In [3], in order to solve this class of problems formally, we proposed algebraic models of the operative memory mechanism of an artificial linguistic system. The current task in developing the model of [3] is to determine and detail the image-solution of a problem situation by applying a group of formal operations to the ensemble of images and by modeling the human orienting reflex. An algebraic solution to this problem was first obtained in [4], which considered the processes of generalizing verbal information as a whole. In terms of the model of figurative thinking, a basic synthesis operation provides, according to Vygotsky [5], the influx (infusion) of the meaning of a particular event or situation into one image. Such an operation occurs in human memory, in particular on the basis of the verbal features of the image, if the event (situation) became known through a textual description.
For example, a newspaper editor reads the news of an event and must give the message a headline that is concise but striking and appealing to the reader.

2 Formal Model

We show that it is possible in principle to construct a solution image within the model of the memory mechanism [4]. Assume that N verbal candidate images have already been selected into the Check-Set memory stack. It is then necessary to determine the components of the emotion vector Vector-Set (that is, the verbal images that correspond to positive signs of the situation) and to select, from the N images, the one with the highest weight. The problem of constructing the image that solves a problem situation is formulated as

\[
\text{Check-Set},\ \text{Vector-Set} \Rightarrow \text{Focus-Bi} \;\big|\; \text{Focus-Weight} = \max_{i \in N}(\text{Weight}_i), \tag{1}
\]

where Focus-Bi is the binary code of the image in the focus of attention; Focus-Weight is the weight of the associative relations between the image in the focus of attention and the images that are components of the emotion vector Vector-Set; Weight_i is the weight of the associative relations between the i-th image of the image ensemble (IA) and the images of Vector-Set.

Formally, we look for the synthesis, or construction, of a solution image as an InsZX operator, similarly to [6]. The main sources of information for the model are short texts of SN users, and following [4] we use the notation: Bi-Vector_i is the binary code of the i-th image of the emotion vector Vector-Set; Bi-OM is the binary code of the IA in operative memory (OM).

We add to the algebraic BAS system [3] formal operations and predicates on power sets that fit the problem:

1. The operation Seek, which searches for the i-th unit ("1") in the binary code Bi-OM, and the predicate End-Bi, which is true while not all bits of the code have been viewed:

\[
\text{Seek``1''}_i(\text{Bi-OM}) \Rightarrow k, \tag{2}
\]
\[
i < n \Rightarrow \text{End-Bi}, \tag{3}
\]

where i is the number of the image in operative memory; k is the position in the binary code of the unit that corresponds to the i-th image of the IA; n is the number of units in the binary code (the current operative memory).

2. The operation New-OM, which transfers the images that are components of the emotion vector Vector-Set into the operative memory of the IA:

\[
\text{New-OM}:\ \bigvee_{i=1}^{n} \text{Bi-Vector}_i \Rightarrow \text{Bi-OM}. \tag{4}
\]

3. The operation New-Vector, which inserts values from the Choice-Set stack into the emotion vector Vector-Set:

\[
\text{New-Vector}:\ \text{Choice-Set} \Rightarrow \text{Vector-Set}. \tag{5}
\]

4. The operation New-Choice, which uploads the operative memory to the Choice-Set stack:

\[
\text{New-Choice}:\ \text{Bi-OM} \Rightarrow \text{Choice-Set}. \tag{6}
\]

In the graph diagrams of the algebraic constructs we use the following notation for structured programming operators [7]: Do is a loop by parameter or condition; * is composition. Consider the solution of problem (1) on the basis of the following constituent operators.

1. The operator Pyramid-Images decomposes the binary code Bi-OM into the codes of its component images. As a result of the action of the operator shown in Figure 1, the known Bi-OM code yields the codes of the images that are part of the IA.

Fig. 1. A graph of the operator Pyramid-Images.

For this purpose a loop over the Bi-OM code is organized: whenever a unit is found, an element with a unit at this position and zeros at all other positions is added to the list of images (the loop is limited by the number of units n in the Bi-OM binary code):

\[
\text{Pyramid-Images} :: 0 \Rightarrow i * \Big\{ [\text{End-Bi}] \Big( i+1 \Rightarrow i * \text{Seek``1''}_i \Rightarrow k * \underset{j=1}{\overset{k-1}{\text{Conc``0''}}}\ \text{``1''}\ \underset{j=k+1}{\overset{n}{\text{Conc``0''}}} \Rightarrow \text{Bi-I}_i \Big) \Big\}, \tag{7}
\]

where Bi-I_i is the binary code of the i-th image in operative memory; Conc"0" is the concatenation of "0" characters into the digits of the binary code.

2. The operator OM-Change-Vector (Figure 2) simulates the launch of the mechanism that constructs the solution image: the operations New-Choice, New-OM, and New-Vector are performed in sequence:

\[
\text{OM-Change-Vector} :: \text{New-Choice} * \text{New-OM} * \text{New-Vector}. \tag{8}
\]

Fig. 2. A graph of the operator OM-Change-Vector.

3. The solution-image constructor InsZX (Figure 3) runs the OM-Change-Vector mechanism and passes its results to the Orient-Reflect model of the orienting reflex:

\[
\text{InsZX} :: \text{OM-Change-Vector} * \text{Orient-Reflect}. \tag{9}
\]

Fig. 3. A graph of the operator InsZX.

The algorithm for constructing the image that solves a problem situation is implemented within the model of the orienting reflex [4] with the help of the algebraic operators (2)-(9); it has also been implemented in software and tested using Kotlin and Apache OpenNLP.

Let us illustrate the possibilities of the proposed approach by solving problem (1) for the experimental data of the running test example “Semantic WEB Standards” from [4]. Five linguistic images (LI) were selected into the Check-Set memory stack: “Ontology (111)”, “XML (88)”, “Tool (93)”, “Resource (73)”, and “RDF (87)”; the components of the emotion vector Vector-Set were set to the LI “Language (56)”, “Information (34)”, and “Network (17)”. Formally, in accordance with the statement of problem (1), we define Check-Set = 011111000000, Vector-Set = {000000010000, 000000000010, 000000000001}. As a result of the action of the OM-Change-Vector operator, the Vector-Set linguistic images are transferred to the IA and replaced by the LI from the Check-Set. The Orient-Reflect orienting reflex operator successively determines the weights for all linguistic images (Bi-OM = 000000010110), and the largest of them turns out to be Focus-Weight = 8 for the LI “language (56)”.
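The decomposition performed by Pyramid-Images in (7) and the selection of the focus image in (1) can be sketched in Kotlin as follows. This is a minimal illustration, not the authors' implementation: binary codes are assumed to be stored as strings of '0' and '1', and the function names `pyramidImages` and `focus` are hypothetical. Only the two weights reported in the text (8 and 4) are used as sample data.

```kotlin
/**
 * Pyramid-Images sketch: for every unit ("1") at position k of the code,
 * emit a code of the same length with a unit at k and zeros elsewhere.
 */
fun pyramidImages(code: String): List<String> =
    code.indices
        .filter { code[it] == '1' }
        .map { k -> "0".repeat(k) + "1" + "0".repeat(code.length - k - 1) }

/**
 * Focus selection sketch for problem (1): the image with the maximum weight.
 * Returns null when no weights are supplied.
 */
fun focus(weights: Map<String, Int>): String? =
    weights.maxByOrNull { it.value }?.key

fun main() {
    // Check-Set from the worked example: five candidate images.
    val images = pyramidImages("011111000000")
    images.forEach(::println)

    // Weights reported in the text for two of the Vector-Set images.
    println(focus(mapOf("language (56)" to 8, "network (17)" to 4)))
}
```

Keeping the codes as strings mirrors the notation of the example; a production version would more likely use `Long` bit masks or `java.util.BitSet`.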
However, even with the limited natural-language material of the running example, the LI “network (17)” gains a weight of 4, which makes it a competitor to the LI “language (56)” in generalizing the meaningful concepts of the Semantic WEB Standards topic.

Thus, based on a formal view of the phenomena of the ensemble of images, the emotion vector, the focus of attention, and the orienting reflex, we have shown that an image that solves a problem situation can be constructed by means of an algebraic system. Unlike [8, 9], the main source of information for the model is a textual description of the problem situation, which determines its features. The proposed approach to operator construction is based on the known models [3, 4] and provides an invariant representation of the content of a brief description of a situation, which has been demonstrated on a running test example [10].

3 Application Development

We propose a new application based on the model for determining and detailing the image-solution of a problem situation. The program analyzes the reactions of Twitter users to current news and determines indicator characteristics of the profiles of participants in this social network.

The application is intended for collecting statistical information about users' responses to an event as it unfolds in real time and for analyzing the indicator characteristics of user profiles. The goal is to analyze the statistical information and display it in the form of histograms and tables. Data collection, analysis, and processing take place on the device itself, which increases the security of the data the user is working with [11].

Like the vast majority of Android programs, the application consists of a graphical interface and other application resources (graphical, text, and sound resources), as well as the business logic of the application.
For the program to work, the access settings for the Twitter API must be configured. These parameters are key-value pairs:
• consumer key;
• consumer secret;
• access token;
• access secret.
These values are sent with requests to the servers of the social network and are necessary for collecting information. To obtain them, the user must have a Twitter account with developer privileges. The parameters are unique for each user and identify the user's program when it accesses data on the server. As noted above, data collection, analysis, and processing take place on the device itself [12].

User interaction with the program starts from the main screen. In the center of the screen there is an input field where the user can enter either the nickname of the user whose profile is to be analyzed or a hashtag that marks a discussion on the network. A nickname is entered with the symbol "@" immediately in front of it, without spaces. To analyze the reaction to news, the user enters the corresponding hashtag, starting with the symbol "#". The user then clicks the “Start analysis” button, which starts either the analysis of user reactions to events on the network or the analysis of a participant's profile, depending on the input.

The screen also has two additional buttons, “Use test data (tags)” and “Use test data (user)”. Each of them starts the analysis of the relevant content using pre-stored data from Twitter. This is useful if the user wants to see the result of the program's work but does not have a Twitter developer account and therefore cannot access the API. In this case the program does not call the Twitter API but uses pre-prepared response files from the Twitter servers. The main screen of the application is shown in Figure 4.

Fig. 4. The main screen of the application.

When a username or hashtag is entered, the system first calls the Twitter API. The corresponding module of the program sends requests to the server using the previously specified access parameters together with the user's query. The server returns data in JSON format (Figure 5).

Fig. 5. The example JSON response from the Twitter API.

When each user reaction is parsed, its text is extracted and split into tokens, which are added to the list of tokens. The list of tokens is needed for further text processing, since many other text-processing methods require preliminary tokenization. Next, the program determines the names of people, locations, and dates in the tweet texts using the Named Entity Recognition API of the Apache OpenNLP library. Then, using the Chunker API, the program selects keywords and phrases (groups of words that are related in meaning and grammar) in the tweet texts. For this, the program needs the list of tokens as well as the output of the POS Tagger API, which determines the part of speech of each word in the input text. This allows the Chunker API to identify words and phrases that are related in meaning, which ultimately helps to determine both the reaction of a particular user to an event on the network and the general reaction of users. When the profile of a Twitter participant is analyzed, the Language Detector API is applied in addition; based on its built-in model, it recognizes the language of the text and returns a list of candidate languages, each with the estimated probability that it is the language of the text.
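The last step of the chunking stage, turning per-token chunk tags into phrases and counting the most frequent ones, can be sketched as follows. The Apache OpenNLP chunker (`ChunkerME.chunk(tokens, posTags)`) returns one tag per token in the B-/I-/O form, for example "B-NP", "I-NP", "O"; the sketch starts from such tags and deliberately does not call the library, so it runs without model files. The function names `phrasesFromChunkTags` and `topN` and the sample sentence are hypothetical, and keeping every chunk type (not only noun phrases) is an assumption.

```kotlin
/**
 * Groups tokens into phrases using B-/I-/O chunk tags.
 * A "B-" tag opens a phrase, an "I-" tag continues the open phrase,
 * and any other tag closes it.
 */
fun phrasesFromChunkTags(tokens: List<String>, chunkTags: List<String>): List<String> {
    require(tokens.size == chunkTags.size) { "one chunk tag per token is expected" }
    val phrases = mutableListOf<String>()
    val current = mutableListOf<String>()

    // Moves the currently open phrase, if any, to the result list.
    fun flush() {
        if (current.isNotEmpty()) {
            phrases.add(current.joinToString(" "))
            current.clear()
        }
    }

    for ((token, tag) in tokens.zip(chunkTags)) {
        when {
            tag.startsWith("B-") -> { flush(); current.add(token) }
            tag.startsWith("I-") && current.isNotEmpty() -> current.add(token)
            else -> flush()
        }
    }
    flush()
    return phrases
}

/** Returns the n most frequent items (case-insensitive) with their counts. */
fun topN(items: List<String>, n: Int): List<Pair<String, Int>> =
    items.groupingBy { it.lowercase() }
        .eachCount()
        .toList()
        .sortedByDescending { it.second }
        .take(n)

fun main() {
    // Hypothetical tokens and tags, shaped like chunker output.
    val tokens = listOf("Ukraine", "receives", "new", "aid", "package")
    val tags = listOf("B-NP", "B-VP", "B-NP", "I-NP", "I-NP")
    println(phrasesFromChunkTags(tokens, tags))
    println(topN(listOf("aid", "Aid", "war"), 1))
}
```

Applying `topN` to the phrases collected from all tweets gives the kind of ranked list that a bar or pie chart can display.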
To study user reactions to events, the responses of Twitter users to events united by the tag “#ukraine” were selected as input data; that is, a tweet appears in the search results if it contains this tag or word. For the study, 1,500 user responses in English were collected, since the recognition model works with English. This number is small for a real analysis, since it covers only a small percentage of users' reactions, but it is dictated by the limitations of the Twitter API. The limitation can be circumvented by sending requests at fixed intervals, after each of which the request counter is reset to zero and new requests can be sent. However, this approach will not work in a mobile application: even if the application collects data in the background for a certain period of time, there is a risk that the system will stop the background process. For test purposes this number of tweets is sufficient, because the goal is not to collect the most accurate statistics but to demonstrate an approach to solving the problem. Figure 6 shows the results of the analysis of user reactions to events on the network.

Fig. 6. User response to events on the network.

The user can interact with the graphs, zooming in and scrolling through them to see all the data.

While processing the results of the analysis of user reactions, the program identified the five most common names in the tweet texts: "Donald Trump", "Joe Biden", "Rudy Giuliani", "Vladimir Putin", and "Volodymyr Zelensky" (Figure 7). With small amounts of data (1-5 thousand tweets) there is no need to display more results, since other people's names are rare in tweets; people are more often mentioned in the @nickname format.

Fig. 7. The diagram of the most frequently encountered names in tweet texts.

Figure 8 shows a pie chart with the mentioned locations.
Fig. 8. The most frequently mentioned locations.

The Named Entity Recognition API was able to highlight the names of locations in the text, in this case the names of the most frequently mentioned countries. It is worth noting that, despite the high accuracy of the results obtained, the model may fail to recognize the names of little-known cities, which may be a problem when analyzing more “local” events in the network. Figure 9 shows a diagram with the most common keywords and phrases in the texts of reactions to news on the network.

Fig. 9. The most common keywords and phrases.

It is worth noting that the approach that uses the Tokenization, POS Tagging, and Chunking APIs gives much greater accuracy and a greater semantic load in the final results than the approach that uses Tokenization and Tuples (for generating pairs of keywords in the text). This approach works well for English, which has a strict word order in a sentence, so there is a high probability of finding matches among the extracted words and phrases.

To study the profile of a member of a social network, the profile of the current US president was chosen, together with 2,500 of his tweets. This number is small for a real analysis, since it covers only a small number of the user's recent tweets, and it is dictated by the limitations of the Twitter API.

Figure 10 shows the results of the analysis of the profile of a Twitter member.

Fig. 10. Results of the analysis of the profile of a member of the social network Twitter.

The analysis results show that the system recognized four languages that were supposedly used by the user when writing tweets. The highest score was obtained for English, which is correct. The remaining languages were recognized because the text may contain names or locations whose spelling resembles these languages, and also because of errors in the recognition model.
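One possible way to turn per-tweet detector output into per-language shares is to sum the confidence assigned to each language over all tweets and normalize the sums. The sketch below assumes this aggregation rule; the text does not state how the scores were actually combined. `LangScore` and `languageShares` are hypothetical names, and the sample predictions are invented for illustration.

```kotlin
/** One detector prediction: a language code and its confidence in [0, 1]. */
data class LangScore(val lang: String, val confidence: Double)

/**
 * Sums the confidences per language over all tweets and normalizes the sums
 * so that the shares add up to 1. Languages are ordered by decreasing share.
 */
fun languageShares(predictions: List<List<LangScore>>): Map<String, Double> {
    val sums = mutableMapOf<String, Double>()
    for (perTweet in predictions) {
        for (p in perTweet) {
            sums[p.lang] = (sums[p.lang] ?: 0.0) + p.confidence
        }
    }
    val total = sums.values.sum()
    if (total == 0.0) return emptyMap()
    return sums.mapValues { it.value / total }
        .toList()
        .sortedByDescending { it.second }
        .toMap()
}

fun main() {
    // Hypothetical predictions for two tweets.
    val predictions = listOf(
        listOf(LangScore("eng", 0.9), LangScore("fra", 0.1)),
        listOf(LangScore("eng", 0.6), LangScore("spa", 0.4))
    )
    println(languageShares(predictions))
}
```

With such an aggregation, a language that receives a small confidence in a few tweets (for example because of foreign names) still appears in the result, but with a small share.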
Figure 11 shows the pie chart with the languages detected in the user's tweets.

Fig. 11. The languages detected in the user tweets.

We also analyzed the most frequently mentioned places and persons, on the same principle as when analyzing user reactions to events on the network. These results were put on one chart in order to reveal a dependence between names and locations. With this approach, for related pairs of names and locations the difference in the frequency of mentions should be insignificant. However, for this user and data set, no such dependence was detected.

Summing up, the created program successfully fulfills the tasks of analyzing the profiles of members of a social network and of determining the response of users to events in it.

In the course of the work, a new coordinated approach to solving the above problems was proposed and implemented using the Apache OpenNLP library. It made it possible to increase the accuracy of the results, which was confirmed by testing the developed program.

4 Conclusion

The article proposes a system that automates the determination of indicator characteristics of social network profiles and of the response of social network participants to certain events. Based on a formal view of the phenomena of the ensemble of images, the emotion vector, the focus of attention, and the orienting reflex, the possibility of constructing an image that solves a problem situation by means of an algebraic system is shown. The architecture and main modules of the proposed system were developed and tested on real text examples.

To develop the software, the Kotlin programming language was chosen because of its ease of use in creating Android applications, together with the Apache OpenNLP library for parsing data about news and network users.
Thus, the developed system showed that the proposed approach, which combines several natural language processing models and technologies, substantially improved the accuracy and the number of characteristics obtained for user profiles in the Twitter social network.

According to the authors, further development of the research should study different natural language processing models that can be used as the basis of the proposed system for definition of indicator characteristics of social networks participants' profiles.

References

1. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning. Springer Science+Business Media, New York (2013).
2. Bisikalo, O., Cięszczyk, S., Yussupova, G.: Solving problems on base of concepts formalization of language image and figurative meaning of the natural-language constructs. In: Proc. SPIE, vol. 9816, Optical Fibers and Their Applications, pp. 419–432 (2015).
3. Bisikalo, O., Ivanov, Y., Karevina, N.: Encoding of Natural Language Information on the Basis of the Power Set. In: 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), pp. 17–20. Lviv, Ukraine (2018).
4. Bisikalo, O., Ivanov, Y., Sholota, V.: Modeling the phenomenological concepts for figurative processing of natural-language constructions. In: CEUR Workshop Proceedings, vol. 2362, pp. 1–11 (2019).
5. Vygotskiĭ, L.: Thought and Language. MIT Press, Cambridge, MA (2012).
6. Lund, K., Burgess, C., Audet, C.: Dissociating semantic and associative word relationships using high-dimensional semantic space. In: Cognitive Science – COGSCI, vol. 18, pp. 603–608 (1996).
7. Eisenbud, D.: Commutative Algebra with a View Toward Algebraic Geometry. Springer-Verlag, New York, NY (1995).
8. Zaki, M.: Scalable algorithms for association mining. In: IEEE Transactions on Knowledge and Data Engineering, vol. 12, issue 3, 38 p. (2000).
9. Cabrera, I.: A note on the envelopes of an associative pair. In: Comm. Algebra, vol. 32, pp. 4133–4140 (2004).
10. Mitkov, R.: The Oxford Handbook of Computational Linguistics. Oxford University Press (2005).
11. Lee, V.: Mobile Applications: Architecture, Design, and Development. Prentice Hall (2004).
12. Murphy, M.: The Busy Coder's Guide to Advanced Android Development. CommonsWare, LLC (2011).