Machine Training for Intelligent Analysis of Text for the Identification of the Author

Nadiia Pasieka1, Vasyl Sheketa2, Myroslava Kulynych3, Yulia Romanyshyn2, Svitlana Chupakhina1, and Olha Khytrova4

1 Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, 76000, Ukraine
2 National Tech. University of Oil & Gas, Ivano-Frankivsk, 76068, Ukraine
3 Ukraine Academy of Printing, Lviv, 79020, Ukraine
4 Chernivtsi Trade and Economic Institute Kiev National University of Trade and Economics, Ukraine

Abstract
The continuous development of information technology has been accompanied by increasingly dangerous cyberattacks, which have recently penetrated, largely unimpeded, institutions with a sophisticated information technology infrastructure. Over the last three years there have been critical cases of cybercrime around the world, primarily involving significant leaks of critical information, the spread of fake messages, cyberbullying, and cloud-based cryptojacking. As a result, research aimed at unambiguously identifying the cybercriminal has emerged around the world. Various agencies have devised innovative methods to combat this problem and to bring the perpetrators to justice. One option being considered to address the problem effectively is the Forensic Writer Identification system, which works on the principles of stylometry. Attributing a text to one author or another is a complex technological task that relies on artificial intelligence to identify, protect, recognize, create, extract, and document digital evidence, which can then be used as evidence of wrongdoing by social media users or simply to analyze critical data.
Thus, the main goal of this study is to examine in detail the capabilities of Forensic Writer Identification technology for analyzing the tweets of different users around the world, to apply it to reduce the search time for criminals by providing the police with the most accurate methodology, and to compare the accuracy of different methodologies. The analytical research is supported by a literature review that examines the most important methods of text analysis. The study used texts from various Twitter users for intelligent analysis. Various online and offline databases were used to expedite the study, and information systems were used to search efficiently for relevant scholarly results. Since systems analysts have recently emphasized computer methods for rapid analysis of digital text in order to establish authorship, the results presented are encouraging. This research provides a general framework and rationale for the use of text and author identification methods. The article reviews current research methods and software applications, and touches on the issues of evaluating the performance of such research. Various research strategies for digital text research are summarized, and a more detailed description of two combined methods is presented. Through the use of textures, algorithms, and polygraphs, new technologies are beginning to show valuable levels of performance, and the use of combined methods to analyze text for authorship will play a vital role in future technologies. In this regard, the goal of the project proposal is to create an analytical system that automatically recognizes authorship on a global scale, which may partially solve the problems of modern cybercrime.

Keywords
Machine learning, tweet analysis, cyberbullying, cybercrime, text analysis, stylometry.
CPITS-II-2021: Cybersecurity Providing in Information and Telecommunication Systems, October 26, 2021, Kyiv, Ukraine
EMAIL: pasyekanm@gmail.com (N. Pasieka); vasylsheketa@gmail.com (V. Sheketa); kumyr@ukr.net (M. Kulynych); yulromanyshyn@gmail.com (Y. Romanyshyn); cvitlana2706@gmail.com (S. Chupakhina); olga_hitrova@ukr.net (O. Khytrova)
ORCID: 0000-0002-4824-2370 (N. Pasieka); 0000-0002-1318-4895 (V. Sheketa); 0000-0002-9271-7855 (M. Kulynych); 0000-0001-7231-8040 (Y. Romanyshyn); 0000-0003-1274-0826 (S. Chupakhina); 0000-0003-2253-4356 (O. Khytrova)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction
The ongoing development of digital communications, such as social networking platforms, SMS, online forums, blog posts, and emails, has created the conditions for faster and more effective transmission of digital information through an intelligent IT infrastructure. Under certain conditions the author of a text can remain incognito, but this status can also lead to various cybersecurity problems on all available social networks. Digital forensics of anonymous texts for authorship is the process of using information technology, namely the storage, extraction, and transmission of data from various technical platforms, to identify evidence and combine it into useful information that can be used effectively to solve network problems such as cyberbullying around the world. Thus, effective digital authentication and attribution of an online source text to a specific author is a critical tool in combating cybercrime. Unfortunately, the texts in our study are limited in length, which complicates effective testing.
[1, 2, 4, 6] Thus, the goal of our study is to distinguish the authorship of Twitter messages, which are limited to one hundred and forty characters in length. We consider and evaluate the basic stylometric signs that a digital text belongs to a particular author; for this purpose we use abstract methods of examination as well as features specific to the social network Twitter, such as URLs, hashtags, replies, or quotes. [3, 5, 11, 16, 28] Such analytical approaches are able to achieve an accuracy of about eighty percent in determining text authorship. With the increasing number of cybercrimes against social network users around the world, there is a critical need to identify the author of a digital text with acceptable and adequate quality. This is an important issue for forensic expertise in order to effectively combat cybercrime such as cyberbullying, ransom notes, and online deception by various criminals. The anonymity provided by phones with prepaid SIM cards, public networks, public Wi-Fi zones, and decentralized network resources such as Tor, an application that hides the personal attributes of the user, can make the task of identifying online users much more difficult. Sometimes, therefore, the content of a published message becomes the only effective information for identifying its creator. It is also worth noting that the real identity of online text messaging users sometimes turns out to be quite different from how they present themselves on the Internet. [7, 9, 12, 24, 36] However, it should be emphasized that the processes shaping this behaviour, and its results, are constantly evolving.
In addition, various publications circulating on the Internet describe a media organization that allegedly created and used pseudonymous virtual personas to conduct a coordinated disinformation campaign through networked media systems. It is also expected that some of the tasks of determining the authorship of text messages in order to stop illegal actions will be supported by state structures. The analytical process of correctly identifying the anonymous author of a published text message (for example, one that looks like cyberbullying) in social and public networks is presented in the model in Figure 1.

The ongoing development of the Internet offers cybercriminals more and more opportunities to covertly carry out malicious activities such as phishing, cyberbullying, fraud, and spam. To this end, an original study has been proposed to determine the authorship of such digital works on the Internet (e.g., discussion comments, text messages, tweets, SMS). Forensic phonetics provides a special mathematical apparatus for examining such digital text messages on the basis of linguistic evidence that they belong to a particular author. Thus, online identity is expressed in the orthographic and phonetic composition of the original digital text. This study also involves computer analysis of text messages for combinations of styles or style metrics from an archive of such malicious messages. [8, 13, 18, 40] The study addresses several problems, chiefly the automated extraction and analysis of text messages in order to identify the real author of the analyzed content. The largest providers of storage and generation of digital text messages therefore make considerable efforts in this direction, involving data analysts.
[10, 17, 31] However, applications without advanced artificial intelligence (AI) functions do not provide sufficient information to unambiguously attribute digital text messages to real or suspected cybercriminals. The Internet provides a convenient arena for cybercriminals to covertly carry out their intended activities, such as phishing, fraud, and spam.

Figure 1: A model for identifying the author of a text message

Hence the relevance of original research on anonymous digital text messages such as discussions, comments, and tweets. The reliable determination of the authorship of anonymous text messages directed against Internet users has therefore become a serious direction in the analysis of the available information. In forensic phonetics, experts investigate linguistic patterns (in unknown or attributed works used in cybercrimes) during the preliminary analysis of digital text messages. The analytical examination of digital text messages in which malicious actors tend to obscure their connection to online content is called authorship verification. [14, 15, 21] Essentially, authorship verification of online digital text messages is an examination of voice or computational attributes that may be recorded by a combination of known or unknown software systems. The process of author identification can also include checking the writing style or stylistic markers in the text under investigation. [19, 22] It should also be emphasized that implementing this process poses various challenges: for example, manual extraction and automatic text analysis to identify the true author of a digital text message have become a significant challenge for various specialized units.
Most of the specialized units engaged in deep analysis of big data use computer systems that lack capabilities based on artificial intelligence, and therefore cannot cope with the task of identifying the author of a digital text message, whether real or suspected. In addition, various applications around the world lack basic artificial intelligence algorithms for analyzing digital text messages for anonymous author identification, algorithms that could solve some delicate problems, such as recognizing language attributes in electronic communication services. Short Internet messages, such as tweets with their limited character set, have some attributes that make authorship testing harder than for formal literary content, as follows. [19, 26, 32, 37] Online digital text messages are short in most cases, which means that a language complexity measure that depends on the number of words in the content may not be appropriate. In addition, some computer applications may not recognize parts of speech. Part-of-speech tagging assigns a part of speech to each word in online content, taking into account the properties of the word and the specific context in which it appears. Moreover, in most cyberattacks, cybercriminals use false and fraudulent methods to gain access to various Internet and Web technologies, manipulating user information without user authorization. Undoubtedly, the constant growth in the use of information and communication systems and technological equipment has led to an increase in global cybercrime and in conflicts between governments and criminals. [20, 23, 35, 41] Ultimately, in order to correctly identify an anonymous user, a significant volume of prohibited digital text messages must be processed for analysis.
However, cybercrime keeps generating new methods and techniques for its unlawful acts, and as a result there are more and more cyberattacks on various Internet sites. [25, 27] As a rule, cybercriminals distribute specially programmed fake accounts on social networks on a mass scale, which they use for:
- a seemingly legal means of conducting cybercrime business;
- embedding special information flows with the purpose of discrediting both organizations and private users of social networks;
- mass theft of the personal data of social network users;
- artificial creation of conditions that erode trust in social networks;
- artificial creation of fake news feeds and fake votes (ratings);
- artificial creation of problems for social marketing.

To identify the anonymous author of a digital text message we used methods of computer visualization (for graphical representation of social networks), methods of collecting and analyzing personal data from various Internet services, mathematical methods of visual pattern recognition, methods of artificial intelligence, methods of hierarchical clustering, neural network methods, fuzzy clustering methods (to cluster Web users into groups with the same attributes), and simulation modeling (to analyze the results obtained).

2. Literature Analysis
Some of the nastiest digital text messages posted on the Internet and on various social media platforms can be used in criminal investigations of content authorship, especially through the method of forensic phonetic identification of anonymous authors. Unfortunately, many authors of such illegal online messages still remain out of reach.
Moreover, on some digital platforms where web technologies are hosted, identifying the anonymous author of particular digital textual content can be quite a challenge, since digital text messages on the Internet are limited to a certain number of characters. [29, 30] Consequently, the goal of a responsible data analyst is to identify the anonymous sender of a tweet or the owner of a virtual Twitter account, where a tweet can be up to two hundred and eighty characters long. The task therefore became one of developing an algorithm to evaluate the key metrics used for initial attribution, which can identify the key characteristics of each cybercriminal's writing style. Specifically, data analysts have applied stylometric techniques to various natural language processing (NLP) tasks on digital text messages, including author attribution, author profiling, author verification, detection of style changes within a text, and clustering of tweets. [22, 34, 39] In this study, stylometric algorithms were used to identify the anonymous author of digital textual content, or of a given tweet, from a list of candidate authors. These methods were used to test different data sets, and finally, ideas for future work were considered. This research topic has important implications for applying data analysis tools to help reduce cybercrime worldwide.

Obviously, offline and online digital text messages carry no handwriting that could be traced directly to identify or verify an anonymous author. Nevertheless, many research papers have been published on technology for identifying an anonymous author from digital text messages, which has become a basic tool for big data analytics.
Therefore, stylometric techniques have been used to analyze social media texts for this article. In addition, online social networks (OSN), such as Facebook, LinkedIn and Twitter, provide ever more factors that call for the attribution of anonymous authors. [33, 38–42] These online tools are believed to provide effective and quick virtual linking methods that help anonymous social media users commit cybercrimes. Users can rely on screen names, aliases, or VPNs in these places, and other users may not provide correct credentials for their accounts. To obtain meaningful test results, the data scientist or analyst should use a validation method based on reliable information that contains known, validated models and key points applicable to the test being performed, which will be most accurate, as shown in Figure 2 below. In addition, to compare multiple literature articles, the report should consider the following aspects.

Figure 2: Model of the information-analytical system for identifying an anonymous author

3. Analytical identification versus verification
In essence, semantic language properties are used in stylistics to obtain information about the creators of digital textual content. Most big data analysts use a variety of stylistic identity verification methods to identify anonymous users based on their digital text messages. This study details authorship verification methods used for sequential validation of unstructured online text based on message content. The online document is split into consecutive blocks of short messages, and sequential verification decisions are made over these blocks to separate genuine from impostor behaviour.
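The block decomposition described above can be illustrated with a minimal sketch. The function name `split_into_blocks` and the handling of a shorter final block are assumptions made for illustration; the source does not specify how incomplete trailing blocks are treated.

```python
def split_into_blocks(text: str, block_size: int) -> list[str]:
    """Split a document into consecutive blocks of at most block_size characters.

    Assumption: the final block may be shorter than block_size and is kept.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


# Hypothetical usage: a 300-character document cut into tweet-sized blocks.
document = "x" * 300
blocks = split_into_blocks(document, 140)
print([len(block) for block in blocks])
```

Each block would then be passed to a verification step independently, so a decision can be made per block rather than once per document.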
To achieve the goal of the study, analysts are encouraged to analyze content blocks of one hundred forty, two hundred eighty, and five hundred characters using the proposed algorithm, which is designed to extract the digital text messages under study. The feature list includes traditional text features (morphological, syntactic, and application-specific) as well as new features derived from the study of n-grams. The proposed method also includes a number of techniques to circumvent the problems associated with unbalanced big data sets, and uses information gain and mutual information as feature selection methods and support vector machines (SVMs) for classifying digital text messages. Experimental evaluation of the proposed method and text message parsing algorithm on the Enron email and Twitter corpora gave promising results: for the algorithm configuration used, the equal error rate (EER) is approximately 23.86%. In addition, the clustering of the analyzed digital text message can be obtained by formula (1), shown in Figure 3 below, where A and σ represent the normalization of the big data set, S_k represents the Gaussian distribution of the data set, and D_u represents the u-th digital text message.

$$P_{t,u}(S) = \frac{1}{A}\sum_{k=1}^{n_0} G(S_k) = \frac{1}{A}\sum_{k=1}^{n_0} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{S - S_k}{\sigma}\right)^{2}\right] \tag{1}$$

4. Methodologies for analyzing digital text messages
As one of the criteria for identifying digital text messages or tweets, the research uses stylistic methods to perform the task of identifying the anonymous authors of content after it has been created. In this section, we describe the main methods that can be used to automatically identify an anonymous author across multiple Twitter accounts. Essentially, the desired methodology involves the selection of software that uses stylometric and semantic algorithms and the distinguishing elements of authorship mechanisms.
Stylometric mechanisms for identifying an anonymous author by content
Analysts use stylistic variables to identify the anonymous authorship of amateur bloggers on social media as one way to tackle the growing problem of cybercrime. Stylistic analysis of the corpus is usually one of the main methods of identifying the author of a Twitter post. The variables considered usually depend on the field of application. For example, authorship attribution methods for short online posts include basic style markers such as greetings, document links, specific HTML tags, etc., or idiosyncratic style markers such as misspellings. Since our research area is abstract writing, we decided to include a wider range of style markers. The stylistic variables commonly used in this study are specific aspects such as punctuation in tweets and text n-grams. The corpus used in this study consists of multiple digital text messages from several anonymous authors. As with any corpus-based method, the configuration of the corpus must be considered carefully to produce reasonable research results. For authorship attribution, the characteristics of the corpus (text type, language, time, case) affect the accuracy of attribution. The ideal recommendation is therefore to build the corpus per author while keeping the language range as narrow as possible. We assume that a set of time-stamped posts or tweets from online media is given as input, and that each method generates a similarity score for each pair of accounts analyzed.
This technology is used in conjunction with an agile data mining process, which is essential for text extraction, and the verification follows the four main stages of tweet recognition: preprocessing the tweets, extracting tweet features, comparing tweets, and determining the identity of the author. The stylistic study of tweets is similar to the stylistic study of other types of short messages, such as online discussion posts or online text conversations, which are likewise informal and comparable in structure and syntax. For our tests, a complete feature list has been developed that takes style data into account in every case and assumes that authors unknowingly follow a certain pattern and are predictable in their choices. During analysis, the algorithm uses iterative techniques to search the array tree and identify various characteristic vocabulary features, which helps produce the most accurate results. Preprocessing also involves deleting suspicious tweets from some authors. The proposed tweet preprocessing technique relies on the association of slang words with other co-occurring words to check the importance of slang words and their presumed interpretation. We use n-grams to detect associations and conditional random fields to check the meaning of slang terms. However, an important issue in this field is data matching about hype, relevance, emoji, folklore classification, and slang.
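A pairwise similarity score between accounts, of the kind this section assumes each method produces, can be sketched as follows. This is only an illustration under stated assumptions: the choice of character trigram profiles and cosine similarity, and the helper names `char_ngrams`, `cosine_similarity`, and `pairwise_account_similarity`, are not taken from the study.

```python
import math
from collections import Counter
from itertools import combinations


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (empty if the text is shorter than n)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles, in [0, 1]."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_account_similarity(accounts: dict[str, list[str]], n: int = 3) -> dict:
    """Return a similarity score for every unordered pair of accounts.

    Assumption: an account profile is built from all its tweets joined together.
    """
    profiles = {
        name: char_ngrams(" ".join(tweets).lower(), n)
        for name, tweets in accounts.items()
    }
    return {
        (first, second): cosine_similarity(profiles[first], profiles[second])
        for first, second in combinations(sorted(profiles), 2)
    }


# Hypothetical accounts, for illustration only.
scores = pairwise_account_similarity({
    "account_a": ["hello world"],
    "account_b": ["hello world"],
    "account_c": ["zzzz qqqq"],
})
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Character n-grams are used here because they need no language-specific tooling; any other stylometric feature vector could be substituted in `pairwise_account_similarity`.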
Recognizing the characteristics of tweets involves analyzing the preprocessed tweets to determine the total number of words in a particular tweet, the special characters used, the consonants and vowels, the total number of characters, and the frequency of use of certain words or phrases. Selecting appropriate features for learning opinions in tweets remains an open area of research, as text classification methods face the problem of sparsity and part-of-speech (POS) tagging strategies suffer from the lack of linguistic structure in tweets. Character-based features, namely character n-grams, are now very mature because they are language-independent. The analyst must find content that matches the content stored in the database in order to use feature matching techniques to detect similarities when possible. In addition, the feature evaluation algorithm for selected tweets can be used to directly identify relationships and find assessments. For example, opinion mining is an important area, and sentiment analysis of tweets has been widely used in the past few years.

Lexical
This is the technique of Twitter sentiment analysis. It attempts to quantify the diversity of popular opinion about retail brands. The first approach is a lexicon-based strategy, which uses dictionaries and semantic scores of words to calculate the final polarity of a tweet and incorporates part-of-speech tags. With the rapid spread of social networks, microblogging applications, and forums, its role is growing significantly.

Structural
Twitter is a microblogging platform that carries about 450 million tweets every day, and a structured algorithm is required to analyze all of this data accurately. It also serves as an important information source for disease surveillance and control in local areas.
This research investigates the structural algorithm methods used to separate and analyze Twitter information, including the attributes and representativeness of the information; information sources, access, and cost; sampling methods; data management and cleaning; standardized metrics; and analysis.

Syntactic
The language used on Twitter has some peculiarities, such as the use of hashtags or user mentions. Data preprocessing therefore uses syntactic algorithms or data structures to speed up the analysis. To improve the effectiveness of language processing techniques (such as lemmatization and syntactic parsing), we perform several normalization steps. We remove # hashtags, all @ mentions, and links, and convert the text to lowercase. Similarly, if a vowel is repeated multiple times in a word, we reduce it to a single occurrence, and we reduce consecutive repeated punctuation marks to a single occurrence. Finally, we lemmatize the normalized content.

N-grams
With the advancement of global Web technology, the Internet has become a primary source of news about the latest developments. Twitter is one of the best-known online media platforms that allow the public to share news. The platform is growing rapidly, especially among young people, who may be affected by information from unknown sources. Predicting the credibility of information on Twitter thus becomes a necessity, especially in the event of a crisis. This paper proposes a classification model based on supervised machine learning and word-based N-gram analysis to automatically classify Twitter messages as credible or not credible. Five supervised classification algorithms were applied and analyzed: Linear Support Vector Machine (LSVM), Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and K-Nearest Neighbor (KNN). The study explores two feature representations (TF and TF-IDF) and different word N-gram ranges.
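The normalization steps and the word N-gram TF-IDF representation described above can be sketched with the Python standard library alone. Several points are assumptions rather than details from the study: hashtags are dropped as whole tokens, lemmatization is omitted because it needs an external library, the IDF is the plain log(N/df) variant, and the names `normalize_tweet`, `word_ngrams`, and `tfidf_vectors` are illustrative. The classifiers themselves are not implemented here.

```python
import math
import re
from collections import Counter


def normalize_tweet(text: str) -> str:
    """Apply the normalization steps: links, mentions, hashtags, case, repeats."""
    text = re.sub(r"https?://\S+", " ", text)      # remove links
    text = re.sub(r"[@#]\w+", " ", text)           # remove @mentions and #hashtags (assumed: whole token)
    text = text.lower()
    text = re.sub(r"([aeiou])\1+", r"\1", text)    # repeated vowel -> single occurrence (also turns "good" into "god")
    text = re.sub(r"([!?.,;:])\1+", r"\1", text)   # repeated punctuation -> single occurrence
    return re.sub(r"\s+", " ", text).strip()


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> list[str]:
    """Return all word n-grams for n in [n_min, n_max]."""
    tokens = re.findall(r"\w+", text)
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Compute sparse TF-IDF vectors with tf = relative frequency, idf = log(N / df)."""
    term_counts = [Counter(word_ngrams(normalize_tweet(doc))) for doc in documents]
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(documents)
    vectors = []
    for counts in term_counts:
        total = sum(counts.values()) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical tweets, for illustration only.
print(normalize_tweet("Sooo cooool!!! @bob #fun http://t.co/x"))
print(tfidf_vectors(["the cat", "the dog"]))
```

With this IDF variant a term that occurs in every document receives weight zero, so only terms that discriminate between messages contribute to the vector that a classifier would consume.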
For model training and testing, 10-fold cross-validation was performed on two data sets in different languages (such as English). Syntactic features are also used to capture sentence structure in terms of punctuation and parts of speech. Punctuation is an important guide for delimiting boundaries and making tweets or posts meaningful: it divides paragraphs into sentences and each sentence into tokens by means of marks such as exclamation marks, quotation marks, and periods. The basic set of punctuation marks includes single quotation marks, commas, periods, colons, semicolons, question marks, and exclamation marks. In addition, following the Unicode standard with its codes and specific grammar, a series of notable punctuation features is built from various symbols, such as (!, "). These features are subject-specific and capture the author's style across various topics. They include the number of sentences per block of text, the average number of characters, words, and sentences in a block of text, and the average number of sentences beginning with uppercase and lowercase letters.

Text Representation
For each style variable, the data analyst creates a frequency vector in which each dimension corresponds to a distinct feature. In most experiments, data analysts select different combinations of variables and then represent each document as a concatenation of the frequency vectors for these variables. Important features include the total number of characters in each tweet, the total number of words in each sentence of the tweet, the frequency of words in the tweet, the total number of common dictionary words, and the emoji in the tweet and their frequency of occurrence. In the sense used by Google and various online applications, trending words are words that are popular for a limited period, perhaps a day or a week or so.
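A few of the features listed above can be collected into a fixed-order vector, as in the following minimal sketch. The feature names, the punctuation set `PUNCTUATION_MARKS`, the sentence-splitting rule, and the function `tweet_features` are assumptions made for illustration; emoji and dictionary-word counts are left out because they need external resources.

```python
import re
from collections import Counter

# Assumed punctuation set, following the basic marks named in the text.
PUNCTUATION_MARKS = "'\",.:;?!"


def tweet_features(tweet: str) -> dict[str, int]:
    """Extract simple stylometric counts from one tweet."""
    words = re.findall(r"\w+", tweet)
    # Assumption: a sentence ends at one or more of . ! ?
    sentences = [s.strip() for s in re.split(r"[.!?]+", tweet) if s.strip()]
    features = {
        "n_chars": len(tweet),
        "n_words": len(words),
        "n_sentences": len(sentences),
        "n_upper_start": sum(s[0].isupper() for s in sentences),
        "n_lower_start": sum(s[0].islower() for s in sentences),
    }
    punctuation = Counter(ch for ch in tweet if ch in PUNCTUATION_MARKS)
    for mark in PUNCTUATION_MARKS:
        features[f"punct_{mark}"] = punctuation[mark]
    return features


def feature_vector(tweet: str) -> list[int]:
    """Return the feature values in a stable order, ready for a classifier."""
    features = tweet_features(tweet)
    return [features[name] for name in sorted(features)]


# Hypothetical tweet, for illustration only.
print(tweet_features("Hello world! how are you?"))
```

Sorting the feature names in `feature_vector` keeps the dimensions aligned across tweets, which is what allows vectors from different authors to be compared.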
Fundamentally, the main stage of Twitter authorship verification is the preprocessing of tweets from particular users, in which unreliable parts, that is, content written by others such as tweets retweeted from other Twitter users, are removed. Many linguistic features have been proposed for authorship verification, for example, the choice of specific words and syntactic structure. In contrast to topic-based text classification (where the essential issue is the set of texts), the core feature sets are extended with lexical, syntactic, and application-specific features. Such a combination better conveys the author's style. The analyst then uses the developed algorithms to match the features of texts of unknown authorship. This is done after the features are extracted, and they are compared with the data sets of different authors to find similarities. It is worth emphasizing that the classification model includes separate profiles created independently for each user.

5. Results
Analyzing the results obtained with different models for determining the anonymous authorship of digital text messages, with machine learning used to automate the process, we found that the best model is gradient boosting, with an accuracy of 37.43%. The probability of detection is low, but this figure may be directly related to the nature of the digital text messages processed: these anonymous authors can be rotated on bot farms, and it is very likely that some accounts are operated by constantly changing agents, which lowers the prediction accuracy of the proposed model.
Because the analytical calculations are cumbersome, we present only the overall accuracy of the models considered for identifying an anonymous author by numerical analysis of text messages: Gradient Boosting 37.43%; Decision Tree 23.17%; Bayesian Network 19.89%. The results of the different models are in line with our expectations and demonstrate the potential of machine learning algorithms for identifying anonymous social media authors. The differences in detection rate between the algorithms show that machine learning can be used to extract useful information from large amounts of digital text message data and to predict the author of a particular text. This could play an important role in the future fight against cybercrime, cyberbullying, and identity theft targeting social media users.

6. Conclusions

The main purpose of the study is to automate the identification of the author of a digital message using various analytics technologies, that is, to identify an anonymous author from his or her text messages on the social network Twitter in order to minimize and prevent cybercrime. Data analytics has made possible a stylistic evaluation model for digital text messages that uses computer analysis of unique linguistic and stylistic "fingerprints" for attribution and author identification. The study tested the proposed concept of digital text message authorship by evaluating the accuracy of the proposed model, and the approaches tested show that they can help the relevant agencies automate these mechanisms. The results of the study clearly delineate the research topics.
In addition, the evidence presented in the study confirmed that the proposed model is an acceptable mathematical model for the process of author identification using stylometry. The empirical research on determining the author of a digital text message answered almost all of the questions posed and achieved its basic goals through computational and analytical investigation. However, there is still room for improvement in the models and algorithms, and additional research is required to achieve better results that will have a positive impact on reducing cybercrime. Because of the limited amount of data used, the models in this study are not highly accurate. The data sets consist of popular Twitter users, who are likely to find a way to keep generating digital text messages, or to pay penalties to the organizers, and continue participating in social media. Future research will explore other models, methods, and algorithms to obtain more robust metrics, and will greatly expand the data set that is now being accumulated on cloud resources. This research is also a successful starting point for further work: applying models and methods to digital text messaging data can help reduce the potential for cybercrime, which may be useful for various specialized units.