Method of Determining the Text Sentiment by Thematic Rubrics Anatoliy Sachenko1,2, Taras Lendiuk1, Khrystyna Lipianina-Honcharenko1, Maciej Dobrowolski2, Gena Boguta1 and Leonid Bytsyura1 1 West Ukrainian National University, Lvivska str., 11, Ternopil, 46000, Ukraine 2 Kazimierz Pulaski University of Technology and Humanities in Radom, Department of Informatics, Jacek Malczewski str., 29, Radom, 26 600, Poland Abstract A method for determining the text sentiment of by thematic rubrics is proposed. It is based on a complex approach that integrates natural language processing, machine learning and linguistic analysis for automatic classification of text data. To implement the method, a generalized client-server architecture of the text sentiment for analysis system was developed and a set of data was collected from a wide range of article texts from the Internet sites of Ukraine, which ensure the representativeness of various styles, genres and topics. The high efficiency of the system in terms of classification of texts by rubrics was experimentally confirmed with a correspondence of 92%. Keywords text sentiment, thematic rubrics, natural language processing, machine learning, linguistic analysis, automatic classification, text data 1 1. Introduction Currently, we are witnessing many information wars being waged on the world stage, with a special emphasis on the war in Ukraine. Russia uses information operations as a key element of its hybrid war against Ukraine and directs significant resources to the creation and dissemination of disinformation, the purpose of which is to influence public opinion, undermine trust in the Ukrainian state, its institutions and leaders, as well as to distort the real state of affairs in the international community In the context of increasing information attacks from Russia, the importance of an accurate and objective analysis of the text sentiment of related to various aspects of the war and its impact on various spheres of life becomes extremely relevant. Automated analysis of the emotional coloring of texts can help detect attempts to manipulate public opinion, assess the general mood in society regarding certain events 1 COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024, Lviv, Ukraine as@wunu.edu.ua (A. Sachenko); tl@wunu.edu.ua (T. Lendiuk); xrustya.com@gmail.com (Kh. Lipianina- Honcharenko); m.dobrowolski@uthrad.pl (M. Dobrowolski); genaboguta7@gmail.com (G. Boguta); l.bytsyura@wunu.edu.ua (L. Bytsyura) 0000-0002-0907-3682 (A. Sachenko); 0000-0001-9484-8333 (T. Lendiuk); 0000-0002-2441-6292 (Kh. Lipianina-Honcharenko); 0000-0003-0296-9651 (M. Dobrowolski); 0009-0000-9788-1753 (G. Boguta); 0000- 0002-9476-011X (L. Bytsyura) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings or policies, and identify changes in the information space that may indicate new directions of information attacks or disinformation campaigns. The proposed method integrates advanced technologies of natural language processing (NLP), machine learning and linguistic analysis, which allows not only to automate the process of determining the emotional coloring of texts, but also to ensure high accuracy and objectivity of the obtained results. The application of such a complex approach in the conditions of information warfare opens up new opportunities for monitoring the information space, identifying and analyzing information operations, as well as for developing effective strategies for countering disinformation. Given the high dynamics of the information space and the constant change in the tactics and strategies of information warfare, the development and implementation of innovative text analysis methods that allow for prompt monitoring and analysis of attempts at manipulation and disinformation is extremely important. Further materials are presented according to the following structure. In Chapter 2 the analysis of recent related works is fulfilled and in Chapter 3, the proposed method is described. Chapter 4 is dedicated to implementation and case studies, and Chapter 5 summarizes the received outcomes. 2. Related Work The study of text tonality, which involves the integration of natural language processing (NLP), machine learning, and linguistic analysis, occupies an important place in modern scientific research. In [1], the analysis tool for text sentiment was developed based on the fusion of machine learning algorithms, demonstrating the effectiveness of combining different methods to more accurately determine the emotional coloring of texts. Authors [2] emphasized the importance of using TF-IDF and machine learning methods for sentiment analysis of Twitter data, confirming the importance of these techniques in the field of big data processing. In [3], the impact of the lexical richness of the training corpus on the performance of machine learning in analysis tasks was highlighted, emphasizing the need for a careful approach to data selection and processing. Authors [4] presented an innovative approach to the classification of hierarchical comments using BERT and a specialized naive Bayes classifier, opening up new opportunities for advanced text analysis. In [5], a deep learning-based approach to sentiment analysis is proposed for ranking online products using a set of probabilistic linguistic terms, which demonstrates the potential of using these technologies for commercial purposes. Authors [6] developed a technique for analyzing the sentiment of data from Twitter using NLP and machine learning, which emphasizes the importance of an integrated approach to information processing. The authors [7] demonstrate how sentiment analysis on Twitter can be improved using NLP and machine learning methods for emotion categorization and trend visualization. In [8], the advantages of integrating linguistic analysis for the study of human language are revealed, which opens up new opportunities for accurate thematic categorization. Research [9] focuses on the analysis of online product reviews using advanced deep learning and machine learning techniques [23, 24] to improve data classification and extract detailed emotion information [21]. Compared to the analogue [10], which focuses on the sentiment analysis of Twitter data, this work is distinguished by a deep integration of natural language processing (NLP) techniques, machine learning and linguistic analysis. In addition, a distinctive feature of the proposed approach is the development of specialized algorithms for accurate determination of sentiment taking into account contextual semantics and lexical diversity within specific thematic headings. This allows not only to more accurately classify the emotional coloring of texts, but also to detect subtle nuances in emotional expressions, which makes the proposed approach more adaptable to the complexities of natural language and provides a higher accuracy of analysis compared to existing methods. 3. Proposed Method The proposed method for determining the sentiment of text by thematic headings is a comprehensive approach that integrates natural language processing (NLP), machine learning, and linguistic analysis for automatic classification of text data. The method allows to evaluate the emotional color of the text (positive, neutral or negative) and to express it quantitatively in the form of an expanded percentage scale in the range from -100% to +100%. Let us present this method as a set of the following stages and structurally (Fig. 1): Stage 1. Text pre-processing [11]: 1.1. Removal of extra characters, text normalization; 1.2. Detection and removal of stop words; 1.3. Tokenization of text [11]. Stage 2. Classification of the text by thematic headings [12]: 2.1. The use of machine learning methods to determine the thematic rubric of the text [13]; 2.2. Training models on a predefined set of texts with a clear thematic affiliation. Stage 3. Creation and use of dictionaries for each thematic rubric [14]: 3.1. Development of dictionaries with key words and phrases specific to each rubric; 3.2. Determination of the emotional coloring of keywords (positive, negative, neutral). Stage 4. Sentiment analysis [15]: 4.1. Using sentiment computation methods such as dictionary-based estimation or deep learning models; 4.2. Calculation of the sentiment index T for the text based on the formula: P−N T= ×100 % P+ N +Q where P is the number of positive words, N is the number of negative words, Q is the number of neutral words. Stage 5. Inversion of tonality. In the case when the text belongs to a hostile source and does not contain a direct mention of Ukraine, tonality inversion is used. Stage 6. Displaying the results. Development of an interface for visualizing the tonality of the text with the possibility of viewing a detailed analysis by thematic headings. Figure 1: Structure of the method of determining the tonality of the text by thematic rubrics 4. Implementation and Case Study The generalized client-server architecture of the text tonality analysis system is presented in Fig. 2. The client initiates the interaction by sending a request to the server, which in turn processes the received information. After processing, the server sends the necessary information back to the client. This process is cyclic, so after transferring information, the client can initiate interaction with the server again. The diagram shows a typical request-response model that is fundamental to a client-server architecture, where the server acts as a provider of resources and services and the client acts as a consumer of those resources and services. Figure 2: Generalized client-server architecture of the text tonality analysis system To implement the proposed method, a set of data was collected from a wide range of texts of articles from Internet sites of Ukraine, which ensure the representativeness of various styles, genres and topics. The sample covers a variety of emotional contexts to test the algorithm's ability to accurately classify emotional nuances. The analysis of the text will be assigned to one of the following thematic headings, which were determined by experts from the security service of state bodies:  Military and political leadership of Ukraine at all levels;  Law enforcement agencies of Ukraine;  Armed Forces of Ukraine;  Socio-political situation in the regions of Ukraine (attitudes towards mobilization, socio-economic stability, etc.);  Pro-Russian religious organizations on the territory of Ukraine;  Pro-Russian movements, formation of the concept of "Russian peace";  International image of Ukraine in the EU (English, German, Polish, Romanian, French, Hungarian, Ukrainian, Russian languages);  International image of Ukraine in the USA, Canada and Great Britain (English language);  International image of Ukraine in African countries (English, French, Arabic languages);  International image of Ukraine in Asian countries (Chinese, Russian, Turkish, Arabic, Georgian, Kazakh, Farsi, Kyrgyz, Tajik, Uzbek, Japanese, Korean languages);  Ukraine in the information space of the Russian Federation;  Ukraine in the information space of the Republic of Belarus;  Socio-economic, political, military situation in the Russian Federation (attitudes towards the top management, mobilization, deterioration of the economic situation, etc.). The main functionality of the web interface (Fig. 3) [15] allows users to insert url links of news sites for analysis. After submitting the text, the system uses an algorithm to determine the emotional color, displaying the results in the form of an understandable report. The report includes quantitative indicators of the presence of positive and negative words, as well as an overall text tone index. As a result of the analysis of the tonality of the text [16], the rubric "Military and political leadership of Ukraine at all levels" was determined, where a positive tone of +80% was indicated. This reflects a high level of positive emotional coloring of the text of the article. Details are given about military aspects, including the acquisition or use of military equipment (the F-16 and Bayraktar are mentioned), and an optimistic attitude towards events related to the war in Ukraine is reflected. Figure 3: Example of text analysis result The inversion function (Fig. 4) of the results was included for users who wish to analyze the opposite emotional tone of the text. This can be useful, for example, when analyzing texts containing sarcasm or irony, where the literal meaning of the words can be misleading. The inversion allows you to quickly see how the overall assessment of emotional coloring will change if these stylistic figures are taken into account. Figure 4: Field for entering a link and using inversion One of the most important aspects of the web interface (Fig. 5a) is the ability to edit keywords for each heading. Users can add new words to dictionaries, remove existing ones, which affects their significance in the analysis algorithm. This allows you to adapt the system to the specific needs of the user and increase the accuracy of determining the tonality for specific topics or writing styles. In addition, the web interface includes a dictionary management module (Fig. 5c), which allows users to view and update the database of words on the basis of which the analysis is performed. This is especially important to take into account linguistic updates, social and cultural changes that can affect language and its emotional load. a – key words b – dictionary of emotions Figure 5: Example of editing keywords and dictionary A comparative analysis was conducted to assess the effectiveness of the system for determining the tonality of texts (100 news stories) by comparing the automated results of the system with experts' assessments. The analysis took into account such parameters as the assignment of the text to the appropriate rubric by the system and experts, the comparison of the tonality of the texts determined by both the system and experts, and the correspondence of these assessments. Using the URL as a unique identifier for each text ensured accurate tracking of results. In addition to the quantitative evaluations, the experts provided additional notes for a deeper understanding of the reasons for the discrepancies between the expert evaluations and the results of the system, which will contribute to the further improvement of its accuracy and reliability. Based on the results of filling out Table 1, statistical indicators were calculated, namely the simple percentage of matches between the system and experts. Table 1 Evaluation of System Efficiency Statistical indicator value indicator Correspondence of 92% rubrics Average tonality 15% deviation The obtained results (see Table 1) indicate a fairly high efficiency of the system in terms of classification of texts by headings with a correspondence of 92%. This means that in most cases the system correctly identifies the thematic category of the text, which indicates its reliability in determining the context of news. However, the average tonality deviation of 15% is quite significant and may indicate some shortcomings in the work of the algorithms of the system for evaluating the emotional coloring of the text. This may be due to the incompleteness of the dictionaries used for sentiment analysis. However, dictionaries can be constantly updated, which is a significant advantage of the system, since the language is constantly evolving, and the context and use of the vocabulary can change. The ability to supplement dictionaries by users allows the system to adapt more quickly to novelties in the language and changes in the use of terms, especially in the field of news, where it is important to take into account not only the lexical meaning of words, but also their connotative influence. Thus, the system demonstrates a high accuracy in the classification of thematic headings, but needs improvement in determining tonality. Constant addition and updating of dictionaries, with the possibility of making changes from users, is an important process for increasing the accuracy of the analysis of the emotional coloring of texts. 5. Conclusion A method for determining the text sentiment by thematic rubrics is proposed based on an integrated approach that integrates natural language processing, machine learning and linguistic analysis for automatic classification of text data. This allows you to assess the emotional color of the text (positive, neutral or negative) and express it quantitatively in the form of a percentage scale from -100% to +100%. The text sentiment analysis system implemented based on the method has a high accuracy (92%) of classification of thematic headings. This means that in most cases the system correctly identifies the thematic category of the text, which indicates its reliability in determining the context of news. However, the average sentiment deviation of 15% is quite significant and may indicate some shortcomings in the work of the algorithms of the system for evaluating the emotional coloring of the text. This may be due to the incompleteness of the dictionaries used for sentiment analysis. In the future, authors are going to explore the methods [17-20] for improving the quality and performance of text sentiment analysis. References [1] P. Ajitha, A. Sivasangari, R. Rajkumar, & S. Poonguzhali, Design of text sentiment analysis tool using feature extraction based on fusing machine learning algorithms. Journal of Intelligent & Fuzzy Systems 40 (2021), 6375-6383 https://dx.doi.org/10.3233/jifs-189478 [2] S. Singh, K. Kumar, & B. Kumar, Sentiment analysis of twitter data using TF-IDF and machine learning techniques, in: Proceedings of the 2022 IEEE 6th International Conference on Communication and Electronics Systems (ICCES), 2022, pp. 252-255. https://dx.doi.org/10.1109/com-it-con54601.2022.9850477 [3] S. Garg, A. Saini, & N. Khanna, Is sentiment analysis an art or a science? Impact of lexical richness in training corpus on machine learning. in: Proceedings of the 2016 IEEE International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2016, pp. 2729-2735. https://dx.doi.org/10.1109/ICACCI.2016.7732474 [4] M. Dhina, & S. Sumathi, An innovative approach to classify hierarchical remarks with multi-class using BERT and customized naїve bayes classifier. International Journal of Engineering, Science and Technology, 13 (2022), 32-45. https://dx.doi.org/10.4314/ijest.v13i4.4 [5] Z. Liu, H. Liao, M. Li, Q. Yang, & F. A Meng, Deep learning-based sentiment analysis approach for online product ranking with probabilistic linguistic term sets, IEEE Transactions on Engineering Management (2023). https://dx.doi.org/10.1109/tem.2023.3271597 [6] K. Brindha, S. Senthilkumar, A. K. Singh, & P. M. Sharma, Sentiment analysis with NLP on Twitter data, in: Proceedings of the IEEE International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), 2022, pp. 1-5. https://dx.doi.org/10.1109/SMARTGENCON56628.2022.10084036 [7] K. Darshan, J. Samuel, M. Swamy, P. Koparde, & N. Shivashankara, NLP - Powered sentiment analysis on the Twitter, Saudi Journal of Engineering and Technology 9, (2024) 1-11. https://dx.doi.org/10.36348/sjet.2024.v09i01.001 [8] P. Srinivas, K. Gayathri, K. Bhavitha, Jahnavi & K. D. Sarath, BLIP-NLP model for sentiment analysis, in: Proceedings of the 2023 2nd International Conference on Edge Computing and Applications (ICECAA), 2023, pp. 468-475. IEEE https://dx.doi.org/10.1109/ICECAA58104.2023.10212253 [9] L. Bharadwaj, Sentiment analysis in online product reviews: mining customer opinions for sentiment classification 5, (2023). https://dx.doi.org/10.36948/ijfmr.2023.v05i05.6090 [10] S. Voloshyn, V. Vysotska, O. Markiv, I. Dyyak, I. Budz and V. Schuchmann, Sentiment analysis technology of English newspapers quotes based on neural network as public opinion influences identification tool, in: Proceedings of the 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), 2022, pp. 83-88, doi: 10.1109/CSIT56902.2022.10000627. [11] R. Gramyak, H. Lipyanina-Goncharenko, A. Sachenko, T. Lendyuk, & D. Zahorodnia, Intelligent Method of a competitive product choosing based on the emotional feedbacks coloring, in: Proceedings of the IntelITSIS, 2021, pp. 246-257. [12] H. Lipyanina, S. Sachenko, T. Lendyuk, & A. Sachenko, Targeting model of HEI video marketing based on classification tree, in: Proceedings of the ICTERI Workshops, October 2020, pp. 487-498. [13] H. Lipyanina, S. Sachenko, T. Lendyuk, V. Brych, V. Yatskiv, & O. Osolinskiy, Method of detecting a fictitious company on the machine learning base, in: International Conference on Computer Science, Engineering and Education Applications, January, 2021, pp. 138-146. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-80472-5_12 [14] K. Lipianina-Honcharenko, T. Lendiuk, A. Sachenko, O. Osolinskyi, D. Zahorodnia, & M. Komar, An intelligent method for forming the advertising content of higher education institutions based on semantic analysis, in: International Conference on Information and Communication Technologies in Education, Research, and Industrial Applications, September 2021, pp. 169-182. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-14841-5_11 [15] Streamlit. URL: https://nelczgkwcsghtwdkw7buxn.streamlit.app/ [16] Ukraine can receive F-16 fighter jets in the first half of 2024, - the premier of Denmark, 2024. URL: https://www.0352.ua/news/3738470/ukraina-moze-otrimati- vinisuvaci-f-16-u-persomu-pivricci-2024-roku-premerka-danii [17] M. Komar, V. Golovko, A. Sachenko and S. Bezobrazov, Development of neural network immune detectors for computer attacks recognition and classification, in: Proceedings of the 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS), Berlin, Germany, 2013, pp. 665-668, doi: 10.1109/IDAACS.2013.6663008. [18] Zhengbing Hu, Yevgeniy V. Bodyanskiy, Nonna Ye. Kulishova, Oleksii K. Tyshchenko, A multidimensional extended neo-fuzzy neuron for facial expression recognition, International Journal of Intelligent Systems and Applications (IJISA) 9 (2017) 29-36. DOI: 10.5815/ijisa.2017.09.04 [19] V. Vysotska, O. Markiv, S. Tchynetskyi, B. Polishchuk, O. Bratasyuk, V. Panasyuk, Sentiment analysis of information space as feedback of target audience for regional e-business support in Ukraine, in: CEUR Workshop Proceedings, vol-3426, 2023, 488-513. [20] Sutriawan, S., P. N. Andono, M. Muljono, & R. A. Pramunendar, Performance evaluation of classification algorithm for movie review sentiment analysis, International Journal of Computing 22 (2023) 7-14. https://doi.org/10.47839/ijc.22.1.2873 [21] S. Voloshyn, O. Markiv, V. Vysotska, I. Dyyak, L. Chyrun and V. Panasyuk, Emotion recognition system project of English newspapers to regional e-business adaptation, in: Proceedings of the 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), 2022, pp. 392-397, doi: 10.1109/CSIT56902.2022.10000527. [22] K. Mehta, & S. P. Panda, Sentiment analysis on e-commerce apparels using convolutional neural network, International Journal of Computing, vol. 21, issue 2, pp. 234-241, 2022. https://doi.org/10.47839/ijc.21.2.2592 [23] R. Lynnyk, V. Vysotska, Y. Matseliukh, Y. Burov, L. Demkiv, A. Zaverbnyj, A. Sachenko, I. Shylinska, I. Yevseyeva, and O. Bihun, DDOS attacks analysis based on machine learning in challenges of global changes, in: 2020 CEUR Workshop Proceedings, 2020 vol. 2631, pp. 159-171. [24] O. Soprun, M. Bublyk, Y. Matseliukh, V. Andrunyk, L. Chyrun, I. Dyyak, A. Yakovlev, M. Emmerich, O. Osolinsky, A. Sachenko, Forecasting temperatures of a synchronous motor with permanent magnets using machine learning, in: 2020 CEUR Workshop Proceedings, 2020, vol. 2631, pp. 95-120.