Neural network detection of digital fatigue and burnout with interpretable thematic segmentation

Olexander Mazurets

Roman Vit

vit.roman.vit@gmail.com 0

Maryna Molchanova

m.o.molchanova@gmail.com 0

Olena Sobko

olenasobko.ua@gmail.com 0

Adam Wierzbicki

adamw@pjwstk.edu.pl

Dmytro Chumachenko

dichumachenko@gmail.com 1

0 Polish-Japanese Academy of Information Technology, Koszykowa 86 str., 02-008 Warsaw, Poland

1 University of Waterloo, Waterloo, ON N2L 3G1, Canada

2025

The rapid expansion of remote work and digital communication has intensified the prevalence of digital fatigue and professional burnout, yet existing automated detection methods often lack the interpretability required for clinical and organizational trust. A significant gap remains in effectively distinguishing between topical discussions of fatigue and the actual psycho-emotional state of the user. In this work, we propose a novel interpretable approach combining thematic segmentation, communication object identification, and a BERT-based neural network to detect digital fatigue with high contextual sensitivity. On validation data, the proposed model achieved an Accuracy of 0.83, Precision of 0.87, Recall of 0.88, and an F1-score of 0.87. The study demonstrates that integrating thematic analysis with deep learning allows for a multi-level assessment of cognitive load, enabling the identification of both local overload centers and overall fatigue levels. This approach directly contributes to the Sustainable Development Goals by promoting mental well-being (SDG #3) and decent work environments (SDG #8) through healthier digital practices.

Keywords: digital fatigue, digital burnout, communication object, neural network, interpretable thematic segmentation

1. Introduction

With the rise of remote work, digital communication, and online education, more and more people are experiencing digital fatigue [ 1 ] and burnout [ 2 ]. This is a condition where mental resources are depleted due to the constant demand to be “connected,” to respond, and to adapt to the dynamics of the digital environment [ 3 ]. The ethical dimension of the application of artificial intelligence for decision-making in the field of medical law is also a relevant topic for research [ 4 ].

This problem has become especially relevant after the global changes caused by the COVID-19 pandemic [ 5 ], when remote work and learning became the norm rather than the exception [ 6 ]. The lack of physical boundaries between personal and professional life, prolonged interaction through screens, high-intensity information flows, and the loss of familiar social contacts create ideal conditions for the development of chronic stress, mental exhaustion, and a deterioration in the quality of life [ 7 ].

The relevance of addressing digital fatigue is also emphasized within the framework of the United Nations Sustainable Development Goals, as it is directly connected with the promotion of mental health and well-being under SDG #3 and the advancement of decent work and sustainable economic growth under SDG #8. In this context, the development of interpretable neural network approaches to monitoring digital fatigue not only contributes to individual psychological resilience, but also supports the creation of healthier, more sustainable, and human-centered digital ecosystems that are aligned with global development priorities [ 8 ].

The aim of the research is to develop and test an interpretable neural network approach to the automated detection of digital fatigue and professional burnout through thematic segmentation of text messages and analysis of communication objects, which makes it possible to detect hidden patterns of psycho-emotional exhaustion in the digital environment.

The main contributions of the paper are:
• A new multi-stage framework for detecting digital fatigue is proposed, which combines thematic segmentation, communication object analysis, and fatigue index modeling at both the segment and profile levels.
• The Neural Network Detection of Digital Fatigue and Burnout Using Thematic Segmentation and Communication Object Analysis method is implemented, which allows for targeted assessment of fatigue within the content of communication.
• A multi-indicator fatigue assessment model is developed, which allows for calculating both localized and aggregated levels of digital fatigue across all user profiles.
• A visualization module is created to interpret user-specific digital fatigue maps, which will allow for identifying problematic communication segments.

2. Related works

The literature review in this area aims to identify key determinants of digital fatigue, its manifestations in the work context, as well as existing methodologies for its detection.

The article [ 9 ] is devoted to a comprehensive analysis of the phenomenon of digital fatigue as a current challenge for the professional environment. The authors conducted a review of the scientific literature for the period 2010–2025, focusing on identifying key factors, consequences and strategies for overcoming digital fatigue among employees. The study finds that excessive use of digital tools leads to cognitive overload, increased stress levels and reduced productivity. The paper focuses on the complexity of the interaction of synchronous and asynchronous communication formats, as well as on the blurring of boundaries between professional and personal life. The authors emphasize the need to implement contextualized organizational approaches to digital communication and adhere to the principles of digital well-being.

The study [ 10 ] focused on the growing threat of digital fatigue among young people caused by excessive use of screens in everyday life. Particular attention was paid to Computer Vision Syndrome (CVS) and its associated postural strain in the 18–35 age group. Based on a survey of 160 respondents and analysis of working postures using the OWAS system, a high prevalence of symptoms of visual and musculoskeletal discomfort was found, including dry and burning eyes, headaches, neck stiffness and shoulder tension. A significant proportion of participants demonstrated a medium to high risk of postural strain, which is exacerbated by poor ergonomics and low awareness of digital hygiene.

The authors conducted a study on the use of natural language processing and machine learning methods to identify burnout indicators based on text content [ 11 ]. For the analysis, a corpus of 13,568 anonymized text messages obtained from the social platform Reddit was formed, among which 352 messages were identified as related to burnout and 979 to depression. Ensemble approaches to classification based on subreddit-based data separation strategies and random batching were proposed and implemented. The results obtained demonstrate that ensemble models significantly outperform basic classifiers in terms of balanced Accuracy (0.93), test F1-score (0.43), and test Recall (0.93). The study confirms the effectiveness of using NLP methods for early detection of symptoms of professional burnout using text data, which opens up prospects for the further development of automated systems for monitoring psycho-emotional state.

A stress detection methodology for preventing burnout based on speech and written expression analysis using natural language processing methods is presented in [ 12 ]. The authors combine knowledge from psychology and data science to create a knowledge base that allows comparing speech characteristics with objective stress indicators, such as heart rate variability, cortisol levels, and blood pressure. The tool operates autonomously and passively, identifying both cognitive and emotional manifestations of stress [ 13 ]. Its accuracy, confirmed by biomedical data, reaches 83% according to the F1-measure metric. The article emphasizes the importance of an interdisciplinary approach and the potential for implementing this technology in a professional environment for monitoring employee well-being with the aim of early detection and prevention of burnout.

The article [ 14 ] is devoted to studying the phenomenon of emotional burnout among higher education students during the COVID-19 pandemic, which arose due to sharp changes in the learning format and social environment. Based on a qualitative narrative review of 38 peer-reviewed scientific publications, the authors analyze the main factors that have caused a decrease in academic motivation, engagement and success. Particular attention is paid to the impact of financial instability, mental health problems, social isolation and digital fatigue from distance learning. At the same time, the study considers university strategies to mitigate the negative consequences, in particular, the introduction of flexible academic policies, hybrid learning models and psychological support. The authors also note the role of artificial intelligence – chatbots and teaching assistants, as scalable tools for emotional and academic assistance in the online environment.

The analysis of scientific sources indicates the growing relevance of research on digital fatigue and professional burnout as complex multifactorial phenomena manifested in cognitive, physiological and behavioral symptoms. Modern approaches to detecting these states are based on a combination of natural language processing methods, postural load analysis, biophysiological monitoring and cognitive-affective diagnostics. The scientific literature shows a trend towards interdisciplinarity: combining data on digital behavior, language patterns and psycho-emotional indicators expands the possibilities for early detection of burnout and digital overload in educational and professional environments. At the same time, there is a lack of research that combines thematic segmentation of communications with the analysis of digital interaction objects for the purposes of automated fatigue detection.

3. Problem statement

Despite the growing interest in detecting digital fatigue using natural language processing methods, there is a conflict between the need for accurate recognition of psycho-emotional states and the linguistic ambiguity of texts, where the topic of digital fatigue and burnout may not correspond to the real internal state of the author. This complicates automated classification and requires a deeper analysis of speech patterns, context and author intentions, which goes beyond traditional thematic or emotional modeling.

On the one hand, human speech is an important source of detecting psycho-emotional states, in particular digital fatigue and burnout, however, on the other hand, the presence of semantic topics related to fatigue or negative events is not yet a direct indicator of digital fatigue as a psychological phenomenon.

This creates methodological complexity: the text may contain signs of emotional exhaustion without explicit mention of the digital context, or, conversely, may contain mentions of fatigue, but only descriptively, without the presence of symptoms (for example, information messages, news, discussion of the topic). Thus, there is a need to distinguish the thematic content of the message from latent psychological markers of the human condition. Therefore, it is necessary to develop a method that would take into account all the above-described aspects and be interpretable, as well as create software to study its effectiveness.

4. Method design

The proposed method for neural network detection of digital fatigue and burnout using interpretable thematic segmentation and communication object analysis can be schematically presented as a sequential execution of the stages shown in Figure 1. The input data are all entries (posts, comments, notes, etc.) of a certain author or group of authors. Each entry in this block is marked with an index (1, 2, . . . ) and forms a set of text content, from which an “author profile” is then generated. The profile refers to metadata and summary characteristics: timestamps, length of entries, general topic, frequency of updates.

At stage 1, all records undergo thematic analysis using topic modeling [ 15 ]; the LDA topic modeling algorithm [ 16 ] was used in this study. As a result, each record is assigned to one of the identified topics, namely the topic that best describes its content (indicated by the arrows in Figure 1). Communication segments are groups of texts united by a common topic (1, 2, . . . ). This stage breaks the entire text stream into semantic blocks and prepares the data for deeper analysis.
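The assignment of records to communication segments described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a topic model has already produced a per-record topic distribution, and the function name `segment_by_dominant_topic` and the sample probabilities are hypothetical.

```python
from collections import defaultdict


def segment_by_dominant_topic(doc_topic_probs):
    """Group record indices by the topic with the highest probability."""
    segments = defaultdict(list)
    for record_idx, probs in enumerate(doc_topic_probs):
        # Index of the most probable topic; ties go to the lowest topic index.
        dominant = max(range(len(probs)), key=lambda t: probs[t])
        segments[dominant].append(record_idx)
    return dict(segments)


# Hypothetical topic distributions for four records and three topics.
doc_topic_probs = [
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.55, 0.30, 0.15],
    [0.05, 0.25, 0.70],
]

segments = segment_by_dominant_topic(doc_topic_probs)
print(segments)
```

Each key of the resulting dictionary corresponds to one communication segment, and each value lists the records that belong to it.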

At stage 2, the search for target communication objects takes place. Target communication objects are a combined set of keywords found by different methods (TF-IDF [ 17 ], YAKE! [ 18 ] and Dispersive Estimation [ 19 ]) without repetitions and a set of NER [ 20 ], grouped by lemmatization. In this study, the choice of approach to identifying target communication objects is due to the need to ensure high semantic sensitivity and resistance to thematic shifts inherent in different styles of digital communication. Combining statistical, linguistic and dispersion features allows us to obtain a representative set of keywords relevant to the context of each thematic segment. In addition, supplementing automatically detected key terms with named entities that have undergone the lemmatization procedure allows us to increase the conceptual integrity of the resulting set of objects, ensuring more accurate identification of semantic cores of digital interaction.
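The merging step of stage 2 can be illustrated with a short sketch. It assumes the keyword lists and named entities have already been produced by the respective extractors; the function `merge_communication_objects`, the toy lemma table, and all sample terms are hypothetical and stand in for a real lemmatizer and real extractor output.

```python
def merge_communication_objects(keyword_sets, named_entities, lemma_of):
    """Union keywords and named entities, deduplicated by lemma.

    Terms are lowercased and mapped to a lemma before comparison; the
    first-seen order is preserved so the result is deterministic.
    """
    merged, seen = [], set()
    for source in (*keyword_sets, named_entities):
        for term in source:
            key = term.lower()
            lemma = lemma_of.get(key, key)  # fall back to the term itself
            if lemma not in seen:
                seen.add(lemma)
                merged.append(lemma)
    return merged


# Hypothetical outputs of the three keyword extractors and of NER.
tfidf_terms = ["meetings", "deadline", "boss"]
yake_terms = ["deadline", "video calls", "meeting"]
dispersion_terms = ["boss", "overtime"]
entities = ["Zoom", "Monday"]
lemma_of = {"meetings": "meeting"}  # toy lemma table (assumption)

objects = merge_communication_objects(
    [tfidf_terms, yake_terms, dispersion_terms], entities, lemma_of
)
print(objects)
```

Grouping by lemma is what prevents "meetings" and "meeting" from appearing as two separate communication objects.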

At stage 3, neural network detection of digital fatigue and burnout occurs using the “BERTForSequenceClassification” neural network [ 21 ]. Such a determination occurs both in the general set of text representations of the author’s digital profile and within each of the identified topics. The choice of the “BERTForSequenceClassification” model [ 22 ] is due to its ability to form contextualized vector representations of text sequences, which ensures high quality classification even with limited input length. Due to pre-training on a large corpus of texts, the BERT architecture effectively takes into account semantic and syntactic relationships between tokens, which is critical for detecting complex cognitive and emotional states, such as digital fatigue and burnout.

The output is a digital fatigue map for the profile. The results are aggregated into a two-level visualization [ 23 ]. At the segment level, each topic is shown with a digital fatigue indicator and a word cloud of the target communication objects that characterize the communication segment. At the profile level, a generalized indicator with the corresponding level and visualization is formed, which makes it possible to quickly assess the overall state of the author’s “digital health”.
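A minimal sketch of the two-level aggregation is given below. It assumes that the indicator at each level is simply the share of records classified as fatigued, which matches the percentages discussed later in the paper but is an assumption about the exact formula; the function name `fatigue_map` and the sample labels are hypothetical.

```python
def fatigue_map(segment_of_record, predicted_label):
    """Return (per-segment fatigue share, profile-wide fatigue share).

    segment_of_record[i] is the segment id of record i and
    predicted_label[i] is 1 if the classifier flagged fatigue, else 0.
    Both lists are assumed to be non-empty and of equal length.
    """
    totals, positives = {}, {}
    for seg, label in zip(segment_of_record, predicted_label):
        totals[seg] = totals.get(seg, 0) + 1
        positives[seg] = positives.get(seg, 0) + label
    per_segment = {seg: positives[seg] / totals[seg] for seg in totals}
    profile = sum(predicted_label) / len(predicted_label)
    return per_segment, profile


# Hypothetical profile: eight records split across two segments.
segment_of_record = [0, 0, 0, 0, 1, 1, 1, 1]
predicted_label = [1, 1, 1, 0, 0, 1, 0, 0]

per_segment, profile = fatigue_map(segment_of_record, predicted_label)
print(per_segment, profile)
```

The per-segment values point to local centers of overload, while the profile value summarizes the author's overall level.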

A structured approach to detecting digital fatigue and professional burnout based on a neural network is proposed, which combines thematic segmentation of text records and analysis of target objects of communication. The methodology includes the sequential application of thematic modeling, semantic keyword extraction, and contextualized classification using the BERT architecture, which provides a multi-level assessment of cognitive load and emotional exhaustion in the digital environment.

5. Experiment

5.1. Experimental datasets

The “Healthcare Workers Burnout Tweets” dataset [ 24 ] on Kaggle [ 25 ] contains a collection of public tweets collected from the profiles of healthcare workers, mostly nurses, who are expressing their experiences working at the epicenter of the COVID-19 pandemic. Each entry is represented by the main tweet attributes: text content, publication timestamp, user metadata, and a label indicating the presence or absence of burnout.

The “Mental Health Social Media” dataset [ 26 ] from Kaggle is a corpus of English-language tweets collected using the official Twitter API [ 27 ] in the first half of 2019. It consists of about 20k records, each of which contains: a unique tweet and user ID, as well as a publication timestamp, the message text, cleaned of unnecessary characters and filtered by language (only English tweets), and interaction metadata (number of retweets, likes, hashtags and mentions by other users). The dataset also contains features: tone and subjectivity, frequency of use of personal pronouns, service words and emojis, tweet length (number of words/characters), lexical indicators (readability level, distribution of speech parts, etc.).

Each category contains an identical number of records, ensuring balanced representation of the target classes and minimizing model bias throughout the training process [ 28 ]. The “Healthcare Workers Burnout Tweets” dataset is therefore used to train the BERT neural network model for the digital fatigue detection method, and the “Mental Health Social Media” dataset is used to validate the digital fatigue detection method with deep learning neural network models.

5.2. Experiment and setup description

The software architecture is implemented in the Google Colab cloud environment [ 29 ] and includes three functional modules: (1) topic modeling, (2) detection of target communication objects, and (3) classification of digital fatigue using deep learning. Thematic analysis was implemented using the LDA algorithm, which was applied to previously cleaned and lemmatized records. Cleaning included noise removal, tokenization, normalization, and stop-word removal, after which a corpus dictionary was formed. The optimal number of topics was determined using model coherence. To construct a set of target communication objects, a combined approach was implemented that integrates the results of three independent keyword detection methods: TF-IDF, YAKE!, and variance analysis [ 30 ]. Named entities were determined using the Stanza library, which allows for linguistically accurate entity recognition with subsequent lemmatization. All the obtained objects were combined into a single set, cleaned of repetitions, which guarantees consistency between thematic affiliation and the semantic core of each segment.
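Of the three keyword detectors named above, TF-IDF is the simplest to illustrate. The sketch below is a plain-Python version that uses relative term frequency and an unsmoothed logarithmic inverse document frequency; the weighting variant actually used in the study is not stated, so this choice, the function name `tfidf_top_terms`, and the sample documents are assumptions.

```python
import math
from collections import Counter


def tfidf_top_terms(tokenized_docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each term once per document

    result = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:top_k])
    return result


# Hypothetical cleaned and lemmatized records.
docs = [
    ["tired", "meeting", "meeting", "screen"],
    ["family", "dinner", "screen"],
    ["meeting", "deadline", "tired", "tired"],
]

top = tfidf_top_terms(docs, top_k=2)
print(top)
```

Terms that are frequent in one record but rare across the corpus score highest, which is why "deadline" outranks the more widespread "tired" in the third record.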

Neural network detection of digital fatigue was carried out based on the “BERTForSequenceClassification” model, implemented in the “HuggingFace Transformers” framework [ 31 ]. Input text was passed to the built-in tokenizer, which converted it into token sequences, truncated them to a maximum length of 32 tokens, and padded them with the special [PAD] token to a fixed size. The “Healthcare Workers Burnout Tweets” corpus was used for training; it was split with stratification into training (80%) and validation (20%) parts.
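The two preprocessing steps described above, fixed-length sequences and a stratified 80/20 split, can be sketched in plain Python. This is an illustration under simplifying assumptions: the helper names are hypothetical, and truncation here simply clips the token list, whereas a real BERT tokenizer also keeps the closing special token. In HuggingFace Transformers the same behaviour is typically requested with `padding="max_length"`, `truncation=True`, and `max_length=32`; whether the authors used exactly these arguments is not stated.

```python
import random
from collections import defaultdict

MAX_LEN = 32  # maximum sequence length stated in the text


def pad_or_truncate(tokens, max_len=MAX_LEN, pad_token="[PAD]"):
    """Clip a token list to max_len and right-pad it with pad_token."""
    clipped = tokens[:max_len]
    return clipped + [pad_token] * (max_len - len(clipped))


def stratified_split(labels, train_share=0.8, seed=0):
    """Split record indices so each class keeps the same train share."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_share)
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)


short = pad_or_truncate(["[CLS]", "so", "tired", "[SEP]"])
long = pad_or_truncate(["tok"] * 50)
train_idx, valid_idx = stratified_split([0] * 10 + [1] * 10)
print(len(short), len(long), len(train_idx), len(valid_idx))
```

Splitting per class keeps the balance of fatigue and non-fatigue records identical in the training and validation parts.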

An independent “Mental Health Social Media” set was used to test the model, which allowed validation on new data; messages in this set can be selected by user ID, so a profile can be built for an individual author. The classification results were aggregated both at the level of individual thematic segments and within the overall digital profile of the author. The results were visualized using the “matplotlib” [ 32 ], “seaborn” [ 33 ] and “wordcloud” [ 34 ] libraries, which allow building interactive digital fatigue maps and semantic clouds of communication objects.

6. Results and discussion

Training the BERT neural network architecture yielded the metric values shown in Table 1. The corresponding confusion matrix [ 35 ] is shown in Figure 2.

The classification results performed using the BERT neural network for digital fatigue detection demonstrate a balance between two classes – the presence of digital fatigue (class 1) and its absence (class 0). The model shows relatively equal performance for both classes, which indicates its ability to accurately distinguish between these categories.

In the case of positive predictions (class 1), the model correctly identified 175 cases of digital fatigue as the presence of this condition (True Positives, TP). This indicates that the model does a good job of detecting digital fatigue, as most of the cases were correctly classified. However, there were also errors: 13 cases of digital fatigue were incorrectly classified as the absence of fatigue (False Negatives, FN), which indicates the model’s possible difficulties in identifying some cases of fatigue. This may be due to the fact that the text messages in these cases had more vague or less obvious signs of fatigue.

Regarding negative predictions (class 0), the model also demonstrated good discrimination. It correctly classified 171 cases of absence of digital fatigue as the absence of this condition (True Negatives, TN), which indicates that the model is able to clearly determine when fatigue is absent. At the same time, there were 17 cases where the model incorrectly classified the absence of digital fatigue as the presence of fatigue (False Positives, FP). These errors may be related to the presence of similar patterns or characteristics in the texts that were incorrectly interpreted as signs of digital fatigue. The created method with a trained neural network model was tested on the training dataset (the “burnout” category). The results are shown in Table 2.

According to Table 2, LDA identified 5 communication segments within which the proportion of texts with burnout is over 94%. This is consistent with the confusion matrix and the obtained accuracy, since only texts labeled as containing burnout were analyzed.
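For reference, the standard metric definitions can be applied directly to the four confusion-matrix counts quoted in the text (TP = 175, FN = 13, TN = 171, FP = 17). The sketch below is a worked calculation only; the function name is hypothetical, and the values it returns follow solely from these four counts, so the metrics reported in Table 1 remain the reference.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts as quoted in the discussion of the confusion matrix.
metrics = classification_metrics(tp=175, fn=13, tn=171, fp=17)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Precision penalizes the 17 false positives, recall penalizes the 13 false negatives, and F1 is their harmonic mean.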

In the next experiment, a user was selected from the “Mental Health Social Media” dataset and a digital profile was built from that user’s posts. After applying thematic modeling, two communication segments were extracted. For each identified segment, target communication objects were identified and the digital fatigue and burnout index was determined (Figure 3).

The first theme contains words that are likely to be related to problems that may be related to digital fatigue or burnout, such as: “therapy”, “treatment”, “disorders”, “depressive”, “work”, “boss”. These words indicate stress, anxiety, fatigue, as well as interaction with work responsibilities, which are important aspects for a theme describing digital fatigue. For example, the words “therapy” and “treatment” may indicate the search for solutions or support in dealing with this problem, while “depressive” and “disorders” link the problem to psychological aspects.

At the same time, words like “work” and “boss” indicate potential stressors in the workplace, which are relevant in the context of digital fatigue, since people working in conditions of constant digital interaction often feel overloaded due to high demands, in particular from superiors.

The second theme is more general, and includes words that are mainly related to life, household chores and family interactions: “talk”, “business”, “article”, “help”, “life”, “family”, “wife”, “home”. This may be less specific to digital fatigue, as these words are more related to the social aspects of life. However, some of them, in particular “help”, “business” or “life”, can be interpreted as a connection to stress or difficulties that can arise due to overload in work or personal life, which can also lead to emotional exhaustion.

For the first theme, the high percentage of burnout (78%) seems logical, since the words from this theme are indeed more related to stressful situations that can cause digital fatigue: work, depression, therapy, etc. The high frequency of such words indicates that this is a theme that covers most of the factors that can lead to burnout.

The second theme, with a burnout rate of 24%, contains words that are more related to social aspects of life, which may be less closely related to digital fatigue or stress caused by excessive use of technology. For example, the words “family”, “home”, “wife” have a lower connection with digital fatigue, which justifies the lower percentage.

The proposed method has confirmed its viability as a tool for automated monitoring of psycho-emotional state based on text analysis. Its application contributes to improving the quality of digital fatigue diagnostics, opens up opportunities for integration into systems for adapting digital services to the mental state of users, and creates a basis for further research in the field of personalized interfaces and digital hygiene.

The proposed approach has a number of limitations, including working only with English-language data and limiting the length of the input sequence to 32 tokens. Further research will be aimed at improving the method, which involves adapting it to multilingual analysis, taking into account the peculiarities of the syntax and vocabulary of the Ukrainian language, by using multilingual models. Also, a separate direction is to take into account the temporal dynamics of the user’s digital activity.

7. Conclusion

This paper presents a new approach to the automated detection of digital fatigue and professional burnout based on text analysis, which uniquely combines interpretable thematic segmentation of content with the identification of target objects of communication and subsequent neural network classification using the BERT architecture. The proposed method allows for a comprehensive, multi-level assessment of the user’s cognitive and emotional load in the digital environment, providing both contextual and semantic sensitivity to signs of digital fatigue. The neural network trained to detect digital fatigue achieved robust metrics on validation data, with an Accuracy of 0.83, Precision of 0.87, Recall of 0.88, and an F1-score of 0.87.

Unlike traditional models that operate with a global text profile or aggregated metadata, this paper implements a method of thematic classification of text records, which allows for the isolation of semantic segments of communication. Further analysis within each segment is carried out through the identification of target communication objects, which provides a contextualized interpretation of digital activity. In contrast to unidimensional assessments focused mainly on cognitive or visual symptoms (e.g., computer vision syndrome), a formalized model for assessing digital fatigue is proposed, which includes a number of indicators – at the topic, post, and user profile levels. Such a model allows one to identify both local centers of overload and the overall level of digital fatigue.

However, the current study presents certain limitations. The model was trained and validated exclusively on English-language data, and the input sequence length was restricted to 32 tokens to optimize processing efficiency, potentially limiting the analysis of longer, more complex narratives. Future research will address these constraints by adapting the methodology for multilingual analysis, specifically incorporating the syntax and vocabulary of the Ukrainian language through multilingual models. Additionally, a significant direction for future work is the integration of temporal dynamics into the user’s digital activity profile to monitor the progression of fatigue over time.

Acknowledgments

The authors are grateful to Prof. Tetiana Hovorushchenko, Prof. Olexander Barmak, Prof. Iurii Krak and other Program Committee members for organizing and conducting the workshop ExplAI-2025: Advanced AI in Explainability and Ethics for the Sustainable Development Goals.

Declaration on Generative AI

The authors have not employed any Generative AI tools.

[1] Supriyadi, Sulistiasih, K. H. Rahmi, Fahrudin, Pramono, The impact of digital fatigue on employee productivity and well-being: A scoping literature review, Environment and Social Psychology 10 (2025). doi:10.59429/esp.v10i2.3420.

[2] K. H. Rahmi, Fahrudin, Supriyadi, E. Herlina, Rosilawati, S. R. Ningrum, Technostress and cognitive fatigue: Reducing digital strain for improved employee well-being: A literature review, Multidisciplinary Reviews 8 (2025) 2025380. doi:10.31893/multirev.2025380.

[3] Maetzler, L. C. Guedes, K. N. Emmert, Kudelka, H. L. Hildesheim, E. Paulides, Connolly, Davies, Dilda, Ahmaniemi, et al., Fatigue-related changes of daily function: Most promising measures for the digital age, Digital Biomarkers (2024) 30-39. doi:10.1159/000536568.

[4] Hovorushchenko, Herts, Hnatchuk, Concept of intelligent decision support system in the legal regulation of the surrogate motherhood, in: CEUR Workshop Proceedings, volume 2488, 2019, pp. 57-68. URL: https://ceur-ws.org/Vol-2488/paper5.pdf.

[5] Sharma, Anand, Ahuja, Thakur, I. Mondal, Singh, Kohli, Venkateshan, Digital burnout: Covid-19 lockdown mediates excessive technology use stress, World Social Psychiatry 2 (2020) 171. doi:10.4103/wsp.wsp_21_20.

[6] R. K. Sarangal, Nargotra, Digital fatigue among students in current covid-19 pandemic: A study of higher education, Gurukul Business Review 18 (2022). doi:10.48205/gbr.v18.5.

[7] Byrne, Special topic on reducing technology related stress and burnout: Digital compassion fatigue as an emerging phenomenon for registered nurses experiencing techno-stress, Applied Clinical Informatics (2025). doi:10.1055/a-2564-8809.

[8] Radiuk, Kovalchuk, Slobodzian, Manziuk, Barmak, I. Krak, Human-in-the-loop approach based on MRI and ECG for healthcare diagnosis, in: Proceedings of the 5th International Conference on Informatics & Data-Driven Medicine (IDDM 2022), volume 3302, CEUR-WS.org, Aachen, 2022, pp. 9-20. URL: https://ceur-ws.org/Vol-3302/paper1.pdf.

[9] Supriyadi, Sulistiasih, K. H. Rahmi, Fahrudin, Pramono, The impact of digital fatigue on employee productivity and well-being: A scoping literature review, Environment and Social Psychology 10 (2025). doi:10.59429/esp.v10i2.3420.

[10] Bagaji, Rao, Digital fatigue in the age of screens: eye and postural strain among 18-35-year-old screen users, International Journal of Research - GRANTHAALAYAH 13 (2025). doi:10.29121/granthaalayah.v13.i5.2025.6191.

[11] Merhbene, Nath, A. R. Puttick, Kurpicz-Briki, Burnoutensemble: Augmented intelligence to detect indications for burnout in clinical psychology, Frontiers in Big Data 5 (2022). doi:10.3389/fdata.2022.863100.

[12] Mendula, Gabrielli, Finazzi, C. Dompe', Delucis, Unveiling mental health insights: A novel nlp tool for stress detection through writing and speaking analysis to prevent burnout, in: AHFE International, 2024, pp. 167-174. doi:10.54941/ahfe1004653.

[13] Kalyta, Barmak, Radiuk, I. Krak, Facial emotion recognition for photo and video surveillance based on machine learning and visual analytics, Applied Sciences 13 (2023) 9890. doi:10.3390/app13179890.

[14] P. D. Deep, Chen, Student burnout and mental health in higher education during covid-19: Online learning fatigue, institutional support, and the role of artificial intelligence, Higher Education Studies 15 (2025) 381. doi:10.5539/hes.v15n2p381.

[15] Krak, Didur, Molchanova, Mazurets, Sobko, Zalutska, Barmak, Method for political propaganda detection in internet content using recurrent neural network models ensemble, in: CEUR Workshop Proceedings, volume 3806, 2024, pp. 312-324. URL: https://ceur-ws.org/Vol-3806/S_36_Krak.pdf.

[16] J. H. Lee, M. J. Ostwald, Latent dirichlet allocation (lda) topic models for space syntax studies on spatial experience, City, Territory and Architecture 11 (2024). doi:10.1186/s40410-023-00223-3.

[17] C. A. N. Agustina, Novita, Mustakim, N. E. Rozanda, The implementation of tf-idf and word2vec on booster vaccine sentiment analysis using support vector machine algorithm, in: Procedia Computer Science, volume 234, 2024, pp. 156-163. doi:10.1016/j.procs.2024.02.162.

[18] Gupta, Chadha, Tewari, A natural language processing model on bert and yake technique for keyword extraction on sustainability reports, IEEE Access 1 (2024). doi:10.1109/access.2024.3352742.

[19] Krak, Barmak, O. Mazurets, The practice implementation of the information technology for automated definition of semantic terms sets in the content of educational materials, in: CEUR Workshop Proceedings, volume 2139, 2018, pp. 245-254. URL: https://ceur-ws.org/Vol-2139/245-254.pdf.

[20] Kalia, Singh, Kumar, Domain adaptation for ner using mbert, in: Lecture Notes in Networks and Systems, Springer Nature Singapore, 2024, pp. 171-181. doi:10.1007/978-981-97-6992-6_14.

[21] Iazykova, Bert-for-sequence-classification, 2025. URL: https://pypi.org/project/bert-for-sequence-classification/.

[22] Xu, Xie, Zhao, Yu, Feng, Bert-sirna: sirna target prediction based on bert pre-trained interpretable model, Gene 910 (2024) 148330. doi:10.1016/j.gene.2024.148330.

[23] Kovalchuk, Slobodzian, Sobko, Molchanova, Mazurets, Barmak, I. Krak, Savina, Visual analytics-based method for sentiment analysis of covid-19 ukrainian tweets, in: Lecture Notes on Data Engineering and Communications Technologies, volume 149, 2023, pp. 591-607. doi:10.1007/978-3-031-16203-9_33.

[24] M. NG, Healthcare workers burnout, 2025. URL: https://www.kaggle.com/datasets/mindyng/healthcareworkersburnout/data.

[25] Goldbloom, Kaggle, 2025. URL: https://www.kaggle.com/.

[26] InFamousCoder, Mental health social media, 2025. URL: https://www.kaggle.com/datasets/infamouscoder/mental-health-social-media.

[27] Corp., Twitter api documentation, 2025. URL: https://developer.x.com/en/docs/x-api.

[28] Sobko, Mazurets, Molchanova, Krak, Barmak, Method for analysis and formation of representative text datasets, in: CEUR Workshop Proceedings, volume 3899, 2024, pp. 84-98. URL: https://ceur-ws.org/Vol-3899/paper9.pdf.

[29] G. LLC., Google colab, 2025. URL: https://colab.research.google.com/.

[30] Barmak, Mazurets, I. Krak, Kulias, Smolarz, Azarova, Gromaszek, Smailova, Information technology for creation of semantic structure of educational materials, in: Proceedings of SPIE - The International Society for Optical Engineering, volume 11176, 2019, p. 1117623. doi:10.1117/12.2537064.

[31] Wolf, Debut, Sanh, Chaumond, Delangue, Moi, Cistac, Rault, Louf, Funtowicz, et al., Transformers: State-of-the-Art Natural Language Processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, 2020, pp. 38-45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.

[32] J. D. Hunter, Matplotlib: A 2d graphics environment, Computing in Science & Engineering 9 (2007) 90-95. doi:10.1109/MCSE.2007.55.

[33] M. L. Waskom, seaborn: statistical data visualization, Journal of Open Source Software 6 (2021) 3021. URL: https://doi.org/10.21105/joss.03021. doi:10.21105/joss.03021.

[34] Mueller, Wordcloud, 2025. URL: https://pypi.org/project/wordcloud/.

[35] Yang, G. Berdine, Confusion matrix, The Southwest Respiratory and Critical Care Chronicles 12 (2024) 75-79. doi:10.12746/swrccc.v12i53.1391.