<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Automatic psychoemotional state analysis of text messages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksii Bychkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatyana Obrusnik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kseniia Dukhnovska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Bohdan Hawrylyshyn str. 24, Kyiv, UA-04116</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <fpage>1</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>This research focuses on developing new methods for the automatic detection of psycho-emotional disorders, such as depression and anxiety, through the analysis of textual data. By utilizing modern natural language processing models, including BERT and GPT, a comparison of their effectiveness with classical methods, such as BoW and TF-IDF, was conducted. The results demonstrated that deep learning models exhibit significantly higher accuracy and recall in identifying signs of disorders, especially when integrating emotional dynamics analysis. This study represents the first comprehensive comparison of classical and transformer-based approaches in this field, and also introduces a novel method for integrating semantic and emotional data. The obtained results can be used for the creation of online monitoring and early diagnosis services for psycho-emotional disorders.</p>
      </abstract>
      <kwd-group>
        <kwd>NLP</kwd>
        <kwd>text analysis</kwd>
        <kwd>Bag-of-Words</kwd>
        <kwd>TF-IDF</kwd>
        <kwd>BERT</kwd>
        <kwd>GPT</kwd>
        <kwd>emotional patterns</kwd>
        <kwd>semantic patterns</kwd>
        <kwd>psychoemotional disorders</kwd>
        <kwd>depression</kwd>
        <kwd>anxiety</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Text remains one of the most important means of communication in the modern world, and the
development of internet technologies has significantly increased the volume of written
communication. Social networks, chats, forums, and personal blogs all represent vast sources of data
that can reflect not only factual information but also the emotional and psychological state of their
authors. Automated detection of psychoemotional disorders in such data has become a promising
area of research in Natural Language Processing (NLP). Modern machine learning methods make it
possible to find deep patterns in large text collections, revealing hidden emotional and semantic
signals. This underpins the development of effective mental health support systems that can
anticipate and promptly diagnose the risk of psychoemotional disorders.</p>
      <p>
        Psychoemotional disorders (depression, anxiety, etc.) are becoming increasingly common, posing
a serious public health concern. According to the World Health Organization, in the first year of the
COVID-19 pandemic, the global prevalence of anxiety and depressive conditions rose by 25%. Early
detection of these disorders is critically important for providing timely assistance, yet traditional
diagnostic methods (clinical surveys, psychodiagnostic tests) require substantial resources and can
be subjective. A promising solution to this problem involves using NLP techniques to automatically
analyze texts generated by users (e.g., social media posts, diary entries, or survey responses).
Linguistic features and writing style can reveal a person’s internal psychological state. Research
shows that certain linguistic markers (frequent use of first-person pronouns, negatively charged
vocabulary, or emotionally intense words) correlate with depressive and anxious conditions. Recent
progress in NLP, particularly breakthroughs in deep learning, offers new possibilities for a deeper
analysis of semantic (meaning-based) and emotional patterns in text to uncover hidden indicators of
psychoemotional disorders. As such, combining NLP methods and psycholinguistic analysis for
automated detection of psychoemotional pathologies is highly relevant, both scientifically and
practically.
      </p>
    </sec>
    <sec id="sec-1a">
      <title>2. Literature Review and Problem Statement</title>
      <p>
        In the world of artificial intelligence, machine learning is a powerful tool that allows computers to
learn from data without explicit programming. One of the fundamental tasks of machine learning is
classification, the process of assigning objects to specific categories or classes. Study [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] aimed to
increase the classification speed of data obtained from a Brain-Computer Interface without
significantly impacting the accuracy of processing and classification. Study [2] examined the
classification accuracy of electroencephalogram signals obtained from brain-computer interface
devices using classifiers based on AdaBoost, Decision Tree, k-Nearest Neighbor, Gaussian SVM,
Linear SVM, Polynomial SVM, Random Forest, and Random Forest Regression.
      </p>
      <p>Natural Language Processing is developing intensively as one of the leading directions of
Artificial Intelligence and covers a growing number of areas of human activity, and a significant
body of scientific work is now dedicated to it. Article [3] focuses on calculating statistical
characteristics of text, identifying functional relationships between these characteristics, and
verifying distribution laws. Empirical calculations on Ukrainian text yielded a statistical
dependence between the frequency of a word and its rank in a frequency dictionary (Zipf's law, or
rank distribution), the dependence of the number of words on the frequency of their usage in the
text (Heaps' law, or spectral distribution), and the distribution of sentences in the text by length.
This work also analyzed models that explain the regression relationships between the quantitative
characteristics of the text.</p>
      <p>Algorithms designed for processing and analyzing Ukrainian language texts deserve special
attention, as their development has not yet reached a sufficient level. In this area, researchers employ
a wide range of tools and methodologies. For instance, paper [4] presents an algorithm for the
identification of tongue twisters in the Ukrainian language. The authors interpret sentences as a
unique geometric model in a four-dimensional space, which they term a twistor. They decompose
this geometric model into simpler components, thus forming a specific grid. For this grid, an invariant
is determined – a value that retains its significance under certain modifications of the structure.
Subsequently, this invariant serves as input data for a classifier, the training objective of which is to
distinguish between tongue twisters and ordinary sentences. In study [5], using the mathematical
tools of topology, the complexity of the phonological organization of tongue twisters is examined,
specifically the number of different sound groups and their interconnections. It is anticipated that
such research will contribute to the creation of more advanced models for speech recognition.</p>
      <p>The article [6] examines the support vector machine method using a kernel based on a term
co-occurrence matrix in a corpus of text documents. Using fuzzy set theory, the relationship between two
terms in a collection of text documents can be defined. From this, it is possible to construct a kernel
for the support vector method. The study proves that the co-occurrence matrix can be a kernel for the
support vector method. The work demonstrates that the classification quality for the support vector
machine method with a kernel of the term co-occurrence matrix in a collection of text documents
exceeds the quality of the standard SVM method. The article [7] introduces a novel dynamic system
for the analysis of Ukrainian-language texts aimed at network information monitoring. The proposed
system addresses the limitations of existing Natural Language Processing tools. The operational
principles of the system, which are based on dynamic N-grams, are described. The results presented
demonstrate the promising nature of the developed approach and pave the way for further research
in the field of Ukrainian NLP. The article [8] considers mathematical models of text documents,
taking into account the time factor, which can be used in search algorithms. Emphasis is placed on
the exponential increase in the amount of information both in electronic storage and in real life, as
well as the importance of the dynamic characteristics of information for search engines. In this work,
a dynamic model of a text document was developed based on TF-IDF metrics, and experiments were
conducted regarding its application in search algorithms. The simulation results demonstrated an
increase in search efficiency due to the proposed dynamic TF-IDF model.</p>
      <p>Studies from recent years show a growing interest in NLP methods for assessing a person’s mental
state based on text. Speech and writing can reflect cognitive and emotional processes linked with
psychoemotional disorders, such as depression.</p>
      <p>Researchers address varied tasks: from automatically classifying the presence of a disorder
(distinguishing depressive vs. non-depressive text) to evaluating the severity of a disorder and
predicting risks (e.g., suicidal tendencies) based on linguistic patterns [9]. Early work in this field
focused primarily on sentiment analysis (emotional coloring) and the usage of linguistic markers. For
example, psycholinguistic lexicons like LIWC have been used to tally words associated with
emotions, cognitive processes, etc. [10]. These approaches revealed correlations between language
features and depressive states, for instance, an increased occurrence of negative words or first-person
singular pronouns in texts from people with depression [11]. One study found that analyzing the
sentiment of social media posts can detect signs of depression, though the approach’s accuracy was
limited (R² ≈ 0.10). This suggests that a superficial analysis of emotional tone without deeper
contextual consideration has restricted diagnostic value [12].</p>
      <p>In response to these challenges, modern research has introduced increasingly sophisticated NLP
models, accounting for semantic context and nuanced language features. For instance, analyzing
social network texts has shown the effectiveness of deep learning models in classifying mental
disorders from users’ language [13]. Some research now encompasses a wide range of disorders—
from depression and anxiety to post-traumatic stress disorder—and attempts to identify
characteristic emotional patterns (sequences of emotional cues in text) and semantic patterns (word
choice and phrasing) for each. Specifically, combining multiple transformer models has proven
promising: for instance, Hegde et al. employed XLNet, RoBERTa, and ELECTRA simultaneously with
Bayesian parameter optimization to classify 15 types of mental disorders based on Reddit posts,
achieving an accuracy of ~78% [14], surpassing each model individually. This underscores a trend
toward combining various NLP models to improve the reliability of identifying psychoemotional
states.</p>
      <p>Methods for text analysis to detect emotional and semantic features have evolved from traditional
approaches to modern deep learning models. Traditional approaches include Bag-of-Words and
TF-IDF. In these models, a document is represented as a vector of word frequencies without considering
word order [15]. Bag-of-Words/TF-IDF, together with classical machine learning algorithms (e.g.,
SVM, logistic regression), have served as baseline solutions for text sentiment analysis and early
attempts to classify mental disorders. Although these methods can detect the general emotional tone
of text, they ignore context—i.e., word order and ambiguity remain unaccounted for [16]. To enhance
their performance, additional lexical and linguistic features were introduced, such as counting
certain word categories using LIWC. Combining these features with decision trees or naive Bayes
classifiers helped capture certain stylistic markers (e.g., the frequency of words indicating cognitive
distortions), improving the results of social media text analysis (F1 ~ 0.7 for recognizing depressive
comments). However, limited contextual representation remained a drawback: dictionary-based
models cannot handle polysemy or sarcasm, often failing to distinguish the nuanced contexts of a
single word. The next step involved semantic word embeddings (e.g., word2vec, GloVe) and deep
neural networks. Word embedding models learn from large text corpora and produce dense vector
representations of words reflecting their semantic closeness. This allows for semantic context: words
used in similar contexts have similar vectors. Combined with classical classifiers, such features
provide a noticeable improvement over Bag-of-Words. Reports indicate that employing word2vec
embeddings with SVM for early depression detection (eRisk) yielded an F1 score ~0.63, rising to ~0.73
with pretrained GloVe embeddings. Subsequent developments include deep neural networks, notably
recurrent (RNN) and convolutional (CNN) networks.</p>
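      <p>To make the Bag-of-Words and TF-IDF representation described above concrete, the following minimal sketch computes TF-IDF vectors in plain Python. The function name <monospace>tfidf_vectors</monospace>, the toy documents, and the unsmoothed weighting tf × log(N / df) are assumptions chosen for illustration; they are not taken from the cited works, and library implementations typically use smoothed variants.</p>
      <preformat>
```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)   # Bag-of-Words counts; word order is discarded
        total = len(doc) or 1   # guard against empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors

# Toy corpus (hypothetical), already tokenized.
docs = [
    "i feel sad and alone".split(),
    "i feel happy today".split(),
    "sad sad day".split(),
]
vocab, vectors = tfidf_vectors(docs)
```
      </preformat>
      <p>Because each document is reduced to term counts, reordering the words of a document leaves its vector unchanged, which is exactly the loss of context discussed above.</p>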
      <p>Based on the literature, there is a need for new text analysis methods that simultaneously consider
semantic context and emotional patterns to more accurately detect psychoemotional disorders. The
limitations of existing methods (insufficient context awareness and interpretational complexity)
may be addressed by integrating transformer models and emotional analysis methods. Such a
comprehensive approach, combining content and tone analysis, should enhance accuracy and
sensitivity in detecting concealed psychoemotional states. The objective of this study is to develop a
novel method capable of effectively identifying psychoemotional disorders from textual data,
marking a crucial step toward creating more refined mental health support systems.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Aim of the Study</title>
      <p>This study aims to improve the accuracy and reliability of automatically detecting
psychoemotional disorders from textual data by developing new approaches to analyzing semantic
and emotional patterns in user messages. In other words, the research seeks to create a method for
effectively recognizing hidden psychological markers in text and promptly diagnosing potential
mood or emotional disturbances.</p>
      <p>To achieve the stated goal, the following tasks must be addressed:</p>
      <p>Analyze the current state of research and existing methods for automatically detecting
psychoemotional disorders in textual data.</p>
      <p>Identify the key semantic and emotional features (patterns) of language that are statistically
related to a person’s psychoemotional state.</p>
      <p>Develop a methodology that integrates both semantic content analysis and emotional tone
analysis of text to reveal signs of psychoemotional disorders.</p>
      <p>Implement and test model prototypes—both classical NLP algorithms and deep learning
models—to classify psychoemotional states based on text.</p>
      <p>Conduct a comparative experimental analysis of the results from classical and deep NLP
approaches and assess their effectiveness using metrics such as accuracy, recall, F1-score, and
ROC-AUC.</p>
    </sec>
    <sec id="sec-3">
      <title>4. Research Methods</title>
      <p>This work employs a combination of text-processing methods. Classical NLP techniques—
statistical text analysis (TF-IDF feature models), linguistic analysis with specialized dictionaries of
emotional vocabulary, and topic modeling—were utilized to detect semantic themes. Simultaneously,
deep learning methods (recurrent and convolutional neural networks, and transformer models such
as BERT) were used for automatically extracting hidden semantic connections and assessing the
emotional tone of text. The textual data underwent preprocessing including tokenization,
normalization (lemma conversion), and removal of noise and stopwords. For training and testing the
models, we prepared a dataset split into training and test subsets, also using cross-validation to
enhance reliability. We evaluated model performance using standard classification metrics (accuracy,
recall, F1-score) and ROC-curve analysis. We also performed a statistical analysis of the results
(hypothesis testing concerning differences in metrics) to confirm the advantages of deep approaches
over classical methods.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Methodology and Research Results</title>
      <p>The proposed system for detecting psychoemotional disorders is constructed as a multi-stage
text-processing pipeline. Its architecture consists of several core modules: text preprocessing, semantic
feature analysis, emotional color (tone) analysis, and psychoemotional state classification. In the
preprocessing stage, we clean the text—removing stopwords and punctuation, converting words to
lemmas (lemmatization), and performing tokenization. This normalization aims to eliminate noise
and enhance subsequent analytical steps.</p>
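      <p>The preprocessing stage can be sketched as follows. This is only an illustration under stated assumptions: the stopword set and the lemma dictionary are tiny hypothetical stand-ins, and the function name <monospace>preprocess</monospace> is not from the paper. A real system would rely on full stopword lists and a proper lemmatizer.</p>
      <preformat>
```python
import re

# Hypothetical toy resources, for illustration only.
STOPWORDS = {"i", "am", "are", "the", "a", "and", "of", "to", "is"}
LEMMAS = {"feeling": "feel", "days": "day", "worse": "bad"}

def preprocess(text):
    """Lowercase, drop punctuation, tokenize, remove stopwords, lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())  # keeps alphabetic runs only
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]

print(preprocess("I am feeling worse, and the days are endless..."))
# prints ['feel', 'bad', 'day', 'endless']
```
      </preformat>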
      <p>The next key component is extracting semantic patterns. Text is transformed into vector
representations that capture the content features of messages. For instance, word embedding models
(like Word2Vec or integrated transformer models such as BERT) encode text into numerical features,
thereby accounting for semantic word relationships and contexts. Simultaneously, a set of specialized
features reflects the text’s emotional patterns, such as sentiment polarity (positive/negative),
sentiment index, and frequencies of emotionally charged words. We employed a lexicon-based
sentiment analysis as well as an emotion classifier capable of identifying basic emotional states (joy,
sadness, anger, etc.) within each text fragment. The result of this stage is a combined feature vector
representing both semantic (theme, meaning) and emotional (mood, tone) information for each text
document.</p>
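      <p>One simple way to form such a combined vector is to concatenate an averaged word-embedding vector with a few lexicon-based emotion features. The sketch below assumes this concatenation scheme; the two-dimensional embeddings, the lexicons, and all function names are hypothetical and serve only to show the shape of the computation, not the authors' implementation.</p>
      <preformat>
```python
# Hypothetical toy lexicons and embeddings, for illustration only.
POSITIVE = {"happy", "love", "smile"}
NEGATIVE = {"hate", "pain", "sad", "hopeless"}
EMBEDDINGS = {
    "pain": [0.9, 0.1],
    "love": [0.1, 0.8],
    "life": [0.5, 0.5],
}

def semantic_features(tokens, dim=2):
    """Average the embeddings of known tokens (zero vector if none are known)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def emotional_features(tokens):
    """Fractions of positive and negative words, plus their difference."""
    n = max(len(tokens), 1)
    pos = sum(t in POSITIVE for t in tokens) / n
    neg = sum(t in NEGATIVE for t in tokens) / n
    return [pos, neg, pos - neg]

def combined_features(tokens):
    """Concatenate semantic and emotional features into one vector."""
    return semantic_features(tokens) + emotional_features(tokens)

features = combined_features(["life", "pain", "hate", "pain"])
```
      </preformat>
      <p>Here the first two entries of <monospace>features</monospace> carry the semantic part and the last three carry the emotional part; a downstream classifier receives both at once.</p>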
      <p>The system’s final module—psychoemotional state classification—takes these semantic and
emotional features as input to a machine learning algorithm that determines whether the text’s
author shows signs of a psychoemotional disorder. Several approaches were tested: classical machine
learning (logistic regression, SVM) for computed features, along with deep neural networks
(recurrent LSTM and transformer-based models) for automatically extracting patterns from raw text.
The highest performance was achieved by a hybrid model integrating semantic and emotional
analysis. For example, an ensemble including BERT (for context) and an additional dense layer
receiving emotional-state indicators can recognize hidden semantic signs of disorders (e.g., themes of
self-devaluation or hopelessness) alongside emotional indicators (negative vocabulary, expressions
of sadness, anger). Figure 1 illustrates the system’s architecture, from input text through modules for
cleaning, semantic/emotional analysis, and finally to the classifier that estimates the likelihood of a
psychoemotional disorder. In this diagram, we see the principal modules that guide the process from
raw text to a psychoemotional state classification.</p>
      <sec id="sec-4-1">
        <title>The algorithm comprises seven steps:</title>
        <p>Input Text: The system receives messages, posts, or documents as input.</p>
        <p>Cleaning Module: Stopwords and unwanted symbols are removed, followed by tokenization
and lemmatization. This reduces data noise and normalizes text before further processing.</p>
        <p>Semantic Analysis (using Word2Vec, BERT, etc.): Meaningful patterns are uncovered; the text
is represented as numerical features reflecting semantic relationships and context.</p>
        <p>Emotional Analysis: Detects the text’s emotional tone (positive/negative) and specific
emotional states like sadness, joy, anger. This involves lexicon-based approaches or
pretrained sentiment models.</p>
        <p>Combining Semantic and Emotional Features: The model concurrently considers both
semantic and emotional aspects of messages.</p>
        <p>Psychoemotional State Classification: SVM, logistic regression, LSTM, or transformer-based
models take in the combined features.</p>
        <p>Output Results: The system outputs the probability of a psychoemotional disorder, or a
“normal/disorder” conclusion, along with a confidence measure.</p>
        <p>The described architecture is implemented in Python using NLP libraries such as NLTK, Torch,
and Transformers for semantic analysis, as well as specialized lexicons for tonality. The
corresponding source code is provided in the appendices. The architecture is flexible—if needed, one
can adapt it to detect various kinds of disorders by selecting appropriate training data and adjusting
each module.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Forming “Emotional Dynamics” in Large Text Corpora</title>
      <p>Emotional dynamics involves tracking changes in emotional states in large sequential text
datasets. The underlying concept is to treat text not in isolation, but rather as a time series of mood
indicators [18], allowing the system to identify trends and anomalies in an author’s or community’s
emotional state. To implement this, an extensive text corpus is segmented into sub-sequences (e.g.,
by message publication date or chapters in a long document), and an integrated emotional indicator is
calculated for each. We used a tonality index—the difference between the fraction of positively and
negatively charged words—and other more advanced metrics, such as neural sentiment analysis.</p>
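      <p>The tonality index and its use as a time series can be sketched as below. The lexicons and segments are hypothetical, and the least-squares slope is just one possible way to quantify a trend; the paper does not state which trend measure was used.</p>
      <preformat>
```python
# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"happy", "love", "smile"}
NEGATIVE = {"hate", "pain", "sad", "hopeless"}

def tonality_index(tokens):
    """Fraction of positive words minus fraction of negative words."""
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def trend_slope(values):
    """Least-squares slope of values against positions 0..n-1 (needs n of at least 2)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Three consecutive segments of one author's (made-up) posts.
segments = [
    "sad but love walk".split(),
    "pain and sad day".split(),
    "hate pain sad hopeless".split(),
]
series = [tonality_index(s) for s in segments]  # [0.0, -0.5, -1.0]
slope = trend_slope(series)                     # -0.5 per segment
```
      </preformat>
      <p>A negative slope over a series that stays below zero corresponds to the persistently negative, downward-trending curve that the method treats as a warning sign.</p>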
      <p>Figure 2 illustrates an example of emotional dynamics for two social media users: one suffering
from depression and one presumed healthy. The x-axis shows a temporal sequence of texts (days or
post numbers), and the y-axis represents mood values (lower values mean more negative tone). It is
clear that the depressed user’s curve remains in the negative range the entire time and shows a
downward trend, e.g., from moderately negative tonality at the beginning (approximately –0.2 to –0.3)
to significantly negative (around –0.8) at the end. This correlates with worsening depressive
symptoms in the user’s posts (more references to hopelessness, sadness, loneliness, or suicidal
thoughts). The other user’s emotional curve oscillates near zero or above, featuring both negative and
positive spikes, but overall remains neutral or optimistic.</p>
      <p>Table 2 lists the actual words behind the placeholder labels (T1–T5) used in the histograms
(see Fig. 3). Notably, words such as ‘nenavydzhu’ (hate), ‘bil’ (pain), and ‘zhyttia’ (life), all of
which carried negative connotations in context, were used more frequently by depressed users,
whereas words like ‘shchaslyvyi’ (happy), ‘liubliu’ (love), and ‘usmishka’ (smile) were more common
in the control group.</p>
      <p>The entire process of forming “emotional dynamics” and performing lexical analysis is automated
via Python scripts (see the code at the end), enabling large-scale data processing and generating
graphs from the calculated metrics.</p>
    </sec>
    <sec id="sec-6">
      <title>7. Validation of Results on Real Open Datasets</title>
      <p>We carried out experimental validation on open datasets to verify the effectiveness of the
proposed approaches. Specifically, we used a social media dataset from Reddit for detecting
depression (“The Depression Dataset”), containing posts from r/depression and a control group. We
chose this dataset for its realism: it comprises genuine texts where the presence or absence of a
depressive disorder was determined by user self-report or expert annotation. The total dataset
includes over 10,000 textual posts (after cleaning), evenly split between “depression” and “normal.”
Control posts were sourced from subreddits unrelated to psychological problems, ensuring
differences in content and tone. The model was trained on 70% of the data (training set) and evaluated
on the remaining 30% (test set), which was never used during training. For a robust estimate, a 5-fold
cross-validation was also applied to the training data.</p>
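      <p>The evaluation protocol (a 70/30 hold-out split followed by 5-fold cross-validation on the training part) can be sketched with index lists alone. The function names and the fixed seed are assumptions made for this example, and the sketch does not stratify by class, which a real experiment on balanced labels would likely do.</p>
      <preformat>
```python
import random

def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into train and held-out test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

train_idx, test_idx = train_test_split_indices(10000)
splits = list(k_fold(train_idx, k=5))
```
      </preformat>
      <p>The test indices never appear in any fold, so the cross-validation estimate and the final test score are computed on disjoint data.</p>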
      <p>We compared several classification models: logistic regression with Bag-of-Words text
representation; SVM with an RBF kernel on TF-IDF features; a deep recurrent neural network (LSTM)
on word sequences; a transformer model (BERT) fine-tuned on our training set; and an ensemble
approach combining semantic and emotional features (hereafter “Proposed”). Table 3 shows the
primary quality metrics for each model: accuracy, recall (for the “disorder” class, i.e., the model’s
ability to detect it), F1-score, and the area under the ROC curve (ROC-AUC).</p>
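      <sec id="sec-6-4">
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Primary quality metrics for each model on the test set (%)</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Model</th>
                <th>Accuracy</th>
                <th>Recall</th>
                <th>F1-score</th>
                <th>ROC-AUC</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Logistic Regression</td>
                <td>75.4</td>
                <td>70.2</td>
                <td>72.7</td>
                <td>80.1</td>
              </tr>
              <tr>
                <td>SVM (RBF kernel)</td>
                <td>78.9</td>
                <td>75.0</td>
                <td>76.8</td>
                <td>83.5</td>
              </tr>
              <tr>
                <td>LSTM (neural network)</td>
                <td>82.5</td>
                <td>79.1</td>
                <td>80.7</td>
                <td>88.4</td>
              </tr>
              <tr>
                <td>BERT (transformer)</td>
                <td>85.0</td>
                <td>82.1</td>
                <td>83.5</td>
                <td>90.0</td>
              </tr>
              <tr>
                <td>Proposed (semantic+emotional)</td>
                <td>88.2</td>
                <td>85.4</td>
                <td>86.7</td>
                <td>92.1</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>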
        <p>As seen in Table 3, the proposed model, which accounts for both semantic and emotional patterns,
gave the best results on the test data. Its classification accuracy reached 88.2%, which is 3.2
percentage points higher than the nearest competitor (BERT at 85.0%). Notably, the improvement is
primarily due to higher recall (85.4% vs. 82.1% for BERT): the model is less likely to miss cases of
disorder, i.e., it is more sensitive to depressive cues in text. The F1-score, the harmonic mean of
precision and recall, also peaks with the ensemble (86.7%), confirming the model’s balanced
performance: it detects most people with disorders while making few false alarms about healthy
users.</p>
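        <p>For reference, the metrics discussed here can be computed from predicted and true labels as in the following sketch. The function name and the toy labels are hypothetical; label 1 stands for the “disorder” class.</p>
        <preformat>
```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels (made up): 3 true positives, 1 miss, 1 false alarm, 3 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```
        </preformat>
        <p>Recall penalizes missed cases of disorder, while precision penalizes false alarms; F1 is high only when both are high.</p>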
        <p>ROC curves were plotted for all models; the AUC metric summarizes performance over various
classification thresholds. The combined-features model achieved a 92.1% ROC-AUC, indicating
excellent class separation. BERT likewise outperforms classical algorithms because it incorporates
deep context and multi-layer representations of language. Yet it is still slightly behind our proposed
approach, which integrates emotional features. Hence, leveraging emotional indicators improves
recognition of hidden psychoemotional patterns in cases where purely semantic models (even BERT)
might struggle due to missing explicit keywords.</p>
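        <p>ROC-AUC can be read as the probability that a randomly chosen “disorder” text receives a higher score than a randomly chosen “normal” text. The sketch below computes it directly from that pairwise definition; the function name and the toy scores are hypothetical, and the quadratic loop is meant for clarity rather than for large datasets.</p>
        <preformat>
```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))  # requires both classes to be present

# Toy classifier scores (made up); one positive is ranked below one negative.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)  # 8 of 9 pairs are ordered correctly
```
        </preformat>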
        <p>Validating on real data confirmed the hypothesis that combining semantic text analysis and
emotional tone boosts the accuracy of automatic psychoemotional disorder detection. The results are
representative, as testing was conducted on an open dataset that did not overlap with the training
data. We also repeated the experiments on another dataset (short social media messages with
mood labels). The ensemble model again outperformed baseline approaches by about the same 3–5%.
This underscores the method’s robustness: incorporating “emotional dynamics” and lexical
indicators of emotion can detect a disorder even when the topic or text style changes.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>8. Conclusions</title>
      <p>This research was devoted to developing and testing approaches for detecting psychoemotional
disorders based on analyzing semantic and emotional patterns in text. We provided an integrated
perspective, describing the detection system’s architecture (modules for text cleaning,
semantic/emotional analysis, and classification), examining the “emotional dynamics” method for
tracking long-term changes in mood across text sequences, and demonstrating validation results on
real open social media datasets.</p>
      <p>Key experimental findings indicate the superior performance of transformer-based models (BERT,
GPT) and composite solutions (that combine semantic and emotional features) over classical methods
(Bag-of-Words, TF-IDF) and basic neural networks (LSTM). The best accuracy, recall, and
F1-score were achieved by the ensemble model that jointly accounts for semantics and tonality,
surpassing the individual models by about 3–5 percentage points on average.</p>
      <p>Emotional dynamics proved its practical usefulness, enabling the detection of not only isolated
negative/positive messages but also longer-term shifts in emotional state. A persistently negative
curve, trending downward, emerged as an additional indicator of potential psychoemotional issues,
particularly depression.</p>
      <p>Statistical and lexical analysis supports the observation that individuals with signs of
psychological disorders favor different vocabulary: e.g., more negative words and references to
hopelessness or suicidal themes, along with more frequent first-person singular pronouns, indicating
a deeper self-focus. Meanwhile, “healthy” users exhibit a broader range of positive, socially oriented
words.</p>
      <p>The proposed system can be used to monitor and automatically detect at-risk users in large text
corpora (e.g., in social networks or forums). The findings confirm the potential for early diagnosis of
psychoemotional conditions, contributing to more timely psychological intervention and prevention
of more severe outcomes.</p>
      <p>Study limitations primarily concern dataset size and type: many open datasets contain only
English or domain-specific data. Questions also remain about scalability to entirely different
platforms and conditions, as well as in-depth clinical validation.</p>
      <p>Future work includes enhancing detection approaches for complex emotional states (e.g., mixed
or multidimensional feelings), leveraging transfer learning for multilingual text corpora, and
integrating these methods into online psychological consulting systems. Additionally, developing
interpretable models (Explainable AI) could shed more light on how such systems arrive at decisions,
which is key for clinical and psychological settings.</p>
      <p>Hence, the text analysis approaches for semantic and emotional patterns described here present
promising avenues for early, fairly accurate diagnostics of psychoemotional disorders. Deploying
such a system would enable large-scale automatic processing of textual data to identify potentially
vulnerable user groups and facilitate providing timely psychological support.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used the Gemini service for grammar and
spelling checks. After using this service, the authors reviewed and edited the content as needed and
take full responsibility for the publication’s content.
      </p>
    </sec>
  </body>
  <back>
  </back>
</article>