<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>S. G. Burdisso, et al., A text classification framework for simple and effective early depression
detection over social media streams, Expert Systems with Applications</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/ICCCNT56998.2023.10308056</article-id>
      <title-group>
        <article-title>Andriy Kovalenko†, Igor Ruban†, Olesia Barkovska*, † and Vladyslav Kholiev†</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky Ave. 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>133</volume>
      <issue>2019</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The analysis of the emotional state of participants in a digital scientific community is a crucial task that significantly impacts the productivity of scientific discussions, the quality of collective decision-making, and the level of researcher engagement. Scientific discourse differs from general communication texts due to its specific characteristics, including formality, structured presentation, specialized terminology, and a predominantly neutral emotional tone. Traditional sentiment analysis and topic modeling methods, which perform effectively in social media contexts, are not always well adapted to the peculiarities of the academic environment. This study explores approaches to analyzing the emotional state of scientific discussions based on textual data using natural language processing (NLP) techniques, including topic modeling, sentiment analysis, and named entity recognition (NER). The OntoNotes dataset was modified and expanded with annotated texts from scientific forums and article comments to better align NLP methods with the characteristics of academic discourse. A comparative analysis of state-of-the-art machine learning models (BERT-Base, RoBERTa, DistilBERT) was conducted for the automatic analysis of scientific texts. The results demonstrate that transformer-based models significantly improve the accuracy of topic modeling, sentiment analysis, and NER in scientific discussions. An integrated system for analyzing the emotional tone and structure of academic discourse is proposed. Future research will focus on multimodal analysis of scientific communication, incorporating audio and video processing to achieve a deeper understanding of the emotional context within academic interactions.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>sentiment analysis</kwd>
        <kwd>topic modeling</kwd>
        <kwd>named entity recognition</kwd>
        <kwd>scientific discourse</kwd>
        <kwd>BERT</kwd>
        <kwd>digital scientific community</kwd>
        <kwd>text analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The volume of digital scientific communication, including chats, forums, conferences, and
platforms for discussing scientific ideas, is growing significantly. While the use of natural
language processing (NLP) for emotion analysis in commercial and social networks (e.g., Twitter,
Facebook) is already well developed [
        <xref ref-type="bibr" rid="ref1 ref2">1-2</xref>
        ], such approaches remain largely unsystematized in the
academic domain. Identifying the emotions of speakers based on textual data in digital scientific
communities is a complex task that involves NLP, machine learning, and psycholinguistics. This
research direction is crucial for understanding participant engagement levels, discussion
productivity, and, ultimately, the success of scientific interactions.
      </p>
      <p>The analysis of the emotional state of participants in digital scientific communities based on
textual information relies on named entity recognition (NER), automated topic modeling, and
sentiment analysis. Research in this area will enable:</p>
      <list list-type="bullet">
        <list-item>
          <p>tracking the correlation between emotions and the productivity of scientific discussions by
measuring the relationship between the emotional background of communication and its
efficiency. Future studies will focus on developing an "emotional profile" of successful
discussions. The formation of such an emotional profile can be achieved through intelligent
monitoring of scientific chats, forums, and discussions;</p>
        </list-item>
        <list-item>
          <p>preventing conflicts and researcher burnout by assessing the emotional state of participants,
thereby improving communication and collaboration in scientific communities.</p>
        </list-item>
      </list>
      <p>Emotion analysis methods for text are applied across various practical fields, some of which
are illustrated in Figure 1. This technology enhances communication, increases team efficiency,
prevents conflicts, and optimizes business and social processes.</p>
      <p>The proposed review of sentiment, reaction, and emotion analysis based on textual information
highlights the relevance of applying these methods to digital scientific communities and academic
discussions. Despite the vast body of research on natural language processing (NLP) techniques, their
adaptation to academic discourse remains underexplored. This is due to the complexity and
specificity of scientific terminology, sentence structures, and the frequent use of modal verbs, all of
which may influence the perception of emotions in text.</p>
      <p>The relevance of this research topic is further reinforced by the increasing transition of scientific
communication to digital formats. This shift is particularly significant for Ukrainian researchers,
who are currently operating under the conditions of a full-scale war. Scientists and academics often
experience high cognitive workloads, which can lead to professional burnout. At the same time, the
productivity of scientific discussions is largely influenced by the emotional state of researchers. Thus,
the early detection of negative emotional states, which may enhance engagement, improve the
effectiveness of academic communication, and prevent burnout, represents a highly relevant
research challenge.</p>
      <p>A classification of emotion analysis methods is presented in Figure 2.</p>
      <p>The objective of this study is to analyze the emotional reactions of participants in digital scientific
communities based on textual information to determine their level of engagement and interest in the
discussed topics, as well as to prevent potential conflicts and the pursuit of irrelevant research
directions.</p>
      <p>To achieve this goal, the following tasks must be addressed:</p>
      <list list-type="bullet">
        <list-item>
          <p>justification of the necessity of analyzing scientific discussions to determine topics, sentiment
analysis, and named entity recognition (NER) for further generalization;</p>
        </list-item>
        <list-item>
          <p>preparation of a training dataset by updating and annotating an existing textual dataset;</p>
        </list-item>
        <list-item>
          <p>evaluation of the impact of fine-tuning neural network models on the accuracy of named
entity recognition, topic classification, and sentiment analysis.</p>
        </list-item>
      </list>
      <p>A further extension of this research involves the creation of a multimodal dataset that includes
not only textual data but also audio and video information, such as recordings of scientific
conferences.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        The systematization of key areas of natural language processing (NLP) and their practical
applications, presented in Figure 3, is based on the categorization proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and expanded upon
by the authors of this article.
      </p>
      <p>
        The development of NLP methods allows the creation of effective applications in computational
linguistics [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5-7</xref>
        ]. In particular, NLP methods have been used in new types of chatbots and translation
systems, and they enable aspect-based text analysis and detection of the emotional tone of text in social
networks and open, multi-participant communication channels. Methods of analyzing emotions and
sentiments in social networks help assess public opinion, improve customer experience, detect fake
news [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], prevent cyberbullying, and support users' psycho-emotional well-being. For example, one
of the problems addressed by sentiment analysis based on digital text analysis and classification is the
detection of depressive states and post-traumatic stress disorder (PTSD) [9-10]. As shown in studies
[11-13], one of the key tasks within NLP is text content analysis: the process of discovering, classifying,
and interpreting information contained in text data.
      </p>
      <p>However, traditional NLP methods, such as rule-based approaches and statistical models, often
struggle with context understanding, semantic nuances, and implicit meanings in textual data. To
overcome these limitations, modern deep learning approaches, particularly neural network-based
models, have been developed. Deep learning techniques, including convolutional neural networks
(CNNs), recurrent neural networks (RNNs), and transformers, have demonstrated high efficiency in
text classification, sentiment analysis, and emotion recognition.</p>
      <p>Recent advancements in neural architectures, such as Long Short-Term Memory (LSTM), Gated
Recurrent Units (GRU), and Transformer-based models (e.g., BERT, GPT, RoBERTa), allow for a more
sophisticated understanding of linguistic structures, capturing dependencies across long text
sequences. These models have shown superior performance in sentiment classification, intent
recognition, and detection of psychological states, including stress and PTSD. The next section
provides an overview of the most widely used neural models for emotion and sentiment analysis.</p>
      <p>Existing sentiment analysis systems, such as Amazon Comprehend or Google Cloud Natural
Language, often suffer from limitations in flexibility, adaptability to specific tasks, and transparency,
making them unsuitable for domains requiring specialized emotional analysis.</p>
      <p>Numerous studies have focused on NLP methods in academic and scientific research, particularly in
the areas of automated classification of scientific articles, citation analysis, and sentiment evaluation in
peer reviews. In particular, the classification of scientific documents using deep learning has been
examined in [14]. This study approached the analysis of emotions in text and scientific publications by
combining machine learning and deep learning techniques, highlighting the need for more advanced
methods for detecting and assessing emotional nuances. A critical review of citation classification
methods is presented in [15-16], where the authors explore context-dependent and context-independent
citation analysis techniques. These approaches rely on citation placement in specific sections of a
document and incorporate deep learning methods and transformer architectures. One of the main
challenges of this research is the complexity of dataset annotation, as even human annotators often find it
difficult to determine the sentiment of a citation accurately. The findings reported in [17] further
emphasize advancements in citation analysis, citation sentiment classification, citation summarization,
and citation-based recommendations. These improvements have been facilitated by the availability of
citation databases such as Web of Science, Scopus, Google Scholar, Microsoft Academic, and Dimensions,
which support machine learning-based citation analysis.</p>
      <p>An analysis of recent studies also demonstrates a growing interest in topic modeling (TM) of scientific
publications [18-19]. Topic models have been applied to analyze scientific publications, evaluate
researcher influence, and track the evolution of research topics over time, which is particularly useful for
monitoring trends in academic literature. The review of existing research presented in this section
indicates that topic modeling is an effective method for uncovering latent themes in scientific articles.
However, traditional models such as Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation
(LDA) have limitations in accurately representing different scientific fields, as they fail to clearly delineate
four distinct academic disciplines. At the same time, the findings suggest that BERT-based topic modeling
can enhance the analysis of scientific literature, though further validation is required for its broader
application in both academic and industrial domains.</p>
      <p>These studies highlight the significance of natural language processing (NLP) methods in improving
the accessibility, organization, and analysis of scientific information. However, in the reviewed works,
these approaches are presented as independent studies, which does not allow for a comprehensive
assessment within a unified system for text information analysis.</p>
      <p>This research proposes a novel system for a comprehensive understanding of the text by combining
multiple dimensions of analysis, integrating emotional classification, named entity recognition, thematic
analysis, and sentiment analysis into a unified solution.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods and Materials</title>
      <p>Any machine learning model requires a training dataset, regardless of the specific analysis task—be
it topic modeling, named entity recognition (NER), or sentiment analysis. However, dataset creation
and annotation present unique challenges across different domains, particularly in labeling
emotional states. Among existing datasets, multimodal datasets enable classification based on audio,
text (transcription), and visual data (facial expressions, gestures).</p>
      <p>An example of a multimodal dataset is MELD (Multimodal EmotionLines Dataset), which provides
video-recorded conversations annotated for emotion recognition. Another well-structured dataset is
IEMOCAP (Interactive Emotional Dyadic Motion Capture), which includes dyadic session recordings
between actors, annotated with emotions such as happiness, anger, sadness, frustration, and
neutrality. Additionally, the SEMAINE dataset contains audiovisual recordings of human-agent
interactions, with annotations for anger, happiness, fear, disgust, sadness, contempt, and amusement.</p>
      <p>The analysis of scientific communication is a less explored area and has distinct characteristics—
academic discourse is characterized by formality and neutrality. To address this gap, in this study,
the OntoNotes dataset was modified by incorporating new data from non-personalized discussions
on scientific forums and article comments. During data preparation, OntoNotes was adapted for topic
modeling by adding topic annotations to each dialogue fragment and categorizing discussions into
major academic fields (e.g., Machine Learning, NLP, Physics, Medicine) (Figure 4a). For sentiment
analysis, annotations included positive, negative, and neutral sentiment labels (Figure 4b). The
processed data was converted into a machine learning-friendly format (JSON/CSV files) and stored
as OntoNotes_mod, a structured dataset ready for further training (Figure 4).</p>
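      <p>As a minimal sketch of what one annotated fragment in such a JSON/CSV export might look like, the following Python snippet builds a record with a topic label and a sentiment label and serializes it both ways. The field names (id, text, topic, sentiment), the helper functions, and the example sentence are assumptions made for illustration; the actual schema of OntoNotes_mod is not specified in the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
import json

# Label sets named in the text; the record field names below are hypothetical.
TOPICS = {"Machine Learning", "NLP", "Physics", "Medicine"}
SENTIMENTS = {"positive", "negative", "neutral"}
FIELDS = ["id", "text", "topic", "sentiment"]


def make_record(fragment_id, text, topic, sentiment):
    """Build one annotated dialogue fragment, validating both labels."""
    if topic not in TOPICS:
        raise ValueError(f"unknown topic: {topic}")
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unknown sentiment: {sentiment}")
    return {"id": fragment_id, "text": text, "topic": topic, "sentiment": sentiment}


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented example fragment, not taken from the dataset.
records = [make_record(1, "The ablation study is convincing.", "Machine Learning", "positive")]
json_text = to_json(records)
csv_text = to_csv(records)
```
]]></preformat>
      <p>Validating labels at record creation time is one simple way to keep the topic and sentiment annotations consistent across both export formats.</p>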
      <p>Table 1 compares deep learning models used for Emotion Classification, Named Entity Recognition
(NER), and Sentiment Analysis. The architectures are evaluated on the basis of their reported
performance on widely accepted benchmark datasets, such as GoEmotions, CoNLL-2003, and
SST-2/SST-5.</p>
      <p>This analysis demonstrates that Transformer-based models, particularly BERT and DistilBERT,
provide state-of-the-art performance across multiple NLP tasks. DistilBERT, in particular, offers a
trade-off between accuracy and computational efficiency, making it a suitable choice for applications
requiring faster inference. BERT, on the other hand, remains a robust baseline for NER and Sentiment
Analysis, ensuring high accuracy and generalization capabilities.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and results</title>
      <p>In study [20], the authors present a general model of a system for comprehensive text analysis aimed
at understanding the emotions of discussion participants.</p>
      <sec id="sec-4-1">
        <title>4.1. The named entity recognition block research</title>
        <p>The NER block aggregates solutions to two subtasks – identifying and classifying entities in the text
(NER tags) and tagging the part of speech in each token (POS tags).</p>
        <p>This block uses the OntoNotes_mod dataset, which serves as a reference for named entity
recognition tasks and has well-annotated entities, including persons (PER), locations (LOC),
organizations (ORG), and miscellaneous entities (MISC).</p>
        <p>Based on the comparison in Table 1, BERT-base-cased was chosen for named entity recognition
for its high F1-score (94.82%) at a reasonable computational cost. Additionally,
BERT-CRF achieved competitive results (92.29%), demonstrating the effectiveness of combining
BERT with Conditional Random Fields (CRF) for structured prediction tasks.</p>
        <p>The experiments conducted in the paper consisted of configuring the entity classification model
using TrainingArguments:</p>
        <list list-type="bullet">
          <list-item>
            <p>output_dir – specifies the directory where the model checkpoints will be stored;</p>
          </list-item>
          <list-item>
            <p>evaluation_strategy – specifies when to evaluate the model (e.g., after each epoch);</p>
          </list-item>
          <list-item>
            <p>learning_rate – sets the learning rate for the optimizer, which controls how much to adjust
the model weights relative to the gradient;</p>
          </list-item>
          <list-item>
            <p>per_device_train_batch_size – specifies the number of samples per batch for training;</p>
          </list-item>
          <list-item>
            <p>per_device_eval_batch_size – specifies the number of samples per batch for evaluation;</p>
          </list-item>
          <list-item>
            <p>num_train_epochs – sets the number of epochs to train the model;</p>
          </list-item>
          <list-item>
            <p>weight_decay – applies regularization to the model weights to prevent overfitting.</p>
          </list-item>
        </list>
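        <p>The parameters listed above can be collected in a plain Python dictionary, as in the sketch below. Only the batch size (16) and the number of epochs (3) come from the text; the output directory, learning rate, weight decay, and evaluation batch size are assumed values chosen for illustration, and validate_config and optimizer_steps are hypothetical helpers.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math

# Values marked "assumed" are illustrative and are not reported in the text.
training_config = {
    "output_dir": "./ner_checkpoints",   # assumed path
    "evaluation_strategy": "epoch",      # evaluate after each epoch
    "learning_rate": 2e-5,               # assumed value
    "per_device_train_batch_size": 16,   # configuration reported as best
    "per_device_eval_batch_size": 16,    # assumed equal to the training batch size
    "num_train_epochs": 3,               # configuration reported as best
    "weight_decay": 0.01,                # assumed value
}


def validate_config(config):
    """Check that the numeric parameters are in a sensible range."""
    if config["learning_rate"] <= 0:
        raise ValueError("learning_rate must be positive")
    if config["weight_decay"] < 0:
        raise ValueError("weight_decay must be non-negative")
    for key in ("per_device_train_batch_size", "per_device_eval_batch_size", "num_train_epochs"):
        if config[key] < 1:
            raise ValueError(f"{key} must be at least 1")
    return True


def optimizer_steps(num_examples, config):
    """Optimizer steps for a single-device run without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / config["per_device_train_batch_size"])
    return steps_per_epoch * config["num_train_epochs"]
```
]]></preformat>
        <p>With the Hugging Face Transformers library installed, such a dictionary could be unpacked into TrainingArguments(**training_config). Newer releases of the library use the name eval_strategy in place of evaluation_strategy, so the key may need to be adjusted to the installed version.</p>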
        <p>The graphs show that Epoch = 3 and Batch Size = 16 give the best result:
Accuracy (0.9889), Precision (0.9412), Recall (0.949), F1 Score (0.9443), Classification time (353 sec).</p>
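        <p>For reference, the sketch below shows one common way to compute accuracy, precision, recall, and F1 from gold and predicted labels. It uses macro averaging over labels, which is an assumption: the text does not state which averaging scheme, or whether token-level or entity-level scoring, produced the reported figures. The label sequences are invented.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    count = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / count,
        "recall": sum(recalls) / count,
        "f1": sum(f1_scores) / count,
    }


# Invented toy labels, used only to exercise the function.
gold = ["O", "B-PER", "O", "B-ORG"]
predicted = ["O", "B-PER", "O", "O"]
metrics = classification_metrics(gold, predicted)
```
]]></preformat>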
        <p>For demonstration purposes, we use the following sentences:</p>
        <p>Text 1 – "I was thrilled by the outstanding performance of the new iPhone's camera, but the
poor battery life left me frustrated".</p>
        <p>Text 2 – "The exhilarating last-minute victory of Manchester United over Chelsea made the
entire crowd ecstatic".
</p>
        <p>Text 3 – "I am deeply concerned about the lack of strong climate change policies, as they are
essential for protecting our environment and ensuring a sustainable future".</p>
        <p>The POS tags correctly identify grammatical structure, covering pronouns (PRP), verbs (VBD,
VBN), prepositions (IN), determiners (DT), adjectives (JJ), and nouns (NN).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. The sentiment analysis block research</title>
        <p>For the sentiment analysis block, the BERT-Base model was selected for its high accuracy (93.5% on
SST-2) with a comparatively small number of parameters (110M), which reduces computational costs.
The model is stable in operation and widely applicable in NLP tasks.</p>
        <p>The experiments conducted in this work consisted of configuring the model for sentiment
analysis, starting from the basic configuration: a batch size of 16 (for both training and evaluation)
and three epochs.</p>
        <p>For demonstration purposes, we use the same sentences that were tested in the section “The
named entity recognition block research” (Table 3).</p>
        <p>BERTopic provides a sophisticated approach to topic modeling using BERT embeddings that
allow for extracting coherent and semantically rich topics from text data. Systematic experiments
confirm the model's effectiveness, adaptability, and efficiency for text of short length and complexity.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. The topic modeling block research</title>
        <p>A comparative analysis of topic modeling approaches shows that classical methods such as LDA and
NMF offer good interpretability but have limitations when working with short texts and depend on
extensive preprocessing. Unlike these methods, BERTopic leverages transformers and a modified
TF-IDF (c-TF-IDF) to generate context-aware topics, making it better suited to dynamic and hierarchical
topic modeling. It enables the adaptive extraction of topics from data streams, which is especially
important for analyzing unstructured text.</p>
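        <p>To make the class-based TF-IDF idea concrete, the sketch below weights each term t in topic c as tf(t, c) * log(1 + A / f(t)), where f(t) is the frequency of t over all topics and A is the average number of words per topic. This follows the formulation given in the BERTopic paper as we understand it; the library's own implementation may differ in normalization details. The toy documents and the function names c_tf_idf and top_terms are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Compute class-based TF-IDF weights.

    docs_per_topic maps a topic label to a list of tokenized documents.
    All documents of one topic are merged into a single bag of words.
    """
    class_counts = {
        topic: Counter(token for doc in docs for token in doc)
        for topic, docs in docs_per_topic.items()
    }
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    avg_words = sum(total_counts.values()) / len(class_counts)
    return {
        topic: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for topic, counts in class_counts.items()
    }


def top_terms(weights, topic, n=3):
    """Return the n highest-weighted terms of a topic (ties broken alphabetically)."""
    ranked = sorted(weights[topic].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


# Invented toy corpus: two topics, two short tokenized documents each.
toy_docs = {
    "phones": [["iphone", "camera", "battery"], ["iphone", "battery", "life"]],
    "football": [["victory", "crowd", "match"], ["match", "victory", "life"]],
}
weights = c_tf_idf(toy_docs)
```
]]></preformat>
        <p>In this toy corpus the term "life" occurs in both topics and therefore receives a lower weight within "phones" than the topic-specific term "camera", even though both occur once there.</p>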
        <p>For demonstration purposes, we use the same sentences that were tested in the section “The
named entity recognition block research” (Table 4).</p>
        <p>In the future, the marker could also be used to solve problems outside our system, since it
serves as a general-purpose indicator of the mood, emotional coloring, and context of a text.</p>
        <p>The solved tasks (named entity recognition, sentiment analysis, and topic
modeling) are the main structural elements of the system proposed by the authors in [21]. This
confirms the relevance and necessity of emotion analysis based on textual information in the context
of the development of digital scientific communities (Figure 1). It can be regarded as an example of
applying the developed approach practically.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussions</title>
      <p>Analysis of the obtained results (Figure 5 and Figure 6) shows that batch size 16 balances training
time and model performance well. Smaller batch sizes significantly increased the training time
without a noticeable increase in performance, while larger batch sizes reduced the training time but
slightly worsened the performance. Training on three epochs gave the best overall performance.
Although four epochs showed minor improvements, the additional training time did not justify the
minor gains. Two epochs were not enough for optimal training.</p>
      <p>The NER and POS tagging results show the system's ability to identify named entities and parts
of speech in sentences. In Text 1, the system tagged "iPhone" as B-MISC (the miscellaneous entity
category) because the product name does not fit into standard categories such as PERSON, ORG, or
LOCATION.</p>
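      <p>The B- prefix in B-MISC comes from the BIO tagging scheme, in which B- marks the first token of an entity, I- a continuation, and O a token outside any entity. The following sketch groups BIO tags into entity spans. Only the B-MISC tag for "iPhone" is taken from the results above; the remaining tags and the decode_bio function are assumptions for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Simplification: an I- tag that does not continue an open entity of the
    same type is treated like O and its token is dropped.
    """
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Fragment of Text 1: only the B-MISC tag is reported; the O tags are assumed.
text1_entities = decode_bio(
    ["the", "new", "iPhone", "'s", "camera"],
    ["O", "O", "B-MISC", "O", "O"],
)

# Fragment of Text 2 with hypothetical tags, to show a multi-token entity.
text2_entities = decode_bio(
    ["Manchester", "United", "over", "Chelsea"],
    ["B-ORG", "I-ORG", "O", "B-ORG"],
)
```
]]></preformat>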
      <p>The experimental results (Figure 7 and Figure 8) show that it is possible to ensure high model
performance by adjusting several essential parameters, such as batch size and number of training
epochs. Batch size 16 and training on three epochs balance training time and model performance
well. The graphs show that Epoch = 3 and Batch Size = 16 provide the best indicators: Accuracy
(0.738), Precision (0.739), Recall (0.742), and F1 Score (0.742), with a classification time of 432 sec.
Four epochs showed only insignificant improvements that did not justify the additional training time.</p>
      <p>The sentiment analysis results demonstrate the system's ability to identify the overall sentiment
of each sentence (Table 3). For Text 1, despite the presence of positive elements ("thrilled" and
"outstanding performance"), the overall sentiment is dominated by the negative aspect ("poor battery
life" and "frustrated"), resulting in a negative classification with a high confidence score. Text
2 is a positive sentence with several positive words that reinforce the sentiment ("exhilarating,"
"victory," and "ecstatic"), resulting in a positive classification with a very high confidence score.
In Text 3, the primary emotion conveyed is concern about inadequate climate policies, reflected in
negative terms such as "deeply concerned" and "lack of strong climate change policies," resulting in
a negative classification with a high confidence score.</p>
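      <p>A confidence score of this kind is typically the largest softmax probability over the classifier's output logits. The sketch below illustrates that relationship under stated assumptions: the label order and the logit values are hypothetical, and the text does not specify how the system derives its confidence scores.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

# Sentiment labels used in the annotation; their order here is an assumption.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(logits, labels=LABELS):
    """Return the most probable label and its probability as the confidence."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


# Hypothetical logits for a sentence dominated by negative wording.
label, confidence = classify([3.2, 0.1, -1.5])
```
]]></preformat>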
      <p>Let us explain the results from Table 4 using the example of Text 1 for the BERTopic model. The
main topic is "apple", with a score of 0.4149593710899353 and a probability of 0.5898942351341248.
The system identifies "apple" as the dominant topic with a reasonably high score and probability,
reflecting the central focus on the iPhone. Other related terms include "6s", "smartphones,"
"smartphone," and "phones," which are related to the discussion of Apple devices. The inclusion of
the terms "discontinued," "5s", "phone," "devices," and "touchscreen" further confirms the
classification, indicating a complete understanding of the context of the Apple product ecosystem.</p>
      <p>The main topic, "Apple," and related terms such as "6s", "smartphones," and "smartphone" reflect
the focus of the sentence on the iPhone and its features. The system's high probability score indicates
a good understanding of the context.</p>
      <p>The last step of the proposed system is to generate a visual marker that graphically displays
the input text's emotion, mood, and topic. The aggregated results of emotion classification, NER,
sentiment analysis, and topic modeling for each text block serve as the input data for the marker
generation module.</p>
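      <p>One possible shape for this aggregated input is sketched below as a Python dataclass. The topic, score, probability, sentiment label, and the B-MISC entity for Text 1 are taken from the results discussed in this section; the class name, field names, and the aggregate helper are hypothetical, and fields whose values are not reported in the text are left as None.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TextAnalysis:
    """Per-text results collected for the marker generation module."""
    text_id: str
    entities: list = field(default_factory=list)  # (surface form, NER tag) pairs
    emotion: Optional[str] = None                  # not reported in the text
    sentiment: Optional[str] = None
    sentiment_confidence: Optional[float] = None   # numeric value not reported
    topic: Optional[str] = None
    topic_score: Optional[float] = None
    topic_probability: Optional[float] = None


def aggregate(text_id, ner, sentiment, topic):
    """Merge the outputs of the individual blocks into one record."""
    return TextAnalysis(
        text_id=text_id,
        entities=list(ner),
        sentiment=sentiment.get("label"),
        sentiment_confidence=sentiment.get("confidence"),
        topic=topic.get("label"),
        topic_score=topic.get("score"),
        topic_probability=topic.get("probability"),
    )


text1 = aggregate(
    "Text 1",
    ner=[("iPhone", "B-MISC")],
    sentiment={"label": "negative"},
    topic={
        "label": "apple",
        "score": 0.4149593710899353,
        "probability": 0.5898942351341248,
    },
)
marker_input = asdict(text1)  # plain dict, e.g. for JSON export
```
]]></preformat>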
      <p>The results are illustrated using the example of Text 1. The aggregation of all results is
shown in Figure 9.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This study presents a comprehensive analysis of methods for evaluating the emotional state of
participants in digital scientific communities based on textual information. The research primarily
focused on topic modeling, sentiment analysis, and named entity recognition (NER).</p>
      <p>The following key results were obtained:</p>
      <list list-type="bullet">
        <list-item>
          <p>the OntoNotes dataset was expanded by incorporating and annotating textual data from
scientific forums and article comments. The annotation included thematic categories (e.g.,
Machine Learning, NLP, Physics, Medicine) and sentiment labels (positive, negative, neutral);</p>
        </list-item>
        <list-item>
          <p>the performance of state-of-the-art neural network models for emotion classification, NER,
and sentiment analysis was evaluated. The results demonstrated that BERT-Base and
RoBERTa achieved the highest accuracy, while DistilBERT provided a balance between speed
and accuracy;</p>
        </list-item>
        <list-item>
          <p>the optimal training parameters were identified, with batch size = 16 and three training
epochs offering the best trade-off between performance and computational efficiency;</p>
        </list-item>
        <list-item>
          <p>BERTopic outperformed traditional topic modeling approaches (LDA, NMF), confirming the
effectiveness of transformer-based models in topic analysis;</p>
        </list-item>
        <list-item>
          <p>an integrated approach was proposed for combining the results of different NLP tasks
(emotion analysis, NER, sentiment analysis, and topic modeling) into a unified system,
enabling visualization of results and the generation of graphical markers representing
sentiment and discussion topics.</p>
        </list-item>
      </list>
      <p>Future research directions include enhancing multimodal analysis by incorporating audio and
video data from scientific conferences to gain a deeper understanding of the communication context;
integrating the system into digital platforms for the automated analysis of discussions within
scientific communities; investigating the impact of emotional tone on the quality and effectiveness
of scientific discussions, which could help identify the most productive exchanges.</p>
      <p>The obtained results confirm that the combination of modern natural language processing
techniques and machine learning enables effective analysis of scientific communication, improving
researcher interaction and enhancing the productivity of academic discussions.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Batiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dosyn</surname>
          </string-name>
          ,
          <article-title>Implementation of the intellectual system of sentiment analysis and clusterization of publications in the Twitter social network</article-title>
          ,
          <source>Innovative Technologies and Scientific Solutions for Industries</source>
          <volume>1</volume>
          (
          <issue>23</issue>
          ) (
          <year>2023</year>
          )
          <fpage>25</fpage>
          -
          <lpage>44</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.30837/ITSSI.2023.23.025</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Shuliak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kondratieva</surname>
          </string-name>
          ,
          <article-title>Intention capability and tonality of English news headlines about Ukraine in the context of medialinguistics</article-title>
          ,
          <source>International Humanitarian University Herald. Philology</source>
          <volume>56</volume>
          (
          <year>2022</year>
          )
          <fpage>163</fpage>
          -
          <lpage>170</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.32841/2409-1154.2022.56.36</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wankhade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C. S.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <article-title>A survey on sentiment analysis methods, applications, and challenges</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>5731</fpage>
          -
          <lpage>5780</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s10462-022-10144-1</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Panchendrarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          ,
          <article-title>Synergizing machine learning &amp; symbolic methods: A survey on hybrid approaches to natural language processing</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>251</volume>
          (
          <year>2024</year>
          )
          <fpage>124097</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Barkovska</surname>
          </string-name>
          et al.,
          <article-title>Analysis of the impact of the contextual embeddings usage on the text classification accuracy</article-title>
          ,
          <source>Radioelectronic and Computer Systems 2024(3)</source>
          (
          <year>2024</year>
          )
          <fpage>67</fpage>
          -
          <lpage>79</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.32620/reks.2024.3.05</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Barkovska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kholiev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Havrashenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mohylevskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kovalenko</surname>
          </string-name>
          ,
          <article-title>A conceptual text classification model based on two-factor selection of significant words</article-title>
          ,
          <source>in: COLINS (2)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>244</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Maksymenko</surname>
          </string-name>
          , et al.,
          <article-title>Improving the machine translation model in specific domains for the Ukrainian language</article-title>
          ,
          <source>in: 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>129</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/CSIT56902.2022.10000529</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O. J.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dogra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Diwakar</surname>
          </string-name>
          ,
          <article-title>A systematic review of NLP methods for sentiment classification of online news articles</article-title>
          , in: 2023 14th International Conference on
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>