<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Context-enriched approach to students depression monitoring in education using BERT-GPT hybrid model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olexander Mazurets</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Vit</string-name>
          <email>vit.roman.vit@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Molchanova</string-name>
          <email>m.o.molchanova@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Illia Tymofiiev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Sobko</string-name>
          <email>olenasobko.ua@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Khmelnytskyi National University</institution>
          ,
          <addr-line>11, Instytuts'ka str., Khmelnytskyi, 29016</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>167</fpage>
      <lpage>176</lpage>
      <abstract>
        <p>The paper proposes a new context-enriched approach to students' depression monitoring in education using a BERT-GPT hybrid model. The neural network, which is the basis of the approach, is trained taking into account the loss function, which is optimised simultaneously for both submodels, and the pre-trained weights of BERT and GPT-2 are updated during fine-tuning, taking into account the specifics of psycho-emotional text patterns. Thus, the model not only detects the presence of a depressive state, but is also enriched with educational and contextual features of the language. The developed approach increases the accuracy of detecting depressive states by at least 0.0525 compared to the closest analogues, and on average by 0.14, which indicates the superiority of the combined architecture in the binary classification task. The achieved accuracy at the level of 0.99 indicates the effectiveness of combining contextual features of two transformer-type models, which provides deeper modelling of semantic and syntactic information in texts. The proposed approach promotes early identification of psycho-emotional risks among students, which allows for the timely implementation of preventive measures and psychosocial support. For the education sector, this means increasing the level of academic success, reducing the number of cases of emotional burnout and improving the overall mental well-being of the student community, which in the long term contributes to the formation of a more sustainable, healthy and productive educational environment.</p>
      </abstract>
      <kwd-group>
        <kwd>neural networks</kwd>
        <kwd>GPT2</kwd>
        <kwd>BERT</kwd>
        <kwd>student depression</kwd>
        <kwd>education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The mental health of students is gaining increasing attention in the context of ensuring quality education
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and supporting academic success [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Depressive states and emotional burnout associated with
academic workload and social challenges significantly affect cognitive functions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], motivation and
general well-being of students [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Traditional methods of diagnosing depression, based on surveys and
clinical interviews [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], often turn out to be insufficiently timely or unsuitable for mass screening
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In this regard, there is a need to develop automated methods for detecting and classifying signs
of depression based on the analysis of text data that students leave in social networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], forums
or specialised platforms [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Modern approaches to natural language processing, particularly neural
architectures based on transformers, demonstrate effectiveness in classifying mood, emotions [9], and
mental states [10]. However, to increase the accuracy of detecting depressive manifestations in an
educational context, it is important to consider both the text’s lexical features and contextual information
that forms a deeper understanding of the user’s psychological state. Combining models that integrate
context-enriched analysis with generative capabilities opens up new horizons in automatic recognition
of depression.
      </p>
      <p>The paper aims to develop and test a method for context-enriched student depression monitoring in
education using a BERT-GPT hybrid model. The paper’s main contribution is a two-stream neural network
architecture that simultaneously uses the BERT model for deep syntactic and contextual understanding
of the text and the GPT-2 model for semantic enrichment through the transformer’s generative
properties.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Recently, machine learning and natural language processing methods have been widely used to detect
signs of depression.</p>
      <p>The authors of [11] study the problem of depression among students as a factor that negatively affects
their academic performance, social functioning, and future professional prospects. The authors apply
machine learning methods to predict the likelihood of depressive states, considering personal, academic,
and behavioural variables. A comparison of several models showed that logistic regression provides the
best accuracy (85%) with balanced metrics of precision, recall, and F1-measure. The SMOTE
method was used to solve the class imbalance problem, which increased the model’s sensitivity to
underrepresented cases. The results demonstrate the potential of using predictive analytics to develop
targeted psychosocial interventions within educational institutions, and also confirm the effectiveness
of machine learning in identifying psycho-emotional risks in the student environment.</p>
      <p>In a study analysing data from Reddit, the platform is seen as an important complement to the
traditional healthcare system due to the speed of exchange of ideas, the versatility of emotional
expression, and the use of medical terminology. Analysis of comments and posts with suicidal intent
using NLP confirmed that subreddits are reliable online resources for obtaining help and providing
reliable data on the mental state of users. The use of machine learning algorithms such as Naive Bayes,
SVM, logistic regression, and random forest proved effective in identifying individuals at risk,
where logistic regression achieved 77.29% accuracy and an F1 score of 0.77 [12].</p>
      <p>Another study focused on sentiment analysis based on Twitter data using classical approaches such
as TF-IDF, Bag of Words, and Multinomial Naive Bayes. These methods, applied to real-time tweets,
demonstrated accuracy that allows them to be considered as additional tools for diagnosing depression,
with the possibility of adapting to different languages [13].</p>
      <p>The authors of [14] investigated the problem of detecting depression among students based on data from
the Chinese social network Weibo. They proposed a hybrid approach that combines text features
obtained through pre-trained BERT with manually calculated user meta-features. Based on these two
groups of features, several “multimodal fusion” strategies were implemented, among which the best
results were demonstrated by the late fusion method at the logistic regression level. The maximum
accuracy of the model reached 93.75%, which indicates the effectiveness of involving the structured
user context as a complement to text analysis.</p>
      <p>In [15], which focuses on improving the diagnosis of depression using machine learning and NLP,
the difficulties of detecting depressive states in the presence of comorbid disorders, such as
posttraumatic stress disorder, are emphasised. Using data cleaning procedures, feature selection and
classifier comparison based on the DAIC-WOZ set, it was shown that the Random Forest and XGBoost
models achieve about 84% accuracy, significantly exceeding the SVM performance (72%).</p>
      <p>The study [16] aimed to identify the risk of anxiety and depressive disorders among US college students
by building predictive machine learning models. The authors use a large array of empirical data for
modelling based on the XGBoost, Random Forest, Decision Tree and logistic regression algorithms.
According to the results, the highest classification quality was achieved with AUC = 0.77, indicating
the models’ stable discriminatory power. The proposed approach has the potential to be implemented
in educational counselling as a tool for early detection of mental disorders, allowing for preventive
measures before the development of clinical symptoms.</p>
      <p>Another study is devoted to the early detection of depression among cancer patients through the
analysis of messages from a secure portal. Classifiers built based on logistic regression, support vector
machines and BERT transformer models, trained on a large corpus of messages, showed high effectiveness.
The BERT and RedditBERT models outperformed other algorithms in terms of AUROC (88% and
86%, respectively). In addition, the obtained predictions correlated with diagnoses and treatment of
depression, which confirms the practical value of such models in clinical applications [17].</p>
      <p>Despite significant progress in applying machine learning and natural language processing methods to
detect depressive states, several scientific and applied problems remain open. Most existing approaches
overlook complex contextual information, which is particularly important in
the student environment due to the specifics of the educational process, linguistic and behavioural
features of communication, and the dynamics of the psycho-emotional state during the academic year.
Considering these factors, particularly through a combination of semantic and sequential aspects of the
text, allows deeper modelling of behaviour and mood patterns. The proposed approach, based on the
combined BERT-GPT neural architecture, provides a context-enriched representation of text that is
intended to increase the accuracy and reliability of the binary classification of depressive states in
the student environment.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method design</title>
      <p>The approach implements cognitive-semantic integration, which is effective in emotional monitoring
tasks in the educational environment, allowing for the identification of psychoemotional risks of
depression at early stages with high accuracy. The scheme and stages of the approach are shown in
figure 1.</p>
      <p>The approach to context-based detection of depressive states in students in an educational
environment consists of transforming input text data using a trained two-stream neural network model
BERT-GPT [18], which provides integrated syntactic-semantic processing to obtain a numerical score
reflecting the probability of depressive manifestations in a student.</p>
      <p>At the first stage, responsible for pre-processing, the input text of student self-expression undergoes
parallel tokenisation using the BERT and GPT-2 models. This provides a two-vector representation of
the input data: taking into account the syntactic structure (via BERT) and the semantic context (via
GPT-2) [19], which is critically important for capturing non-obvious manifestations of depressive states.</p>
      <p>At the second stage, a separate data pass is performed through each model. The BERT model extracts
syntactic features, focusing especially on grammatical patterns and formal language constructions
[20], which may indicate a cognitive change. GPT-2, in turn, analyses the text’s deep semantic space
[21], which allows the model to consider hidden emotional content, allusions, and the style of speech
characteristic of affective disorders.</p>
      <p>At the third stage, the vectors of the output features from the two streams are combined into a single
representative vector through the concatenation layer. This combined vector is passed to the dense
layer, which performs logistic regression using sigmoid activation. The obtained probability value is
interpreted as an assessment of the degree of depression of the student in the educational process,
expressed in percentage format. Accordingly, the output data is a percentage assessment of depression.</p>
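      <p>The third stage can be illustrated with a minimal pure-Python sketch. It assumes that the two encoders have already produced fixed-size feature vectors; the function names fuse_and_score and sigmoid, the toy three-dimensional vectors, and the weights are hypothetical placeholders for illustration and are not values from the trained model.</p>
      <preformat>
```python
import math
from typing import Sequence


def sigmoid(value: float) -> float:
    """Logistic function mapping a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def fuse_and_score(
    bert_features: Sequence[float],
    gpt_features: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Concatenate both feature vectors and apply a single sigmoid unit.

    Returns the estimated probability of depressive manifestations
    expressed as a percentage.
    """
    fused = list(bert_features) + list(gpt_features)  # concatenation layer
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    logit = sum(w * z for w, z in zip(weights, fused)) + bias  # dense layer
    return 100.0 * sigmoid(logit)


# Toy example: hypothetical 3-dimensional features from each stream.
bert_vec = [0.2, -0.1, 0.4]
gpt_vec = [0.3, 0.0, -0.2]
toy_weights = [0.5, -0.5, 1.0, 0.8, 0.1, -0.3]
score_percent = fuse_and_score(bert_vec, gpt_vec, toy_weights, bias=-0.4)
print(f"Estimated depression score: {score_percent:.1f}%")
```
      </preformat>
      <p>In the actual model the fused vector is far larger than in this toy case, but the final step has the same form: one weighted sum followed by a sigmoid.</p>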
      <p>Since the key part of the methodology is the trained neural network model, its training procedure
is given in Algorithm 1.</p>
      <p>Algorithm 1. Fine-tuning BERT-GPT2 Architecture for Depression Prediction</p>
      <p>Let $x_i$ denote a training text and $y_i \in \{0, 1\}$ its binary label.</p>
      <p>1. Initialize pretrained BERT and GPT-2 models with frozen vocabulary.</p>
      <p>2. For each training sample $x_i$:</p>
      <p>a) Tokenize $x_i$ using both BERT and GPT-2 tokenizers.</p>
      <p>b) Encode $x_i$ via both encoders: $h_{BERT} \leftarrow \mathrm{BERT}(x_i)$, $h_{GPT} \leftarrow \mathrm{GPT2}(x_i)$.</p>
      <p>c) Extract [CLS] representation from BERT and first-token from GPT-2: $z_i \leftarrow \mathrm{concat}(h_{BERT}[\mathrm{CLS}], h_{GPT}[0])$.</p>
      <p>d) Compute prediction: $\hat{y}_i \leftarrow \sigma(w^{\top} z_i + b)$.</p>
      <p>e) Compute loss: $\ell_i \leftarrow \mathcal{L}(\hat{y}_i, y_i)$.</p>
      <p>3. Backpropagate gradients and update parameters $\theta_{BERT}$, $\theta_{GPT}$, $w$, $b$ using Adam optimizer.</p>
      <p>4. Iterate over mini-batches until convergence criteria or maximum epochs reached.</p>
      <p>Validation: Evaluate trained model on held-out validation set using: accuracy, precision, recall,
F1-score, confusion matrix.</p>
      <p>The model was trained on a curated dataset of student-authored texts, where each sample was
labelled with a binary indicator of depressive tendencies. The dataset was partitioned into training and
validation subsets in a 90/10 ratio, preserving the class balance. During training, the model
optimised the parameters of the BERT and GPT-2 encoders and the final linear classification layer using
the Adam optimiser [22]. The objective was to minimise the binary cross-entropy loss between the
predicted probabilities and ground-truth labels. Fine-tuning was performed iteratively in mini-batches,
with convergence monitored via validation performance metrics such as accuracy, precision, recall, and
F1-score [23].</p>
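      <p>The data partitioning and the training objective described above can be sketched as follows. This is an illustrative standard-library example and not the authors' training code: the helper names stratified_split and binary_cross_entropy, the fixed random seed, the clipping constant, and the toy labels are all assumptions introduced here.</p>
      <preformat>
```python
import math
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) with the class ratio preserved."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    val_idx: List[int] = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_val = round(len(members) * val_fraction)  # per-class validation share
        val_idx.extend(members[:n_val])
        train_idx.extend(members[n_val:])
    return sorted(train_idx), sorted(val_idx)


def binary_cross_entropy(
    probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7
) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# Toy data: 50 non-depressed (0) and 50 depressed (1) labels.
toy_labels = [0] * 50 + [1] * 50
train_ids, val_ids = stratified_split(toy_labels)
print(len(train_ids), len(val_ids))
print(round(binary_cross_entropy([0.9, 0.2], [1, 0]), 4))
```
      </preformat>
      <p>Splitting each class separately keeps the validation subset balanced whenever the full dataset is balanced, which is the property the 90/10 partition relies on.</p>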
      <p>Thus, the methodology and algorithm for obtaining a trained context-based two-stream architecture
for neural network analysis, which combines syntactic and semantic text processing for monitoring
depressive manifestations in students, are presented.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>This paper utilises the “Student-Depression-Text dataset” [24], an annotated corpus comprising 7,489
English-language textual samples from social media platforms, including user comments from Facebook
and related forums. The dataset is structured in Excel format and includes the following attributes: raw
text content, binary class label (0 = non-depressed, 1 = depressed/anxious), participant age, age group,
and gender. The dataset was balanced to avoid bias [25]. All contributors to the dataset are students
aged between 15 and 17, fluent in English, and representative of an adolescent demographic particularly
vulnerable to affective disorders.</p>
        <p>Each text entry is manually labelled as either indicative of a normal psychological state or suggestive
of depressive or anxiety-related expression. These annotations were designed to reflect linguistic cues
associated with underlying emotional distress. Including demographic variables enables the exploration
of correlations between mental health indicators and age or gender, thereby offering a richer context
for modelling.</p>
        <p>The dataset is the foundation for training a dual-stream neural architecture to identify signs of
depression in educational contexts based on student-generated textual data.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental software</title>
        <p>The experimental software prototype for automated detection of depressive tendencies in educational
contexts was implemented using Python [26] and specialised natural language processing and machine
learning libraries. The system integrates pretrained transformer-based models to extract syntactic and
semantic features from student-generated textual data.</p>
        <p>The core functionality was deployed as a web-based application using the Flask microframework,
enabling an accessible RESTful interface for model inference and interaction. Model development and
fine-tuning were conducted using PyTorch as the primary deep learning framework [27], with
transformer models sourced via the Hugging Face Transformers library [28]. Feature extraction leveraged
the “BertTokenizer”, “BertModel”, and “GPT2Model” components for obtaining contextualized embeddings
[29].</p>
        <p>Data preprocessing and management were facilitated through Pandas [30] and NumPy [31], which
supported structured manipulation of input samples and efficient tensor operations during training and
evaluation. The final application allows users to submit input text and receive probabilistic feedback on
the presence of depressive markers based on learned linguistic patterns. The interface of the developed
software prototype is shown in figure 2.</p>
        <p>This modular software system serves as a functional prototype and an experimental platform for
evaluating the effectiveness of transformer-based feature fusion in mental health detection tasks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Result and discussion</title>
      <p>Initially, an experiment was conducted to investigate the influence of training hyperparameters on the
performance of the developed neural model. Table 1 presents the comparative results of four
configurations based on a combined BERT-GPT2 architecture, each trained with different hyperparameter
settings. The evaluation employed standard classification metrics, including accuracy, precision, recall,
and F1-score. This analysis aimed to identify optimal combinations of learning rate, batch size, number
of epochs, and input sequence length for efficient model convergence and robust generalisation.
The “Gpt_Bert3” model demonstrated the highest performance across all metrics, suggesting that a lower
learning rate and moderate batch size may contribute positively to generalisation. The results underline
the sensitivity of transformer-based models to sequence length and learning rate, which should be
carefully tuned in downstream mental health prediction tasks.</p>
      <p>The remaining configurations also scored highly, indicating that all trained models are able to
correctly identify depressive states associated with learning in educational institutions.</p>
      <p>The best model was also tested on a labelled sample of over 7000 texts; the result is shown in
figure 3 in the form of a confusion matrix.</p>
      <p>As the confusion matrix shows, the neural network produces a minimal number of false positives.</p>
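      <p>For reference, the reported metrics follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name classification_metrics is hypothetical, and the counts passed to it are invented for illustration and are not the values from figure 3.</p>
      <preformat>
```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Derive standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical counts for illustration only; not the values from figure 3.
metrics = classification_metrics(tp=480, fp=5, fn=15, tn=500)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```
      </preformat>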
      <p>The developed approach was also compared with known analogues; the results are given in table 2.
The proposed model showed the highest accuracy (0.99) and F1-measure (0.98) among the
compared approaches. In the study [17], the BERT and RedditBERT models achieved accuracies of 0.88
and 0.86, respectively, which are 0.11 and 0.13 lower. Logistic regression in the student environment
[11] showed an accuracy of 0.86, which is 0.13 lower. In the Reddit study analysing suicidal
intent [12], the accuracy was 0.77, which is 0.22 lower. In the case of the multimodal approach
with Weibo [14], the model showed 0.9375 accuracy, which is 0.0525 lower. The classical algorithms
Random Forest and XGBoost [15] achieved an accuracy of 0.84, which is 0.15 lower. The results indicate
the superiority of the combined BERT-GPT2 architecture in binary classification of depressive states.</p>
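      <p>The per-study differences quoted in the comparison can be re-derived with a few lines of arithmetic. The sketch below only restates the accuracy values given in the paragraph above; the dictionary keys are shorthand labels chosen here for readability, and rounding to four decimals is an assumption made to suppress floating-point noise.</p>
      <preformat>
```python
PROPOSED_ACCURACY = 0.99

# Accuracies as quoted in the comparison paragraph above.
reported = {
    "BERT [17]": 0.88,
    "RedditBERT [17]": 0.86,
    "Logistic regression [11]": 0.86,
    "Logistic regression, Reddit [12]": 0.77,
    "Late fusion, Weibo [14]": 0.9375,
    "Random Forest / XGBoost [15]": 0.84,
}

# Gap between the proposed model and each analogue, rounded to 4 decimals.
gaps = {name: round(PROPOSED_ACCURACY - acc, 4) for name, acc in reported.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.4f} lower")

smallest_gap = min(gaps.values())
print(f"Smallest gap: {smallest_gap:.4f}")
```
      </preformat>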
      <p>However, the given approach has limitations. The main limitation is the language – currently, the
neural network works only with English-language texts. Another limitation concerns the length of the
input sequence. Within the framework of the study, the maximum number of input tokens is 128.</p>
      <p>Further research will aim to continue training the neural network of the dual architecture with
different parameters, such as the number of epochs, batch size, and learning rate, to reduce error and
work with languages other than English.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The paper proposes a new context-enriched approach to students’ depression monitoring in education
using a BERT-GPT hybrid model. The neural network, which is the basis of the approach, is trained
taking into account the loss function, which is optimised simultaneously for both submodels, and the
pre-trained weights of BERT and GPT-2 are updated during fine-tuning, taking into account the specifics
of psycho-emotional text patterns. Thus, the model not only detects the presence of a depressive state,
but is also enriched with educational and contextual features of the language.</p>
      <p>The developed approach increases the accuracy of detecting depressive states by at least 0.0525
compared to the closest analogues and on average by 0.14, which indicates the superiority of the
combined architecture in the binary classification task. The achieved accuracy at the level of 0.99
indicates the effectiveness of combining contextual features of two transformer-type models, which
provides deeper modelling of semantic and syntactic information in texts.</p>
      <p>The proposed approach promotes early identification of psycho-emotional risks among students,
which allows for the timely implementation of preventive measures and psychosocial support. For the
education sector, this means an increase in students’ academic performance. It should be noted that
the constructed model for monitoring students’ depression has not been tested in the real educational
process to assess its effectiveness using mathematical statistics methods.</p>
      <p>In addition, the conclusion that implementing the proposed system will improve academic
performance is too optimistic since diagnosis does not yet mean that students will receive adequate assistance
to overcome existing problems within the study period. Additional experimental studies are needed to
confirm this conclusion.</p>
    </sec>
    <sec id="sec-7">
      <title>Author contributions</title>
      <p>Conceptualization, Olexander Mazurets; methodology, Roman Vit and Maryna Molchanova; formulation
of tasks analysis, Illia Tymofiiev and Olena Sobko; software, Roman Vit and Maryna Molchanova;
writing–original draft, Roman Vit and Maryna Molchanova; analysis of results, Olexander Mazurets
and Olena Sobko; visualization, Illia Tymofiiev; reviewing and editing, Olexander Mazurets. All authors
have read and agreed to the published version of the manuscript.</p>
    </sec>
    <sec id="sec-8">
      <title>Funding</title>
      <p>This research received no external funding.</p>
    </sec>
    <sec id="sec-9">
      <title>Data availability statement</title>
      <p>No new data were created or analysed during this study. Data sharing is not applicable.</p>
    </sec>
    <sec id="sec-10">
      <title>Conflicts of interest</title>
      <p>The authors declare no conflict of interest.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgments</title>
      <p>The authors express their gratitude to Tetiana Vakaliuk, Viacheslav Osadchyi and Olha Pinchuk for the
scientific and organizational support of the 4th Workshop on Digital Transformation of Education 2025.</p>
    </sec>
    <sec id="sec-12">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <sec id="sec-12-1">
        <title>References</title>
        <p>[9] O. Zalutska, M. Molchanova, O. Sobko, O. Mazurets, O. Pasichnyk, O. Barmak, I. Krak, Method for
Sentiment Analysis of Ukrainian-Language Reviews in E-Commerce Using RoBERTa Neural
Network, in: V. Lytvyn, A. Kowalska-Styczen, V. Vysotska (Eds.), Proceedings of the 7th International
Conference on Computational Linguistics and Intelligent Systems. Volume I: Machine Learning
Workshop, Kharkiv, Ukraine, April 20-21, 2023, volume 3387 of CEUR Workshop Proceedings,
CEUR-WS.org, 2023, pp. 344–356. URL: https://ceur-ws.org/Vol-3387/paper26.pdf.
[10] Ş. H. Gemicioğlu, M. Sönmez, G. Kara, B. Bozer, Ö. Güder, I. Oztürk, Examination of depression,
anxiety, and stress in nursing students receiving distance education during the COVID-19 pandemic,
Journal of Psychiatric Nursing 16 (2025) 38–47. doi:10.14744/phd.2025.26566.
[11] A. O. Hassan, I. M. Jamal, S. D. Ahmed, A. U. Abdullahi, Predicting Student Depression using
Machine Learning: A Comparative Analysis of Machine Learning Algorithms for Early Depression
Detection in Students, AITU Scientific Research Journal 4 (2025) 28–35. doi:10.63094/AITUSRJ.25.4.1.4.
[12] P. Jain, K. R. Srinivas, A. Vichare, Depression and suicide analysis using machine learning and
NLP, in: Journal of Physics: Conference Series, volume 2161, IOP Publishing, 2022, p. 012034.
doi:10.1088/1742-6596/2161/1/012034.
[13] A. Mali, R. R. Sedamkar, Prediction of Depression Using Machine Learning and NLP Approach, in:
V. E. Balas, V. B. Semwal, A. Khandare (Eds.), Intelligent Computing and Networking, volume 301
of Lecture Notes in Networks and Systems, Springer Nature Singapore, Singapore, 2022, pp. 172–181.
doi:10.1007/978-981-16-4863-2_15.
[14] Y. Luo, Z. Ye, R. Lyu, Detecting student depression on Weibo based on various multimodal fusion
methods, in: S. U. Jan (Ed.), Fourth International Conference on Signal Processing and Machine
Learning (CONF-SPML 2024), volume 13077, International Society for Optics and Photonics, SPIE,
2024, p. 130770M. doi:10.1117/12.3027177.
[15] G. Lorenzoni, C. Tavares, N. Nascimento, P. Alencar, D. Cowan, Assessing ML classification
algorithms and NLP techniques for depression detection: An experimental case study, PLoS ONE 20
(2025) e0322299. doi:10.1371/journal.pone.0322299.
[16] Y. Zhai, Y. Zhang, Z. Chu, B. Geng, M. Almaawali, R. Fulmer, Y.-W. D. Lin, Z. Xu, A. D. Daniels,
Y. Liu, Q. Chen, X. Du, Machine learning predictive models to guide prevention and intervention
allocation for anxiety and depressive disorders among college students, Journal of Counseling &amp;
Development 103 (2025) 110–125. doi:10.1002/jcad.12543.
[17] M. M. van Buchem, A. A. de Hond, C. Fanconi, V. Shah, M. Schuessler, I. M. Kant, E. W. Steyerberg,
T. Hernandez-Boussard, Applying natural language processing to patient messages to identify
depression concerns in cancer patients, Journal of the American Medical Informatics Association
31 (2024) 2255–2262. doi:10.1093/jamia/ocae188.
[18] M. Salıcı, Ü. E. Ölçer, Impact of Transformer-Based Models in NLP: An In-Depth Study on BERT
and GPT, in: 2024 8th International Artificial Intelligence and Data Processing Symposium (IDAP),
2024, pp. 1–6. doi:10.1109/IDAP64064.2024.10710796.
[19] O. V. Barmak, O. V. Mazurets, I. V. Krak, A. I. Kulias, L. E. Azarova, K. Gromaszek, S. Smailova,
Information technology for creation of semantic structure of educational materials, in: R. S.
Romaniuk, M. Linczuk (Eds.), Photonics Applications in Astronomy, Communications, Industry,
and High-Energy Physics Experiments 2019, volume 11176, 2019, p. 1117623. doi:10.1117/12.2537064.
[20] N. M. Gardazi, A. Daud, M. K. Malik, A. Bukhari, T. Alsahfi, B. Alshemaimri, BERT applications in
natural language processing: a review, Artificial Intelligence Review 58 (2025) 166. doi:10.1007/s10462-025-11162-5.
[21] G. Bharathi Mohan, R. Prasanna Kumar, S. Parathasarathy, S. Aravind, K. B. Hanish, G. Pavithria,
Text Summarization for Big Data Analytics: A Comprehensive Review of GPT 2 and
BERT Approaches, Springer Nature Switzerland, Cham, 2023, pp. 247–264. doi:10.1007/978-3-031-33808-3_14.
[22] Adam, 2025. URL: https://keras.io/api/optimizers/adam/.
[23] G. Naidu, T. Zuva, E. M. Sibanda, A Review of Evaluation Metrics in Machine Learning Algorithms,
in: R. Silhavy, P. Silhavy (Eds.), Artificial Intelligence Application in Networks and Systems, volume
724 of Lecture Notes in Networks and Systems, Springer International Publishing, Cham, 2023, pp.
15–25. doi:10.1007/978-3-031-35314-7_2.
[24] N. Yadav, Student-depression-text, 2023. URL: https://www.kaggle.com/datasets/nidhiy07/student-depression-text.
[25] O. Sobko, O. Mazurets, M. Molchanova, I. Krak, O. Barmak, Method for analysis and formation of
representative text datasets, in: T. Hovorushchenko, E. Zaitseva, S. Lysenko, V. G. Levashenko (Eds.),
Proceedings of the 1st International Workshop on Advanced Applied Information Technologies
with CEUR-WS, Khmelnytskyi, Ukraine, Zilina, Slovakia, December 5, 2024, volume 3899 of CEUR
Workshop Proceedings, CEUR-WS.org, 2024, pp. 84–98. URL: https://ceur-ws.org/Vol-3899/paper9.pdf.
[26] Python, 2025. URL: https://www.python.org/.
[27] M. Molchanova, V. Didur, O. Mazurets, O. Sobko, O. Zakharkevich, Method for Construction
and Demolition Waste Classification Using Two-Factor Neural Network Image Analysis, in:
N. Shakhovska, A. T. Augousti, S. Liaskovska, O. Duran (Eds.), Proceedings of the 2nd International
Conference on Smart Automation &amp; Robotics for Future Industry (SMARTINDUSTRY 2025), Lviv,
Ukraine, April 3-5, 2025, volume 3970 of CEUR Workshop Proceedings, CEUR-WS.org, 2025, pp.
168–182. URL: https://ceur-ws.org/Vol-3970/PAPER14.pdf.
[28] Hugging face, 2025. URL: https://huggingface.co/docs/transformers/index.
[29] I. Krak, O. Zalutska, M. Molchanova, O. Mazurets, E. Manziuk, O. Barmak, Method for neural
network detecting propaganda techniques by markers with visual analytic, in: A. Pakstas, Y. P.
Kondratenko, V. Vychuzhanin, H. Yin, N. Rudnichenko (Eds.), Proceedings of the 12th International
Conference Information Control Systems &amp; Technologies (ICST 2024), Odesa, Ukraine, September
23-25, 2024, volume 3790 of CEUR Workshop Proceedings, CEUR-WS.org, 2024, pp. 158–170. URL:
https://ceur-ws.org/Vol-3790/paper14.pdf.
[30] pandas, 2025. URL: https://pandas.pydata.org/.
[31] NumPy, 2025. URL: https://numpy.org/.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <article-title>Association between healthy lifestyle choices and mental health among students: a cross sectional study</article-title>
          ,
          <source>BMC Public Health</source>
          <volume>25</volume>
          (
          <year>2025</year>
          )
          <elocation-id>247</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1186/s12889-025-21482-4</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Madrid-Cagigal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kealy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Mulvenna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Byrne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Barry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Donohoe</surname>
          </string-name>
          ,
          <article-title>Digital Mental Health Interventions for University Students With Mental Health Difficulties: A Systematic Review and Meta-Analysis</article-title>
          ,
          <source>Early Intervention in Psychiatry</source>
          <volume>19</volume>
          (
          <year>2025</year>
          )
          <elocation-id>e70017</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1111/eip.70017</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. U.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abas</surname>
          </string-name>
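          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>ur Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,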
          <article-title>Mental health issues: Stress, anxiety, and depression in diploma and degree health care students</article-title>
          ,
          <source>Journal of Medical &amp; Health Sciences Review</source>
          <volume>2</volume>
          (
          <year>2025</year>
          )
          <fpage>3453</fpage>
          -
          <lpage>3471</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.62019/5vm15e38</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>Academic burden and emotional problems among adolescents: A longitudinal mediation analysis</article-title>
          ,
          <source>Journal of Adolescence</source>
          <volume>97</volume>
          (
          <year>2025</year>
          )
          <fpage>989</fpage>
          -
          <lpage>1001</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1002/jad.12471</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.-R.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Real-time monitoring to predict depressive symptoms: study protocol</article-title>
          ,
          <source>Frontiers in Psychiatry</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <elocation-id>1465933</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3389/fpsyt.2024.1465933</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kuriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N. H.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Araki</surname>
          </string-name>
          , et al.,
          <article-title>Prediction of depressive disorder using machine learning approaches: findings from the NHANES</article-title>
          ,
          <source>BMC Medical Informatics and Decision Making</source>
          <volume>25</volume>
          (
          <year>2025</year>
          )
          <elocation-id>83</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1186/s12911-025-02903-1</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Osman</surname>
          </string-name>
          ,
          <article-title>Social media use and associated mental health indicators among University students: a cross-sectional study</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>15</volume>
          (
          <year>2025</year>
          )
          <elocation-id>9534</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1038/s41598-025-94355-w</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Levkovich</surname>
          </string-name>
          ,
          <article-title>Is artificial intelligence the next co-pilot for primary care in diagnosing and recommending treatments for depression?</article-title>
          ,
          <source>Medical Sciences</source>
          <volume>13</volume>
          (
          <year>2025</year>
          )
          <elocation-id>8</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/medsci13010008</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>