<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chenghao Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaobing Zhou</string-name>
          <email>zhouxb@ynu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Kunming 650500, Yunnan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, instant messaging applications have become an integral part of our daily lives. With the quickening pace of society and increasing work pressure, more and more people suffer from mental disorders such as eating disorders, depression, and unknown disorders. In MentalRiskES 2023, participants were asked to detect the risk of mental disorders early in Spanish, using corpora obtained from Telegram chat message records. This paper describes the participation of the group GetitDone in subtask 2a. Our team uses BETO, also called Spanish BERT, pretrained on a large Spanish corpus, as our base model. We focused on preprocessing the given data and on modifying the classification part of the model.</p>
        <p>Keywords: depression detection, disorder detection, dataset preprocessing, BERT, RoBERTa</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Over the past decade, instant messaging apps have gradually become integral to people's lives.
Ever-growing message streams contain more information than it seems. Although multimedia messages
have become a significant form of expression, which can also be exploited in multimodal sentiment
analysis, text is still the most important carrier of human language and the primary way we express
ourselves. For example, a message we send to a friend may contain subconscious thoughts that we never
notice. The goal of this task is to detect potential mental disorders early from the chats between users.
Several relevant evaluation campaigns [1] have been held previously, but almost all of them target
English. Limited hardware, unfamiliarity with Spanish, and a lack of corpora were the main challenges
for our group.</p>
      <p>The importance and urgency of mental health have become increasingly recognized in today's
society. Mental health issues, such as depression, profoundly impact individuals. The prevalence of
mental health disorders is alarmingly high, with millions worldwide affected by these conditions. The
COVID-19 pandemic has further highlighted the significance of mental health [2]. Early detection and
intervention are crucial in effectively addressing mental health issues, especially for those who have
already developed mental health problems but are unaware of them. Identifying signs and symptoms
early on allows for timely support, treatment, and prevention of further deterioration. It can help
individuals regain control over their lives.</p>
      <p>It is worth noting that, in addition to the predictions, participants must also submit information
about CO2 emissions, which places further demands on the efficiency of the algorithm and model. Our
group works on subtask 2a, the binary classification task of depression detection. Unlike conventional
sentiment analysis, where the training and test data are individual sentences or paragraphs, the data
provided here are sets of chat messages, which can be regarded as series of related sentences. Before
training, we apply several techniques to preprocess and organize these data.</p>
      <p>2023 Copyright for this paper by its authors.</p>
      <p>The rest of this paper is organized as follows. Section 2 reviews related work on depression
detection and on the carbon emissions of AI models. Section 3 describes the given dataset, its
characteristics, and our preprocessing method. Section 4 details the proposed model, including the
pretrained model it builds on, its structure, and the hyperparameters used in training. Section 5
analyzes the results from three aspects, and Section 6 draws conclusions and outlines directions for
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The detection of depression in online user-generated content has gained significant attention in
recent years. Various studies have explored different approaches and techniques to address this
challenge. Early research focused on using linguistic features and sentiment analysis to identify
depressive symptoms in text data. Nadeem et al. [3] employed machine learning algorithms to classify
depression-related posts on online forums based on lexical and syntactic patterns. Similarly, Gautam
and Yadav [4] utilized sentiment analysis to detect depressive language patterns in Twitter data.</p>
      <p>With the advancements in natural language processing (NLP) and deep learning, researchers started
leveraging pretrained language models for depression detection. For instance, Bucur et al. [5] applied a
fine-tuned BERT model to classify depression-related text obtained from Reddit. Additionally, Wołk et
al. [6] explored the use of GPT for identifying depression symptoms in social media posts.</p>
      <p>Another research interest involves analyzing social network structures and behavioral patterns. Islam
et al. [7] investigated the impact of social network characteristics on depression detection, highlighting
the importance of social connections in predicting mental health conditions.</p>
      <p>Furthermore, researchers have explored integrating multimodal data, such as text, images, and audio,
to enhance depression detection. Jahan et al. [8] proposed a multimodal approach that combined images,
videos, and text to improve the accuracy of depression detection in Twitter and Facebook posts.</p>
      <p>Overall, the existing literature showcases a wide range of methodologies, including linguistic
analysis, deep learning models, social network analysis, and multimodal fusion, for detecting
depression in online user-generated content.</p>
      <p>When we searched for relevant papers, we found that far fewer studies have addressed depression
detection in Spanish than in languages like English, which makes the MentalRiskES evaluation
campaigns all the more meaningful. This paper aims to propose a fast and resource-efficient method for
depression detection.</p>
      <p>Carbon emissions associated with AI have become a topic of concern in recent years. The rapid
growth of AI applications, especially deep learning models, requires significant computational
resources that contribute to increased energy consumption and, in turn, carbon emissions. A study by
Strubell et al. [9] analyzed the carbon footprint of training large language models. ECO2AI [10] can
also be used to track the carbon emissions of machine learning models.</p>
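      <p>The quantity such trackers report can be approximated by a simple formula: the energy consumed (average power draw × run time × the data centre's power usage effectiveness, PUE) multiplied by the carbon intensity of the electricity grid. The sketch below is a hypothetical back-of-the-envelope estimator built on that assumption; it does not reproduce ECO2AI's implementation (such tools measure hardware power draw directly), and the example values are illustrative, not measurements from our runs.</p>

```python
def estimate_co2_kg(avg_power_watts: float, hours: float,
                    grid_kg_per_kwh: float, pue: float = 1.0) -> float:
    """Rough CO2 estimate, in kg, for one compute job.

    energy (kWh)   = average power (W) * duration (h) / 1000 * PUE
    emissions (kg) = energy (kWh) * grid carbon intensity (kg CO2 per kWh)
    """
    for name, value in (("avg_power_watts", avg_power_watts),
                        ("hours", hours),
                        ("grid_kg_per_kwh", grid_kg_per_kwh)):
        if 0 > value:
            raise ValueError(f"{name} must be non-negative")
    if 1.0 > pue:
        raise ValueError("PUE (power usage effectiveness) cannot be below 1.0")
    energy_kwh = avg_power_watts * hours / 1000 * pue
    return energy_kwh * grid_kg_per_kwh


# Illustrative numbers only: a 250 W GPU running for 4 hours on a grid
# emitting 0.5 kg CO2 per kWh, in a data centre with PUE 1.5.
print(estimate_co2_kg(250, 4, 0.5, pue=1.5))  # 0.75
```

      <p>The PUE factor accounts for cooling and other facility overheads; setting it to 1.0 counts only the energy drawn by the hardware itself.</p>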
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>The provided training dataset is organized by subject. Each subject file contains the conversation
between the target subject, for whom we must predict whether or not they suffer from depression, and
other people. Each message in the conversation comes with an ID number and a timestamp, so every
message is uniquely identifiable and the messages can be ordered chronologically.</p>
      <p>As in conventional text cleaning, we first handle hashtags, special symbols (e.g., currency
symbols), URLs, emoji, and repeated characters.</p>
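      <p>A minimal regex-based sketch of these cleaning steps follows. The exact rules are not listed in this paper, so the patterns below are assumptions for illustration: URLs and currency symbols are dropped, the # sign is removed from hashtags while the word is kept, emoji in the most common Unicode blocks are removed (the ranges are not exhaustive), and letters repeated three or more times are shortened to two.</p>

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
CURRENCY_RE = re.compile(r"[$€£¥]")
# Common emoji blocks only (an assumption; not an exhaustive emoji definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# A letter (not a digit or underscore) repeated three or more times.
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")
SPACES_RE = re.compile(r"\s+")


def clean_message(text: str) -> str:
    """Apply simple, illustrative cleaning rules to one chat message."""
    text = URL_RE.sub(" ", text)          # drop URLs
    text = HASHTAG_RE.sub(r"\1", text)    # keep the hashtag word, drop '#'
    text = CURRENCY_RE.sub(" ", text)     # drop currency symbols
    text = EMOJI_RE.sub(" ", text)        # drop common emoji
    text = REPEAT_RE.sub(r"\1\1", text)   # "noooo" becomes "noo"
    return SPACES_RE.sub(" ", text).strip()


print(clean_message("Hoy #triste $$$ noooo https://t.me/x \U0001F622"))
# Hoy triste noo
```

      <p>Restricting the repetition rule to letters keeps numbers such as 1000 intact, which a rule over any character would shorten.</p>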
      <p>When inspecting the training dataset, we found that some messages are problematic because of their
length. Figure 1 shows the distribution of message lengths in words. The longest message is 4279
characters long, about 750 words. Figure 2 shows how we handle such messages.</p>
      <sec id="sec-3-2">
        <p>Figure 2: Processing a subject file of messages with different lengths into processed batches.</p>
        <p>In general, we first clip the messages and join them into segments of the same length, and then
pad the last segment with padding tokens. In addition, during clipping and joining, we take a random
percentage of the messages from the beginning of each subject file. For example, the subject file in
Figure 2 contains five messages: we use all of them to produce the first batch and the first 60% (three
messages) to produce the second.</p>
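        <p>The clip-and-join step can be sketched as follows. This is one possible reading of the description rather than our exact code: messages are assumed to be already tokenized into lists of token IDs, segments are 128 tokens long to match the maximum sequence length of the model, the padding ID is a hypothetical placeholder, and the range from which the random percentage is drawn is an assumption.</p>

```python
import math
import random


def clip_and_join(messages, prefix_fraction, max_len=128, pad_id=0):
    """Join the first `prefix_fraction` of a subject's messages into one token
    stream and clip it into fixed-length segments.

    messages: list of token-ID lists in chronological order.
    Every returned segment is exactly `max_len` IDs long; the last one is
    right-padded with `pad_id` (a hypothetical padding ID).
    """
    n_keep = max(1, math.ceil(len(messages) * prefix_fraction))
    stream = [tok for msg in messages[:n_keep] for tok in msg]
    segments = [stream[i:i + max_len] for i in range(0, len(stream), max_len)]
    if segments:
        segments[-1] = segments[-1] + [pad_id] * (max_len - len(segments[-1]))
    return segments


def make_batches(messages, n_extra=1, rng=None, max_len=128, pad_id=0):
    """First batch uses every message; each extra batch uses a random prefix.

    The 10%-100% range for the random fraction is an assumption.
    """
    rng = rng or random.Random()
    fractions = [1.0] + [rng.uniform(0.1, 1.0) for _ in range(n_extra)]
    return [clip_and_join(messages, f, max_len, pad_id) for f in fractions]


msgs = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10], [11]]
print(clip_and_join(msgs, 0.6, max_len=4))  # [[1, 2, 3, 4], [5, 6, 0, 0]]
```

        <p>With five messages, as in the Figure 2 example, a fraction of 0.6 keeps the first three messages before joining, while a fraction of 1.0 keeps all of them.</p>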
        <p>We clip and join these messages because we believe that information about depression is not
distributed evenly across messages, so we try to help the model focus on the information that actually
reflects whether the subject is depressed. Clipping and joining increases the relative density of this kind
of information.</p>
        <p>In the testing phase, we build a subject file for each subject and append every message we receive
to it. When making a prediction for a subject, we combine the messages received in the current round
with those received previously and feed them all into the model.</p>
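        <p>A minimal sketch of this incremental set-up, assuming each round delivers (subject ID, message) pairs: the per-subject history is kept in a dictionary, and each prediction sees every message received so far for that subject. The class name and the predict callable are hypothetical stand-ins for the real tokenizer-plus-model pipeline.</p>

```python
from collections import defaultdict


class SubjectHistory:
    """Keeps every message received so far for each subject, oldest first."""

    def __init__(self):
        self._messages = defaultdict(list)

    def add_round(self, round_messages):
        """Append one round of (subject_id, text) pairs to the histories."""
        for subject_id, text in round_messages:
            self._messages[subject_id].append(text)

    def history(self, subject_id):
        """Return a copy of all messages received for `subject_id`."""
        return list(self._messages.get(subject_id, []))

    def predict_all(self, predict):
        """Apply `predict` (hypothetical tokenizer + model wrapper) to each full history."""
        return {sid: predict(list(msgs)) for sid, msgs in self._messages.items()}


history = SubjectHistory()
history.add_round([("subject1", "hola"), ("subject2", "buenas")])
history.add_round([("subject1", "hoy no tengo ganas de nada")])
print(history.history("subject1"))  # ['hola', 'hoy no tengo ganas de nada']
```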
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Model</title>
      <p>Our model is based on the pretrained model robertuito-sentiment-analysis from the Python library
pysentimiento [11], which was trained from a model called RoBERTuito [12]. RoBERTuito is based on
RoBERTa [13], and robertuito-sentiment-analysis was trained on the Spanish tweet corpus of TASS
2020 [14].</p>
      <p>The architecture of the proposed model is shown in Figure 3. For the tokenizer, we use the default
settings of the pretrained model robertuito-sentiment-analysis, whose vocabulary contains 30002 token
indices (including special tokens such as [PAD] and [SEP]). The encoder is the main part of the
architecture that we modified. It begins with a 12-head self-attention layer, followed by a GRU and,
finally, a fully connected layer. The classifier consists of a fully connected layer, a dropout layer, and a
linear layer.</p>
      <p>Figure 3: Architecture of the proposed model: processed batch, RoBERTuito tokenizer (input index and mask), encoder (self-attention, GRU, and fully connected layers, ×12), and classifier.</p>
      <p>For classification, we use a three-class classifier, as in conventional sentiment analysis. For this
task, the sum of the neutral and positive probabilities is taken as the non-depressed probability, and the
negative probability as the depressed probability. Each time a new message is processed as described
above, if the depressed probability exceeds the non-depressed probability, that is, if it is greater than 50
percent, we report depression immediately.</p>
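      <p>The mapping from the three sentiment probabilities to a binary decision can be written as a short function. This sketch assumes the classifier emits three logits in the order negative, neutral, positive (an assumption; the label order of the actual checkpoint should be checked) and converts them to probabilities with a softmax.</p>

```python
import math


def depression_decision(logits):
    """Map 3-class sentiment logits (negative, neutral, positive) to a decision.

    P(depressed)     = P(negative)
    P(non-depressed) = P(neutral) + P(positive)
    Depression is reported when P(depressed) exceeds P(non-depressed),
    which is the same as P(negative) being greater than 0.5.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    p_dep = exps[0] / total
    p_non = (exps[1] + exps[2]) / total
    return p_dep > p_non, p_dep


report, p_depressed = depression_decision([2.0, 0.0, 0.0])
print(report)  # True
```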
      <p>We now describe the model in more detail. After tokenizing each input sequence, we feed the input IDs
and the attention mask into the embedding part, which consists of word, position, and token-type
embeddings. All embedding dimensions are 768, and the vocabulary sizes of the three embedding tables
are 30002, 130, and 1, respectively; the 130 position slots limit the maximum sequence length to 128.
The embeddings are followed by a layer normalization (LN) layer with eps = 10<sup>-12</sup> and a
dropout layer with probability 0.1.</p>
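      <p>The step from 130 position slots to a maximum length of 128 follows from how RoBERTa-style models number positions. In the Hugging Face Transformers implementation of RoBERTa, position IDs are offset by the padding index: padding tokens receive the padding index itself, and real tokens are numbered from the padding index plus one. Assuming RoBERTuito keeps this convention with padding ID 1, the sketch below reproduces that numbering in plain Python and derives the usable length.</p>

```python
PAD_ID = 1           # assumed padding index (RoBERTa convention)
MAX_POSITIONS = 130  # size of the position-embedding table


def position_ids(input_ids, pad_id=PAD_ID):
    """RoBERTa-style position IDs.

    Padding tokens get position `pad_id`; real tokens are numbered
    pad_id + 1, pad_id + 2, ... in order of appearance.
    """
    ids, count = [], 0
    for token in input_ids:
        if token == pad_id:
            ids.append(pad_id)
        else:
            count += 1
            ids.append(pad_id + count)
    return ids


# The largest index must stay inside the table (MAX_POSITIONS - 1 = 129), so
# pad_id + length is at most 129, i.e. the length is at most 128.
MAX_SEQ_LEN = MAX_POSITIONS - (PAD_ID + 1)

print(position_ids([0, 50, 51, 2, 1, 1]))  # [2, 3, 4, 5, 1, 1]
print(MAX_SEQ_LEN)                         # 128
```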
      <p>In the encoder, each self-attention block consists of an attention part, a recurrent part, and an
intermediate part. Table 1 shows the details of the encoder, and the code of our model is available on
GitHub [15].</p>
      <p>For fine-tuning, we use a common configuration adapted to our hardware: a batch size of 2, the
AdamW optimizer, and a learning rate of 0.00002. Choosing the number of epochs was difficult. We first
trained for twenty epochs, but accuracy began to fall after about fifteen epochs, so we finally chose
twelve epochs as the best setting for our model.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>All MentalRiskES tasks treat mental disorder detection as an online problem, so our group focused on
the speed and latency of detection. According to the official results [16], our submission ranked first in
subtask 2a on both latencyTP and speed, as shown in Table 2. In the application scenario of this task, we
believe that early detection matters even more than accuracy, because people should learn whether they
have depression as soon as possible. These results indicate that our strategy for deciding when to report
is effective in terms of detection speed.</p>
      <p>In terms of accuracy, our result is above 60 percent and ranks twentieth, which we regard as
acceptable. In practical applications, stability is also very important in addition to speed and latency.
Our macro-averaged metrics, which rank 18th in macro-F1, are almost equal to one another, indicating
that our method is reasonably stable on this task.</p>
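      <p>Macro-F1 is the unweighted average of the per-class F1 scores, so the depressed and non-depressed classes count equally regardless of how many subjects each contains. A minimal sketch for binary labels follows; the encoding 1 = depressed, 0 = non-depressed is an assumption, and the official evaluation scripts may differ in details such as how a class with no predictions is handled.</p>

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (1 = depressed, 0 = non-depressed)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

      <p>In this toy example, F1 is 2/3 for the depressed class and 0.8 for the non-depressed class, so macro-F1 is about 0.733.</p>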
      <p>In the training phase, we obtain an accuracy of 0.62 on the training dataset, which is very close to the
evaluation result shown in Table 3. This suggests that the bottleneck lies in the dataset or in the model
itself. Moreover, the highest accuracy in the ranking is only 0.738, far below the accuracy usually
reached in other sentiment analysis tasks. This may indicate that the dataset is not large enough for the
model to learn the features that characterize depression. Our model also overfits after about 12 epochs
of training, which is another area for further improvement.</p>
      <p>We examined all subjects whose predictions were wrong, listed in Table 4, and found that almost every
one of them contains raw Unicode escape sequences starting with “\u”. Intuitively, a few such words in
the middle of a message should have very little effect on the overall judgment, so this observation
suggests that the proposed model relies very strongly on the continuity of word senses within
sentences.</p>
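      <p>Because most misclassified subjects contain such raw escape sequences, one possible remedy, which we did not apply in our submission, is to decode the escapes back into characters before tokenization. A minimal sketch, assuming the escapes appear literally as a backslash, the letter u, and four hexadecimal digits (JSON style), with emoji written as escaped UTF-16 surrogate pairs:</p>

```python
import re

# Literal backslash-u escapes left in the text: a backslash, "u", 4 hex digits.
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Replace literal \\uXXXX escapes with the characters they encode.

    Escaped UTF-16 surrogate pairs (as used for emoji) are recombined into a
    single character; an unpaired surrogate becomes U+FFFD.
    """
    replaced = ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Round-trip through UTF-16 so adjacent surrogates merge into one code point.
    return replaced.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


print(decode_unicode_escapes(r"me siento fatal, \u00e9l no me entiende"))
# me siento fatal, él no me entiende
```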
      <p>In addition, our aim was to detect as many subjects suffering from depression as possible, yet the
wrong predictions are almost evenly split between false positives and false negatives, which is another
direction for improvement.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>In summary, this paper describes the participation of team GetitDone in MentalRiskES subtask 2a.
Inspired by RoBERTa, we used a model pretrained on Spanish corpora. Because of hardware limitations,
further pretraining on an additional Spanish depression corpus was not feasible. During our
experiments, we tried several common classifiers, but none produced noticeably better results, so our
model uses a linear classifier.</p>
      <p>In future work, we will consider three directions for improvement:
1. Collect more depression detection corpora to further train the existing model.
2. Improve the classifier so that it makes more reasonable decisions.
3. Design a data structure that captures more features of the data.</p>
    </sec>
    <sec id="sec-7">
      <title>7. References</title>
      <p>[1] Martín-Rodilla, P., Losada, D. E., &amp; Crestani, F. (2022, August). Overview of eRisk 2022: Early Risk Prediction on the Internet. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 13th International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, September 5–8, 2022, Proceedings (Vol. 13390, p. 233). Springer Nature.</p>
      <p>[2] Latoo, J., Haddad, P. M., Mistry, M., Wadoo, O., Islam, S. M. S., Jan, F., ... &amp; Alabdulla, M. (2021). The COVID-19 pandemic: an opportunity to make mental health a higher public health priority. BJPsych Open, 7(5), e172.</p>
      <p>[3] Nadeem, M., Horn, M., Coppersmith, G., &amp; Sen, S. Identifying Depression on Twitter.</p>
      <p>[4] Gautam, G., &amp; Yadav, D. (2014, August). Sentiment analysis of twitter data using machine learning approaches and semantic analysis. In 2014 Seventh International Conference on Contemporary Computing (IC3) (pp. 437–442). IEEE.</p>
      <p>[5] Bucur, A. M., Cosma, A., &amp; Dinu, L. P. (2021). Early Risk Detection of Pathological Gambling, Self-Harm and Depression Using BERT. CEUR-WS.org. http://ceur-ws.org/Vol-2936/paper-77.pdf</p>
      <p>[6] Wołk, A., Chlasta, K., &amp; Holas, P. (2021). Hybrid approach to detecting symptoms of depression in social media entries. Retrieved from https://aisel.aisnet.org/pacis2021/192</p>
      <p>[7] Islam, M. R., Kabir, M. A., Ahmed, A., Kamal, A. R. M., Wang, H., &amp; Ulhaq, A. (2018). Depression detection from social network data using machine learning techniques. Health Information Science and Systems, 6, 1–12.</p>
      <p>[8] Jahan, R., &amp; Tripathi, M. M. (2022). Multimodal depression detection using machine learning. In Artificial Intelligence, Machine Learning, and Mental Health in Pandemics (pp. 53–72). Academic Press.</p>
      <p>[9] Strubell, E., Ganesh, A., &amp; McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 3645–3650). Association for Computational Linguistics.</p>
      <p>[10] Budennyy, S., Lazarev, V., Zakharenko, N., Korovin, A., Plosskaya, O., Dimitrov, D., Arkhipkin, V., Oseledets, I., Barsola, I., Egorov, I., Kosterina, A., &amp; Zhukov, L. (2022). Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI. arXiv e-prints, arXiv:2208.00406.</p>
      <p>[11] Perez, J., Giudici, J., &amp; Luque, F. (2021). pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks. arXiv e-prints, arXiv:2106.09462.</p>
      <p>[12] Perez, J., Furman, D., Alonso Alemany, L., &amp; Luque, F. (2022). RoBERTuito: a pre-trained language model for social media text in Spanish. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 7235–7243). European Language Resources Association.</p>
      <p>[13] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., &amp; Stoyanov, V. (2020). RoBERTa: A Robustly Optimized BERT Pretraining Approach.</p>
      <p>[14] García-Vega, M., Díaz-Galiano, M. C., García-Cumbreras, M. A., Del Arco, F. M. P., Montejo-Ráez, A., Jiménez-Zafra, S. M., ... &amp; Chiruzzo, L. (2020, September). Overview of TASS 2020: Introducing emotion detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) Co-Located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), Málaga, Spain (pp. 163–170).</p>
      <p>[15] Hu, G. 2023MentalRiskES_task2a_model. https://github.com/GeorgeHu6/OpenCode/tree/main/2023MentalRiskES</p>
      <p>[16] Mármol-Romero, A., Moreno-Muñoz, A., Plaza-del-Arco, F., Molina-González, M., Martín-Valdivia, M., Ureña-López, L., &amp; Montejo-Ráez, A. (2023). Overview of MentalRiskES at IberLEF 2023: Early Detection of Mental Disorders Risk in Spanish. Procesamiento del Lenguaje Natural, 71.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>