<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>at EMit: Ensemble Approach for Categorial Emotion Detection in Social Media Messages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nguyen Ba Dai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nguyen Ngoc Phuong Uyen</string-name>
          <email>uyennnp21411@st.uel.edu.vn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dang Van Thin</string-name>
          <email>thindv@uit.edu.vn</email>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Science and Engineering, University of Information Technology</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Systems, University of Economics and Law</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Economics and Law</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Information Technology</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents our submission system for the EMit (Emotions in Italian) shared task at EVALITA 2023, which focuses on categorial emotion detection in social media messages related to the domain of TV shows, TV series, music videos, and advertisements. We employ an ensemble approach, leveraging BERT (Bidirectional Encoder Representations from Transformers) models known for their advanced language understanding capabilities. The BERT models are fine-tuned on domain-specific data to enhance their performance in emotion detection. The ensemble architecture combines multiple pre-trained models and utilizes a soft voting technique for robust decision-making. The results demonstrate the effectiveness of the team's ensemble model, which achieves a Top 3 ranking in task 1 with an F1-score of 49.94%.</p>
      </abstract>
      <kwd-group>
        <kwd>Messages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Within several fields, including Natural Language Processing (NLP), Artificial Intelligence (AI) has made significant contributions by offering efficient answers to crucial societal and human issues. Two critical areas of NLP are sentiment analysis and emotion recognition [1]. While sentiment analysis usually classifies data into polarity classes such as positive, negative, and neutral, emotion recognition extracts distinct human emotions such as disgust, fear, joy, and more. Emotion detection has been studied and applied extensively in computational and linguistic techniques to help computers understand and, at times, generate human languages. Its significance is clear because emotions play a vital role in the existence and the complete make-up of individuals.</p>
      <p>EVALITA 2023 [2] provides a shared framework for the evaluation of different systems and approaches on the EMit (Emotions in Italian) shared task [3]. In this shared task, two sub-tasks, both designed as multilabel classification problems, were proposed for participants. The first challenge, called Categorial Emotion Detection, aims to identify emotions in social media messages.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Detection of emotions is not entirely novel: for more than a decade, several international evaluation campaigns have been launched, for example the Emotion Classification shared task at WASSA 2022 [5], EmoContext at SemEval 2019 [6], and the Affective Text shared task at SemEval 2007 [7]. However, further research is still needed, especially on identifying emotions in messages related to TV shows, TV series, music videos, and advertisements.</p>
      <p>In EVALITA 2018, the Hate Speech Detection (HaSpeeDe) task was conducted specifically for Italian, with the objective of automatically labeling messages from Twitter and Facebook as either containing or not containing hate speech. To develop robust hate speech detection systems, [8] utilized the dataset released for the HaSpeeDe shared task, which combined an English dataset and a German dataset distributed for the Identification of Offensive Language shared task organized at GermEval 2018. Their findings identified a recurrent neural architecture that demonstrates stability and high performance across various languages. They also evaluated the impact of several components commonly used in the task, including the type of embeddings, the incorporation of additional features (text-based or emotion-based), the role of hashtag normalization, and the influence of emojis. [9] introduced FEEL-IT, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: anger, fear, joy, and sadness. Another relevant study [10] evaluated BERT's performance in emotion recognition using the EmotionLines dataset from the Friends television sitcom and the EmotionPush dataset from Facebook Messenger chats.</p>
      <p>Research on emotion detection has been conducted across various languages and tested on multiple technological models: an attention-based methodology for identifying and categorizing emotions in textual interactions [11], using a Bi-LSTM to classify emotions in textual and emoji utterances [12], detecting text emotions in social networks with a novel ensemble classifier based on the Parzen Tree Estimator (TPE) [13], and applying K-NN and Naive Bayes machine learning techniques to the detection of emotions [14]. To further contribute to emotion detection experiments and to research on identifying emotions in Italian social media audiences, we employ an ensemble approach for Categorical Emotion Detection in Social Media Messages.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>Figure 1 shows the overview of our ensemble model for task A. The combined model integrates multiple pre-trained models to leverage their domain-specific expertise and enhance performance. By using the soft voting method, the model aggregates the predictions from each model to generate accurate results.</p>
      <p>The integration process involves obtaining independent outputs from each pre-trained model and combining them using the soft voting method. This method considers the confidence scores associated with each prediction, assigning higher weights to more reliable predictions for a robust final decision. The model architecture facilitates seamless communication and coordination between the pre-trained models, enabling efficient information exchange and fusion of predictions. It can be easily adapted and extended to incorporate new models and emerging techniques for continuous improvement.</p>
      <p>In this paper, we utilize the pre-trained BERT model [15] for the following reasons: (1) First, BERT is a state-of-the-art model renowned for its exceptional language understanding capabilities. It effectively captures the semantics and nuances of text due to its deep contextualized representations. (2) Second, by leveraging transfer learning, BERT provides a significant advantage. It has been pre-trained on extensive and diverse datasets, allowing us to benefit from the knowledge it has acquired. This saves computational resources and time compared to training a model from scratch. (3) Finally, BERT's adaptability to different domains is a key factor. Its pre-training covers a wide range of domains, making it flexible for various tasks. By fine-tuning BERT with domain-specific data, we can enhance its performance for our specific task.</p>
    </sec>
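The soft-voting combination described above can be sketched in a few lines. This is a minimal pure-Python sketch; the function name, the 0.5 threshold, and the four-label example values are our own illustrations, not values from the paper:

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average each label's probability across models, then threshold.

    prob_lists: one list of per-label sigmoid probabilities per model.
    Returns the binary multilabel prediction and the averaged scores.
    """
    n_models = len(prob_lists)
    # Transpose so we iterate per label, averaging across models
    avg = [sum(scores) / n_models for scores in zip(*prob_lists)]
    return [int(p >= threshold) for p in avg], avg

# Hypothetical sigmoid outputs from three fine-tuned models for one
# message over four emotion labels (illustrative numbers only):
model_outputs = [
    [0.81, 0.10, 0.55, 0.20],
    [0.70, 0.05, 0.40, 0.35],
    [0.92, 0.20, 0.48, 0.15],
]
labels, scores = soft_vote(model_outputs)  # labels -> [1, 0, 0, 0]
```

A weighted variant would multiply each model's scores by a reliability weight before averaging, which matches the idea of assigning higher weight to more reliable predictions.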
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>We only use the official training set [3], which is provided by the organizers, to train our models for task A in the shared task.</p>
        <p>Table 1 presents the general statistics of the training and testing sets. As depicted in Table 1, the training and test sets in this shared-task experiment are balanced in terms of average length. These statistics provide a foundation for understanding the data and designing appropriate models for the given task.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Pre-processing</title>
        <p>Pre-processing steps are essential in classification tasks to improve the quality of the data and facilitate effective analysis. Moreover, since the provided dataset is collected from social media messages, it is necessary to design a list of pre-processing steps. Based on our analysis of the training dataset, we design the pre-processing steps in our system as follows:
• Step 1: We removed the "@USER" tag and the placeholder "_" carried after it.
• Step 2: We removed redundant symbolic expressions and kept only one symbol; for example, "!!!" and "???" are transformed to "!" and "?".
• Step 3: We transformed emoji in the dataset to text because of the large number of emoji in sentences.
• Step 4: We tagged links, phone numbers, and hashtags with their related tokens; for example, "0123456789" is transformed to "&lt;phone&gt;".
• Step 5: Finally, we removed extra space symbols from the text.</p>
      </sec>
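Steps of this kind can be sketched with simple regular expressions. This is an illustrative sketch, not the paper's actual code: the function name and exact patterns are our assumptions, the emoji-to-text step (typically handled with a dedicated package such as `emoji`) is only noted in a comment, and square-bracket tokens stand in for the paper's angle-bracket tokens:

```python
import re

def preprocess(text: str) -> str:
    """Illustrative cleanup mirroring the pre-processing steps above."""
    # Step 1: drop "@USER" mentions and the "_" placeholder carried after them
    text = re.sub(r"@USER_?", "", text)
    # Step 2: collapse runs of repeated punctuation into a single symbol
    text = re.sub(r"([!?.])\1+", r"\1", text)
    # Step 3 (emoji -> text) would typically call emoji.demojize(text); omitted here
    # Step 4: tag links and phone numbers with placeholder tokens
    # (the paper uses angle-bracket tokens; square brackets used here)
    text = re.sub(r"https?://\S+", "[url]", text)
    text = re.sub(r"\b\d{9,11}\b", "[phone]", text)
    # Step 5: squeeze extra whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text

cleaned = preprocess("@USER_ Che bello!!! 0123456789")  # -> "Che bello! [phone]"
```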
      <sec id="sec-4-3">
        <title>4.3. Models</title>
        <p>After investigating various models, we have concluded that the dbmdz/bert-base-italian-cased, gbarone77/polibert_sa, and dbmdz/bert-base-italian-xxl-cased BERT models are the most suitable choices for this shared task. We have identified several reasons to support our decision: (1) These models were trained on Italian corpora, which aligns well with the requirements of this shared task. (2) All three models have demonstrated outstanding performance on this task, as indicated in Table 3.</p>
      </sec>
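Checkpoints like these could be loaded for multi-label fine-tuning roughly as follows. This is a sketch of one plausible setup using the HuggingFace Transformers API, not the paper's code: the helper name is ours, and the `problem_type` setting (which makes the model apply a per-label sigmoid with binary cross-entropy loss) is our assumption about how the extra sigmoid output layer would be wired in:

```python
# The three Italian BERT checkpoints named above.
MODEL_NAMES = [
    "dbmdz/bert-base-italian-cased",
    "gbarone77/polibert_sa",
    "dbmdz/bert-base-italian-xxl-cased",
]

def load_for_multilabel(name: str, num_labels: int):
    """Load one checkpoint with a multi-label classification head.

    transformers is imported lazily so this module can be read and the
    constants inspected without the library installed.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name,
        num_labels=num_labels,  # one output unit per emotion label
        problem_type="multi_label_classification",  # sigmoid + BCE loss
    )
    return tokenizer, model
```

Each of the three models would be loaded this way, fine-tuned separately, and their per-label probabilities combined by soft voting.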
      <sec id="sec-4-4">
        <title>4.4. System setting</title>
        <p>We evaluated various models using the HuggingFace Transformers library [16]. Each model was trained for a fixed number of epochs, specifically 5 epochs. The learning rate for dbmdz/bert-base-italian-cased and gbarone77/polibert_sa was set to 3e-5, while for dbmdz/bert-base-italian-xxl-cased it was set to 2e-5. The batch size for all pre-trained language models was set to 16. No development data was utilized for model tuning. We employed the AdamW optimizer to optimize our models. Additionally, we added an extra sigmoid layer at the output of each model. To ensure reproducibility, we set a fixed random seed of 42 for training the models.</p>
      </sec>
    </sec>
    <sec id="sec-main-results">
      <title>5. Main results</title>
      <p>The official results and the results of the top systems are shown in Table 2. Our best model achieves a Top 3 ranking in the challenge during the final round. It achieved an F1-score of 49.94%, which is lower than the F1-scores of the Top 1 and Top 2 teams by 10.34% and 0.92%, respectively.</p>
      <p>Table 3 shows the overall results of our submission model and other variants on the test set of the challenge.</p>
      <p>Overall, it can be seen that the performance of the model improves when we pre-process the input data, fine-tune the models, and ensemble them. These techniques, mentioned in previous work [17], have consistently shown their effectiveness in enhancing model performance. Pre-processing improves data quality, fine-tuning tailors models to the task, and ensembling combines their strengths for better results. These findings reinforce the value of these techniques for improving model performance in similar domains or tasks. Our ensemble methods have significantly improved the F1-score, increasing it by between 2.7% and 6.42%. This demonstrates the effectiveness of ensembling in enhancing model performance on the task at hand.</p>
      <p>On the other hand, Table 3 shows that the models dbmdz/bert-base-italian-cased, dbmdz/bert-base-italian-xxl-cased, and gbarone77/polibert_sa outperformed the other models. Among them, the gbarone77/polibert_sa model achieved the best performance due to its utilization of Italian language data and its specific training for sentiment analysis, which aligns well with the task requirements. The superior performance of these three models can be attributed to their training on large amounts of data specific to the Italian language.</p>
      <p>In contrast, the models bert-base-multilingual-cased, Babelscape/wikineural-multilingual-ner, and nlptown/bert-base-multilingual-uncased-sentiment were trained on multilingual data, which resulted in comparatively lower performance when applied to Italian-specific tasks. To address this issue, the Geotrend/bert-base-it-cased model was introduced as an enhancement of bert-base-multilingual-cased by training it on Italian language data. This targeted training resulted in a performance improvement of 1.34% compared to the original multilingual model. However, despite this improvement, the Geotrend/bert-base-it-cased model still exhibits lower performance than the top three models mentioned earlier. One possible reason is that the amount of training data used for the Geotrend/bert-base-it-cased model was relatively small compared to the extensive data used for the top-performing models. The mgrella/autonlp-bank-transaction-classification-5521155 model, while trained on Italian language data, was specifically tailored to the "bank transaction" field. As a result, its performance is lower for sentiment analysis tasks, which is the focus of our study.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>In this paper, we present our submission system for EMit at EVALITA 2023: the Categorial Emotion Detection Task, where our approach achieved a Top 3 ranking.</p>
      <p>Instead of relying on a single model, we adopt an ensemble approach using the pre-trained BERT model [15] to tackle the task. Through extensive experimentation and analysis, we have found this approach to be highly effective and believe it can be applied to other domains for sentiment analysis as well. While we acknowledge that our familiarity with the Italian language was limited, we recognize the importance of effective text pre-processing techniques in enhancing performance on the test set. Although we were unable to leverage these techniques to their full potential, we firmly believe that incorporating appropriate text pre-processing techniques can further improve the performance of our model.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[3] O. Araque, S. Frenda, R. Sprugnoli, D. Nozza, V. Patti, EMit at EVALITA 2023: Overview of the Categorical Emotion Detection in Italian Social Media Task, in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy, 2023.</p>
      <p>[4] L. A. Camras, R. Plutchik, H. Kellerman, Emotion: Theory, research, and experience. Vol. 1. Theories of emotion, American Journal of Psychology 94 (1981) 370.</p>
      <p>[5] V. Barriere, S. Tafreshi, J. Sedoc, S. Alqahtani, WASSA 2022 shared task: Predicting empathy, emotion and personality in reaction to news stories, in: Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment &amp; Social Media Analysis, 2022, pp. 214–227.</p>
      <p>[6] A. Chatterjee, K. N. Narahari, M. Joshi, P. Agrawal, SemEval-2019 task 3: EmoContext contextual emotion detection in text, in: Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 39–48.</p>
      <p>[7] E. Agirre, L. Márquez, R. Wicentowski, Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007.</p>
      <p>[8] M. Corazza, S. Menini, E. Cabrio, S. Tonelli, S. Villata, A multilingual evaluation for online hate speech detection, ACM Transactions on Internet Technology (TOIT) 20 (2020) 1–22.</p>
      <p>[9] F. Bianchi, D. Nozza, D. Hovy, et al., FEEL-IT: Emotion and sentiment classification for the Italian language, in: Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Association for Computational Linguistics, 2021.</p>
      <p>[10] Y.-H. Huang, S.-R. Lee, M.-Y. Ma, Y.-H. Chen, Y.-W. Yu, Y.-S. Chen, EmotionX-IDEA: Emotion BERT, an affectional model for conversation, arXiv preprint arXiv:1908.06264 (2019).</p>
      <p>[11] W. Ragheb, J. Azé, S. Bringay, M. Servajean, Attention-based modeling for emotion detection and classification in textual conversations, arXiv preprint arXiv:1906.07020 (2019).</p>
      <p>[12] L. Ma, L. Zhang, W. Ye, W. Hu, PKUSE at SemEval-2019 task 3: Emotion detection with emotion-oriented neural attention network, in: Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 287–291.</p>
      <p>[13] F. Ghanbari-Adivi, M. Mosleh, Text emotion detection in social networks using a novel ensemble classifier based on Parzen tree estimator (TPE), Neural Computing and Applications 31 (2019) 8971–8983.</p>
      <p>[14] M. Suhasini, B. Srinivasu, Emotion detection framework for Twitter data using supervised classifiers, in: Data Engineering and Communication Technology: Proceedings of 3rd ICDECT-2K19, Springer, 2020, pp. 565–576.</p>
      <p>[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</p>
      <p>[16] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.</p>
      <p>[17] Y. Xu, X. Qiu, L. Zhou, X. Huang, Improving BERT fine-tuning via self-ensemble and self-distillation, arXiv preprint arXiv:2002.10345 (2020).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Acheampong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wenyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nunoo-Mensah</surname>
          </string-name>
          ,
          <article-title>Text-based emotion detection: Advances, challenges, and opportunities</article-title>
          ,
          <source>Engineering Reports</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <article-title>e12189</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Menini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          , G. Venturi,
          <article-title>EVALITA 2023: Overview of the 8th evaluation campaign of natural language processing and speech tools for Italian</article-title>
          , in: Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), CEUR.org, Parma, Italy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>