Emotions & Threat Detection in Urdu using Transformer-Based Models

Anik Basu Bhaumik (1), Mithun Das (2)
(1) A.K. Chowdhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
(2) Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
anikbb@gmail.com (A. B. Bhaumik); mithundas@iitkgp.ac.in (M. Das)

Abstract

Social media platforms have connected billions of people and let them share their views. Problems arise, however, when malicious users abuse, express anger toward, and threaten others on these platforms, so detecting such hostile/harmful content is necessary. Several studies have been conducted on hostile and negative content detection, but most of this work revolves around English. To facilitate research on low-resource languages such as Urdu, the organizers of the "EmoThreat: Emotions & Threat Detection in Urdu" shared task at FIRE 2022 introduced two tasks for emotion classification and threatening language detection. In this paper, we investigate the performance of several transformer-based models and observe that the MBERT model performs best for emotion classification, while the MURIL model performs best for threatening tweet classification. Our team hate-alert stands 3rd in task A, 2nd in subtask 1B, and 2nd in subtask 2B.

Keywords: Urdu, Threat Detection, Emotion Classification, Natural Language Processing

1. Introduction

Most of the world's population is connected through social networks, which help us get news, express our opinions, and, gradually, shape our growth as a society. Facebook has roughly 2.93 billion monthly active users (https://backlinko.com/facebook-users), Instagram has 1.21 billion (https://www.statista.com/statistics/183585/instagram-number-of-global-users/), and Twitter has over 450 million globally (https://www.businessofapps.com/data/twitter-statistics/); from these figures, one can appreciate the enormous amount of content being shared over the Internet. One of the issues with these content-sharing platforms is that bad actors occasionally share negative, abusive, threatening, and aggressive posts, endangering the well-being of millions of people [1]. To mitigate the effect of malicious content, platforms like Facebook (https://transparency.fb.com/bn-in/policies/community-standards/hate-speech/) and Twitter (https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy) have published guidelines that their users must follow to keep these platforms healthy and safe; in addition, they hire moderators [2] to check content manually. However, due to the large volume of content, it is difficult to filter everything posted on the platforms manually. Several studies have been conducted to detect such negative and hostile content automatically [3, 4, 5, 6, 7], but most of them are centered around the English language [8, 9].
To engage and facilitate research on low-resource languages, the organizers of the "EmoThreat: Emotions & Threat Detection in Urdu" shared task [10, 11] at FIRE 2022 (https://sites.google.com/view/multi-label-emotionsfire-task/) introduced two tasks for emotion classification and threat detection in Urdu. Urdu is spoken widely across South Asia; it is the official language of Pakistan and is also widely used in regions of India and the Middle East, with over 230 million speakers across the globe (https://en.wikipedia.org/wiki/Urdu). Urdu is written in the Perso-Arabic script. The objective of the shared task is to devise methodologies to detect the emotion associated with a text and to classify whether a text is threatening or not.

In this paper, we investigate several transformer-based models for the classification tasks; such models have already been shown to outperform existing baselines and stand as the state of the art on various tasks involving hateful and abusive speech [12, 13, 14]. We conduct pre-processing, data sampling, hyper-parameter tuning, etc., to construct the models. Our best models stand 3rd in task A (multi-label emotion classification in Urdu), 2nd in subtask 1B (classify a given tweet as "threatening" or "non-threatening"), and 2nd in subtask 2B (if a tweet is classified as "threatening", further classify it as an "individual" or a "group" threat).

2. Related Work

Due to the exponential growth of social media platforms, the amount of content shared on them has expanded tremendously, further increasing the malicious content they carry. Therefore, the detection of such malicious content has gained significant attention in the research community. In 2017, Waseem et al. [5] classified abusive language into two categories, "Directed" (language directed at a specific person or entity) and "Generalized" (directed at a generalized group); each category was further divided into "Explicit" and "Implicit" (the degree to which the abuse is explicit).

To accomplish the objective of identifying hate/offensive speech embedded in tweets, Davidson et al. [4] provided a dataset in which thousands of tweets were categorized as "hate", "offensive", or "neither". Using this dataset, they investigated how linguistic characteristics like character and word n-grams influenced the performance of a classifier designed to identify these three categories of tweets. They also used features such as the number of characters, words, and syllables in each tweet, and count indicators for hashtags, mentions, retweets, and URLs. The authors discovered that one of the problems with their best models was that they could not distinguish between offensive and hateful posts.

Pitsilis et al. [15] examined recurrent neural networks (RNNs) in 2018 to detect offensive language in English. The authors found that RNNs performed admirably on this task using ensemble methods, achieving an F1-score of 0.9320. RNNs preserve the outcome of each step the model conducts, so they can capture the linguistic context within a text, which is essential for detection. While RNNs have been shown to do well in language modelling, other neural network models, including CNNs and LSTMs, have also succeeded at identifying hate/offensive speech [16, 17].
Transformer-based [18] language models, such as BERT and m-BERT [19], have recently gained popularity in various downstream tasks like categorization and span detection. Transformer-based models have previously been found to outperform [3] a number of deep learning models, including CNN-GRU, LSTM, and others. Having seen how well these transformer-based models perform, we concentrate on developing them for our classification problem.

3. Dataset Description

The shared task is divided into two parts, with datasets sampled from Twitter. Task A is to perform multi-label emotion classification of Urdu Nastalíq tweets [20, 21, 22, 23]; each tweet has to be classified into one or more of the following categories: Neutral, Happiness, Surprise, Sadness, Fear, Disgust, Anger. Task B [24, 25, 26, 27, 28] is further divided into two parts: in the first part (1B), the task is to classify a tweet as threatening or non-threatening; in the second part (2B), the task is to classify threatening tweets into two categories, "group" or "individual" threats. The data was collected and annotated by the Natural Language and Text Processing Laboratory (https://nlp.cic.ipn.mx/) at the Center for Computing Research (https://www.cic.ipn.mx/index.php/en/) of Instituto Politécnico Nacional, Mexico.

3.1. Task A

This task is a multi-label classification task in which tweets need to be classified into seven classes: Anger, Disgust, Fear, Sadness, Surprise, Happiness, Neutral. The training dataset has 7,800 instances in total and the test dataset has 1,950. The label distribution for this task is shown in Table 1.

Table 1: Dataset distribution of multi-label emotion classification (Task A)

Category        Train   Test
Neutral         3014    753
Happiness       1046    261
Surprise        1550    388
Sadness         2190    548
Fear            609     152
Disgust         761     190
Anger           811     203
Total Tweets    7800    1950

3.2. Task B

This is a classification task of identifying/detecting threatening language in Urdu, with two sub-tasks:

- Sub-task 1B: binary classification of tweets as threatening or non-threatening.
- Sub-task 2B: if a tweet is classified as threatening, it should be further classified as a "group" or "individual" threat.

For task B, the training dataset has 3,564 instances and the test dataset has 935 instances, annotated as threatening (group/individual) or non-threatening. The distributions are presented in Table 2 and Table 3.

Table 2: Dataset distribution of threatening language detection (Task 1B)

Category          Train   Test
threatening       1782    308
non-threatening   1782    627
Total             3564    935

Table 3: Dataset distribution of fine-grained threatening language detection (Task 2B)

Category          Train   Test
Group             1341    252
Individual        441     55
non-threatening   1782    628
Total             3564    935

4. System Description

This section explains the transformer-based models we explored. For task A (multi-label emotion classification), we experimented with MBERT [19] and MURIL [29] models (code adapted from https://github.com/hate-alert/IndicAbusive). For subtask 1B (binary classification of threatening language), we experimented with the following models: MBERT, MURIL, "dehatebert-mono-arabic" [30] (https://huggingface.co/Hate-speech-CNERG/dehatebert-mono-arabic), and "indic-abusive-allInOne-MuRIL" [31] (https://huggingface.co/Hate-speech-CNERG/indic-abusive-allInOne-MuRIL). The "dehatebert-mono-arabic" model is an MBERT variant fine-tuned on an Arabic hate speech dataset, and the "indic-abusive-allInOne-MuRIL" model is a MURIL variant previously fine-tuned on abusive data in eight different Indic languages, including Urdu. For sub-task 2B (fine-grained classification of threatening language), we experimented only with the MBERT and MURIL models (code adapted from https://www.kaggle.com/vpkprasanna/bert-model-with-0-845-accuracy).

4.1. Multi-label Classification

Task A is a multi-label classification problem in which each post can belong to one or more categories. As discussed above, we fine-tuned the transformer-based MBERT and MURIL models with a classifier layer added on top. The binary cross-entropy (BCE) loss function was used for calculating the loss.
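As a concrete illustration, below is a minimal sketch of this setup in PyTorch with the Hugging Face transformers library: an MBERT encoder with a seven-way classifier layer on top, trained with BCE loss. The maximum sequence length, example input, and target vector are illustrative assumptions; the actual competition code (adapted from the repositories cited above) may differ in detail.

```python
# Sketch: multi-label emotion classifier = transformer encoder + linear head,
# trained with binary cross-entropy (BCE) over the seven emotion labels.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

LABELS = ["Neutral", "Happiness", "Surprise", "Sadness", "Fear", "Disgust", "Anger"]
MODEL_NAME = "bert-base-multilingual-cased"  # MBERT; for MURIL: "google/muril-base-cased"

class EmotionClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        # Classifier layer added on top of the transformer, as described above.
        self.head = nn.Linear(self.encoder.config.hidden_size, len(LABELS))

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = hidden.last_hidden_state[:, 0]   # [CLS] token representation
        return self.head(cls)                  # one raw logit per emotion label

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = EmotionClassifier()
bce = nn.BCEWithLogitsLoss()  # sigmoid + BCE: one independent decision per label

# Illustrative training step on a single (tweet, label-set) pair.
batch = tokenizer(["<urdu tweet text>"], truncation=True, max_length=128,
                  padding=True, return_tensors="pt")
target = torch.tensor([[0., 0., 0., 1., 0., 0., 1.]])  # e.g. Sadness + Anger
loss = bce(model(batch["input_ids"], batch["attention_mask"]), target)
loss.backward()
```

At inference time, every label whose sigmoid score exceeds a threshold (commonly 0.5) is predicted, which is what allows a single tweet to carry multiple emotions.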
4.2. Multi-class Classification

Subtasks 1B and 2B are binary and ternary classification problems, respectively. Here we also add an extra classification layer on top of the transformer models we use. For these subtasks, the cross-entropy loss function was used. Also, as seen in Table 3, the data is imbalanced; therefore, appropriate weights were assigned to the classes before fine-tuning the models.
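The paper does not state the exact weighting scheme, so the inverse-frequency weights in the sketch below are an assumption; it only illustrates how class weights derived from the Table 3 counts can be passed to PyTorch's cross-entropy loss.

```python
# Sketch: class-weighted cross-entropy for the imbalanced subtask 2B
# (Group / Individual / non-threatening). The inverse-frequency scheme is an
# assumed choice; the paper only says "appropriate weights" were used.
import torch
from torch import nn

# Training counts from Table 3: Group, Individual, non-threatening.
counts = torch.tensor([1341.0, 441.0, 1782.0])
weights = counts.sum() / (len(counts) * counts)  # ~[0.886, 2.694, 0.667]

ce = nn.CrossEntropyLoss(weight=weights)  # rare "Individual" errors cost the most

logits = torch.randn(4, 3)                # classifier outputs for a batch of 4
gold = torch.tensor([0, 1, 2, 2])         # gold class indices
loss = ce(logits, gold)
```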
4.3. Tuning Parameters

The models were run for 5 epochs with the Adam optimizer [32] and an initial learning rate of 2e-5. As no validation dataset was given, we split the training data points 85%/15% and used the 15% portion as a validation set. We generate test-set predictions with the checkpoint that achieves the best validation performance.
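A compact sketch of this loop is shown below, with toy tensors standing in for the encoded tweets; the random seed, batch handling, and validation metric (accuracy here) are assumptions, since the paper does not specify them.

```python
# Sketch: 85/15 train/validation split, 5 epochs of Adam (lr = 2e-5), and
# test-time prediction with the best validation checkpoint. Toy tensors stand
# in for tokenized tweets; the real model is the transformer classifier above.
import copy
import torch
from torch import nn

torch.manual_seed(0)                          # assumed seed, for reproducibility
X = torch.randn(200, 16)                      # stand-in "encoded tweets"
y = torch.randint(0, 2, (200,))               # binary labels (subtask 1B)

n_val = int(0.15 * len(X))                    # hold out 15% as validation
perm = torch.randperm(len(X))
val_idx, train_idx = perm[:n_val], perm[n_val:]

model = nn.Linear(16, 2)                      # placeholder classifier
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=2e-5)

best_acc, best_state = -1.0, None
for epoch in range(5):                        # 5 epochs, as stated above
    model.train()
    opt.zero_grad()
    loss_fn(model(X[train_idx]), y[train_idx]).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        acc = (model(X[val_idx]).argmax(1) == y[val_idx]).float().mean().item()
    if acc > best_acc:                        # keep the best-validation checkpoint
        best_acc, best_state = acc, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)             # this model predicts the test set
```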
5. Results

The performance on task A, multi-label emotion classification, is shown in Table 4. Between the MBERT and MURIL models, the MBERT model performs best in terms of all evaluation metrics (Acc: 0.612, Weighted F1: 0.709, Macro F1: 0.615).

Table 4: Multi-label emotion classification results (Task A)

Model   Accuracy   Weighted F1   Micro F1   Macro F1   Hamming loss
MBERT   0.612      0.709         0.724      0.615      0.092
MURIL   0.519      0.513         0.610      0.309      0.117

For sub-task 1B (Table 5), the MURIL model performs best (Acc: 0.716, F1: 0.737, ROC-AUC: 0.729) in terms of all metrics, and the "indic-abusive-allInOne-MuRIL" model performs second best (Acc: 0.672, F1: 0.706, ROC-AUC: 0.674). One interesting observation is that although the "dehatebert-mono-arabic" and "indic-abusive-allInOne-MuRIL" models were previously fine-tuned on hate speech and abusive speech datasets, further fine-tuning them on the threatening tweet dataset does not outperform the vanilla MURIL model.

Table 5: Two-class threatening tweet classification results (subtask 1B). The best performing model is MURIL; indic-abusive-allInOne-MuRIL is second best.

Model                               Accuracy   F1 Score   ROC-AUC
MBERT                               0.647      0.666      0.663
MURIL                               0.716      0.737      0.729
dehatebert-mono-arabic [30]         0.642      0.687      0.641
indic-abusive-allInOne-MuRIL [31]   0.672      0.706      0.674

For sub-task 2B, we again observe that the MURIL model performs best (F1: 0.535, Acc: 0.696, ROC-AUC: 0.66); the results are shown in Table 6.

Table 6: Three-class threatening tweet classification results (Task 2B)

Model   F1 Score   Accuracy   ROC-AUC
MBERT   0.473      0.621      0.626
MURIL   0.535      0.696      0.66

Table 7: Examples of a few misclassified tweets of emotion classification (discussed in Section 6).

Table 8: Examples of a few misclassified tweets of threat detection (discussed in Section 6).
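For reference, the metrics reported in Tables 4–6 can be computed with scikit-learn as sketched below. The toy label matrices are illustrative, the shared task's official evaluation script may differ, and the multi-label "Accuracy" is assumed here to be exact-match (subset) accuracy.

```python
# Sketch: computing the reported metrics with scikit-learn on toy labels.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss, roc_auc_score

# Task A (multi-label): rows are tweets, columns are the seven emotion labels.
y_true = np.array([[1, 0, 0, 1, 0, 0, 0],    # e.g. Neutral + Sadness
                   [0, 1, 0, 0, 0, 0, 0]])   # e.g. Happiness
y_pred = np.array([[1, 0, 0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0, 0]])
print(accuracy_score(y_true, y_pred))                # exact-match accuracy (assumed)
print(f1_score(y_true, y_pred, average="weighted"))  # weighted F1
print(f1_score(y_true, y_pred, average="micro"))     # micro F1
print(f1_score(y_true, y_pred, average="macro"))     # macro F1
print(hamming_loss(y_true, y_pred))                  # fraction of wrong label bits

# Subtask 1B (binary): threatening = 1, non-threatening = 0.
t_true = np.array([1, 0, 1, 1, 0])
t_pred = np.array([1, 0, 0, 1, 0])
print(accuracy_score(t_true, t_pred),
      f1_score(t_true, t_pred),
      roc_auc_score(t_true, t_pred))
```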
6. Error Analysis

To further understand where the model fails, we manually inspected some tweets misclassified by the best-performing models. For the emotion classification task, we observed that the gold label itself is sometimes incorrect according to our judgment of the translated tweets; the model fails in such cases. For threatening tweet detection, the presence of words such as "killing" sometimes makes the prediction incorrect; the model cannot distinguish threatening from non-threatening tweets in such cases. Examples of misclassified tweets are shown in Tables 7 and 8.

7. Conclusion

In this shared task, we experimented with several transformer-based models for multi-label emotion classification and threatening tweet detection. Specifically, we explored MURIL- and MBERT-based models. We observed that the MBERT model performed best for emotion classification, and the MURIL model performed best for threatening tweet classification. Our team hate-alert stands 3rd in task A, 2nd in subtask 1B, and 2nd in subtask 2B.

References

[1] J. S. Vedeler, T. Olsen, J. Eriksen, Hate speech harms: a social justice discussion of disabled Norwegians' experiences, Disability & Society 34 (2019) 368–383.
[2] C. Newton, The terror queue, 2019. URL: https://www.theverge.com/2019/12/16/21021005/google-youtube-moderators-ptsd-accenture-violent-disturbing-content-interviews-video.
[3] B. Mathew, P. Saha, S. M. Yimam, C. Biemann, P. Goyal, A. Mukherjee, HateXplain: A benchmark dataset for explainable hate speech detection, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 2021, pp. 14867–14875.
[4] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: ICWSM, 2017.
[5] Z. Waseem, T. Davidson, D. Warmsley, I. Weber, Understanding abuse: A typology of abusive language detection subtasks, in: Proceedings of the First Workshop on Abusive Language Online, Association for Computational Linguistics, Vancouver, BC, Canada, 2017, pp. 78–84. URL: https://aclanthology.org/W17-3012. doi:10.18653/v1/W17-3012.
[6] M. Das, P. Saha, R. Dutt, P. Goyal, A. Mukherjee, B. Mathew, You too Brutus! Trapping hateful users in social media: Challenges, solutions & insights, in: Proceedings of the 32nd ACM Conference on Hypertext and Social Media, HT '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 79–89. URL: https://doi.org/10.1145/3465336.3475106. doi:10.1145/3465336.3475106.
[7] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, A. Patel, Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages, in: Proceedings of the 11th Forum for Information Retrieval Evaluation, 2019, pp. 14–17.
[8] M. Das, B. Mathew, P. Saha, P. Goyal, A. Mukherjee, Hate speech in online social media, ACM SIGWEB Newsletter (2020) 1–8. doi:10.1145/3427478.3427482.
[9] M. Amjad, N. Ashraf, G. Sidorov, A. Zhila, L. Chanona-Hernandez, A. Gelbukh, Automatic abusive language detection in Urdu tweets, Acta Polytechnica Hungarica (2021).
[10] S. Butt, M. Amjad, F. Balouchzahi, N. Ashraf, R. Sharma, G. Sidorov, A. Gelbukh, Overview of EmoThreat: Emotions and Threat Detection in Urdu at FIRE 2022, in: CEUR Workshop Proceedings, 2022.
[11] S. Butt, M. Amjad, F. Balouchzahi, N. Ashraf, R. Sharma, G. Sidorov, A. Gelbukh, EmoThreat@FIRE2022: Shared track on emotions and threat detection in Urdu, in: Forum for Information Retrieval Evaluation, FIRE 2022, Association for Computing Machinery, New York, NY, USA, 2022.
[12] S. Banerjee, M. Sarkar, N. Agrawal, P. Saha, M. Das, Exploring transformer based models to identify hate speech and offensive content in English and Indo-Aryan languages, arXiv preprint arXiv:2111.13974 (2021).
[13] M. Das, S. Banerjee, P. Saha, Abusive and threatening language detection in Urdu using boosting based and BERT based models: A comparative approach, arXiv preprint arXiv:2111.14830 (2021).
[14] M. Das, S. Banerjee, A. Mukherjee, hate-alert@DravidianLangTech-ACL2022: Ensembling multi-modalities for Tamil TrollMeme classification, in: Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, 2022, pp. 51–57.
[15] G. K. Pitsilis, H. Ramampiaro, H. Langseth, Detecting offensive language in tweets using deep learning, arXiv preprint arXiv:1801.04433 (2018).
[16] Y. Goldberg, A primer on neural network models for natural language processing, Journal of Artificial Intelligence Research 57 (2015). doi:10.1613/jair.4992.
[17] G. L. De la Peña Sarracén, R. G. Pons, C. E. M. Cuza, P. Rosso, Hate speech detection using attention-based LSTM, EVALITA Evaluation of NLP and Speech Tools for Italian 12 (2018) 235.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[19] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[20] N. Ashraf, L. Khan, S. Butt, H.-T. Chang, G. Sidorov, A. Gelbukh, Multi-label emotion classification of Urdu tweets, PeerJ Computer Science 8 (2022) e896.
[21] L. Khan, A. Amjad, N. Ashraf, H.-T. Chang, A. Gelbukh, Urdu sentiment analysis with deep learning methods, IEEE Access 9 (2021) 97803–97812.
[22] I. Ameer, N. Ashraf, G. Sidorov, H. Gómez Adorno, Multi-label emotion classification using content-based features in Twitter, Computación y Sistemas 24 (2020) 1159–1164.
[23] L. Khan, A. Amjad, N. Ashraf, H.-T. Chang, Multi-class sentiment analysis of Urdu text using multilingual BERT, Scientific Reports 12 (2022) 1–17.
[24] M. Amjad, N. Ashraf, A. Zhila, G. Sidorov, A. Zubiaga, A. Gelbukh, Threatening language detection and target identification in Urdu tweets, IEEE Access 9 (2021) 128302–128313.
[25] M. Amjad, A. Zhila, G. Sidorov, A. Labunets, S. Butt, H. I. Amjad, O. Vitman, A. Gelbukh, UrduThreat@FIRE2021: Shared track on abusive threat identification in Urdu, in: Forum for Information Retrieval Evaluation, 2021, pp. 9–11.
[26] M. Amjad, A. Zhila, G. Sidorov, A. Labunets, S. Butt, H. I. Amjad, O. Vitman, A. Gelbukh, Overview of the shared task on threatening and abusive detection in Urdu at FIRE 2021, in: FIRE (Working Notes), CEUR Workshop Proceedings, 2021.
[27] N. Ashraf, A. Rafiq, S. Butt, H. M. F. Shehzad, G. Sidorov, A. Gelbukh, YouTube based religious hate speech and extremism detection dataset with machine learning baselines, Journal of Intelligent & Fuzzy Systems (2022) 1–9.
[28] N. Ashraf, R. Mustafa, G. Sidorov, A. Gelbukh, Individual vs. group violent threats classification in online discussions, in: Companion Proceedings of the Web Conference 2020, 2020, pp. 629–633.
[29] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal, R. T. Nagipogu, S. Dave, et al., MuRIL: Multilingual representations for Indian languages, arXiv preprint arXiv:2103.10730 (2021).
[30] S. S. Aluru, B. Mathew, P. Saha, A. Mukherjee, A deep dive into multilingual hate speech classification, in: Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part V, Springer International Publishing, 2021, pp. 423–439.
[31] M. Das, S. Banerjee, A. Mukherjee, Data bootstrapping approaches to improve low resource abusive language detection for Indic languages, in: Proceedings of the 33rd ACM Conference on Hypertext and Social Media, HT '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 32–42. URL: https://doi.org/10.1145/3511095.3531277. doi:10.1145/3511095.3531277.
[32] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, 2019. arXiv:1711.05101.