=Paper=
{{Paper
|id=Vol-3395/T4-2
|storemode=property
|title=Multi-Label Emotion Classification in Urdu
|pdfUrl=https://ceur-ws.org/Vol-3395/T4-2.pdf
|volume=Vol-3395
|authors=Dejah Madhusankar,Avanthika Karthikeyan,Bharathi B
|dblpUrl=https://dblp.org/rec/conf/fire/MadhusankarKB22
}}
==Multi-Label Emotion Classification in Urdu==
Dejah Madhusankar, Avanthika Karthikeyan and Bharathi B

Department of CSE, Sri Siva Subramaniya Nadar College of Engineering, Tamil Nadu, India

==Abstract==
With the massive growth and widespread use of social media platforms, misuse of these platforms and its impact on society have risen sharply. The anonymity and wide reach offered by social media have made it convenient to spread hatred and incitement to threats, often targeted against particular users and communities. Identifying hate speech, threats and intense emotions in the digital arena has therefore gained attention recently, and this is also the aim of the EmoThreat: Emotions and Threat Detection in Urdu 2022 challenge. In this paper, we describe the traditional machine learning models and deep neural networks submitted by our team, Aces, for Task A: multi-label emotion classification in Urdu. The models tested include Classifier Chains, MLkNN, RNN and LSTM networks, implemented with a combination of feature extraction methods such as CountVectorizer and TF-IDF, as well as embedding models such as Word2Vec and fastText. Each model is discussed in detail in Section 4, after a brief overview of the dataset in Section 3. Of the tested combinations, the Classifier Chains model with TF-IDF vectorization gave the most promising results, which are detailed in Section 5.

Keywords: Emotion Classification in Urdu, Classifier Chains, RNN, TF-IDF, Neural Networks, Multi-label emotion detection, fastText, MLkNN

==1. Introduction==
The right to free speech and expression on global platforms has inadvertently led to the generation of numerous digital posts containing hateful, sensitive and abusive content.
These vulgar narratives, often targeted against certain individuals and communities, worsen users' experience of communicating via such media, while other posts contain actual threats that put users in danger. Today, major platforms such as Google, Meta, YouTube and Twitter are making significant efforts to resolve this issue by detecting and censoring objectionable content before it can instigate disruption and chaos in society. With the Urdu language having more than 230 million speakers worldwide [1], a massive amount of user data is generated every day, and content moderation of such enormous data is difficult to achieve through manpower alone. This paper therefore explores various machine learning (ML) algorithms and models for multi-label classification of emotions in Urdu text, as part of the EmoThreat shared task hosted by the Forum for Information Retrieval Evaluation 2022 at the Indian Statistical Institute, Kolkata; an overview of the task is given in [2, 3]. Our team submitted five runs for Task A (multi-label classification of emotions in Urdu) and was ranked 8th on the leaderboard.

''Forum for Information Retrieval Evaluation, December 16-20, 2022, India''

dejahmadhushankar@gmail.com (D. Madhusankar); karthiavanthika@gmail.com (A. Karthikeyan); bharathib@ssn.edu.in (B. B)

https://www.linkedin.com/in/dejah-madhusankar/ (D. Madhusankar); https://github.com/Avanthika-K/Multi-Label-Emotion-Classification-in-Urdu (A. Karthikeyan); https://www.ssn.edu.in/staff-members/dr-b-bharathi/ (B. B)

ORCID: 0000-0002-0877-7063 (D. Madhusankar); 0000-0001-7116-9338 (A. Karthikeyan); 0000-0001-7279-5357 (B. B)

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
==2. Related Work==
With social media serving as a major platform for blogging and sharing information, impacting millions of users' lives every day, detecting and identifying intense emotions and objectionable content has become a very important task. A lot of research has been conducted over the years on various aspects of this issue, given its significance for today's digital society, and individuals and organizations have proposed many different models.

For instance, in [4] the researchers adopted traditional, deep learning and transformer-based machine learning (ML) approaches along with different kinds of text representation for emotion classification, while the authors of [5] and [6] propose traditional machine learning techniques to categorize text as abusive or not abusive. Similar work was carried out by the authors of [7], where a dataset of English-language tweets was labelled and categorized as hate speech, offensive language, or neither, leading the authors to conclude: "We find that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify." Following this, the authors of [8] implemented content-based methods for multi-label classification of tweets, using classifiers such as Binary Relevance, Classifier Chain and Label Combination. This work was underpinned by [9], which established the significance of binary-relevance-based methods for multi-label classification; its authors argue that the binary-relevance method has much to offer, especially in terms of scalability to large data sets.
More recently, the researchers of [10] explored different text representations, namely count-based features and fastText pre-trained word embeddings for Urdu, and evaluated various machine and deep learning algorithms. The latest work of the authors of [11] fine-tunes the multilingual BERT (mBERT) model for Urdu sentiment analysis, using four different text representations to train classifiers. Amid the general concerns and obstacles faced in developing accurate ML classifiers, it has become clear that the technicalities of the language involved play a crucial role in the processing of text. Researchers from Korea have put forth an analysis of the difficulties that come with identifying Korean swear words accurately. Their study, in their words, "proposes a method of discriminating profanity using a deep learning model that can grasp the meaning and context of words after separating Hangul into the onset, nucleus, and coda."

==3. Datasets==
The dataset used is adapted from Task A of EmoThreat: Emotions and Threat Detection in Urdu, FIRE 2022. The Urdu text in question is written in the Nastalīq Urdu script and consists of tweets made by various users on Twitter. The labels adopted for the classification are based on Ekman's six basic emotions plus neutrality; the seven labels are Anger, Disgust, Fear, Sadness, Surprise, Happiness and Neutral. The dataset consists of 8 columns, representing the 7 emotion labels followed by the Urdu tweet in the last column. There is a total of 7,800 rows, each corresponding to an independent tweet and its classification, marked '1' in the appropriate column if the emotion is detected and '0' otherwise. Table 1 shows the distribution of the dataset.

{|
|+ Table 1: Categorical data split for the training dataset
! Label !! Anger !! Disgust !! Fear !! Sadness !! Surprise !! Happiness !! Neutral
|-
| Count || 811 || 761 || 609 || 2190 || 1550 || 1046 || 3014
|}
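The per-label counts shown in Table 1 can be reproduced by summing the seven 0/1 indicator columns of the training file. A minimal sketch with pandas; the column headers and the toy frame below are assumptions for illustration, since the task description does not fix them:

```python
import pandas as pd

# Assumed label-column names, matching Table 1's ordering.
LABELS = ["Anger", "Disgust", "Fear", "Sadness", "Surprise", "Happiness", "Neutral"]

def label_distribution(df: pd.DataFrame) -> pd.Series:
    """Sum the 0/1 indicator columns to get one count per emotion."""
    return df[LABELS].sum()

# Toy 3-row frame standing in for the real 7,800-row training set:
toy = pd.DataFrame(
    {**{lab: [1, 0, 0] for lab in LABELS}, "Tweet": ["...", "...", "..."]}
)
print(label_distribution(toy))
```

Because a tweet can carry several emotions at once, these column sums can exceed the number of rows.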
==4. Implementation and Experiments==

''Figure 1: Flowchart describing the workflow.''

===4.1. Models===

====4.1.1. Classifier Chains====
Classifier Chains is a machine learning model used for problem transformation in multi-label classification scenarios. It uses |L| (the number of labels) binary classifiers, each linked along a chain and dealing with the binary relevance problem associated with its corresponding label L_j ∈ L [12]. Classification starts at the beginning of the chain and propagates along it, with each classifier learning and predicting the binary association of its particular label.

====4.1.2. MLkNN====
The Multi-Label k-Nearest Neighbours (MLkNN) algorithm works similarly to the k-Nearest Neighbours (kNN) method, a traditional non-parametric supervised learning approach. MLkNN builds on kNN by finding the nearest examples to a test instance and then using Bayesian inference to select appropriate labels for classification. It first identifies the k nearest neighbours, then identifies the corresponding labels for the instance using the information collected from those neighbours. We passed k-values in the range 10 to 40 and saw better results as the value approached 20. In our model, we used the default smoothing and ignore-first-neighbours parameters, with values of 1.0 and 0 respectively.

====4.1.3. Simple RNN====
RNN stands for Recurrent Neural Network, a class of neural networks with loops in them, allowing information to persist over time. This architecture allows inputs to draw on previous outputs, so patterns can depend on previously extracted patterns instead of each data point being treated as an independent entity. The RNN model we built here uses a sigmoid activation layer along with a binary cross-entropy loss function and the Adam optimizer.
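Independent of the Keras implementation, the recurrence that lets an RNN carry information between timesteps can be sketched in plain NumPy. The dimensions below are illustrative assumptions, not those of the submitted model:

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b, W_out, b_out):
    """One forward pass of a single-layer RNN over a sequence.

    x: (timesteps, input_dim) sequence of feature vectors.
    Returns sigmoid scores, one per label (multi-label output).
    """
    h = np.zeros(W_h.shape[0])                   # hidden state persists across steps
    for x_t in x:
        h = np.tanh(W_x @ x_t + W_h @ h + b)     # new state depends on the old state
    logits = W_out @ h + b_out
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid output, as in the model above

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))                    # 5 timesteps, 8 features each
scores = simple_rnn_forward(
    seq,
    rng.normal(size=(16, 8)), rng.normal(size=(16, 16)), np.zeros(16),
    rng.normal(size=(7, 16)), np.zeros(7),
)
print(scores.shape)                              # one score per emotion label
```

The loop is the key point: the hidden state `h` computed at one timestep feeds into the next, which is what lets the network relate a word to the words before it.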
====4.1.4. LSTM Network====
Long Short-Term Memory (LSTM) is a special recurrent neural network (RNN) architecture built to hold an edge over standard feed-forward networks and plain RNNs through its ability to selectively remember patterns over long durations. LSTMs use feedback connections, which distinguishes them from traditional feed-forward neural networks. This property enables LSTM models to process sequences of data without treating each sequence point as an independent entity, instead retaining useful information about previous sequence points to help with the computation and processing of new data points. We used LSTM-based neural network classifiers built with the Keras toolkit, along with fastText Urdu word embeddings, to classify our Urdu text. We used word tokens, an embedding layer (300 dimensions) and an input length of 10,001 words as inputs to the LSTM layer (128 units), with a sigmoid activation function for the output layer. In this pipeline, we used categorical cross-entropy as the loss function and the Adam optimizer for parameter optimization.

===4.2. Feature Extraction===

====4.2.1. CountVectorizer====
CountVectorizer is a tool provided by the scikit-learn library in Python that converts text into a vector based on the frequency (count) of each word occurring in the text, and is used as a feature extraction method for text classification problems. Making use of frequencies, it converts a group of text documents to a matrix of token counts. Here, the MLkNN model uses CountVectorizer with max_features set to 1000 and a max_df value of 0.85.

====4.2.2. TF-IDF====
TF-IDF, short for Term Frequency-Inverse Document Frequency, is also provided by the scikit-learn library for feature extraction in text processing. It converts a collection of raw documents to a matrix of TF-IDF features.
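Both extractors come from scikit-learn's feature-extraction module. A minimal sketch on placeholder English sentences (the max_features and max_df values mirror the MLkNN setup described above; the corpus is an illustrative assumption):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["good day", "bad day", "good good news"]   # placeholder corpus

# Raw token counts, capped as in the MLkNN configuration above
count_vec = CountVectorizer(max_features=1000, max_df=0.85)
X_counts = count_vec.fit_transform(docs)

# TF-IDF re-weights the same counts by inverse document frequency
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(docs)

print(X_counts.shape, X_tfidf.shape)               # (documents, vocabulary size)
```

Both calls produce a sparse document-term matrix with one row per document and one column per vocabulary term; only the weighting of the entries differs.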
Term frequency (TF) is the frequency with which a word occurs within a document, while inverse document frequency (IDF) reflects how rare the word is across the documents of the corpus; TF-IDF is simply the product of TF and IDF. It works similarly to CountVectorizer but weights words by their relevance, which lets us discount words that are less important for our analysis and thereby simplifies model building by reducing the effective input dimensions.

====4.2.3. Word2Vec====
Word2Vec is a technique for efficiently creating word embeddings using shallow neural networks. Its effectiveness comes from its ability to gather and group together the vectors of similar words. The embedding matrix was generated using the Word2Vec model, with a model vocabulary size of 85,868; the embedding layer used this Word2Vec embedding matrix with a vocabulary size of 5,000.

====4.2.4. fastText====
fastText is a free, open-source library for efficient learning of word embeddings and text classification. It is a very fast NLP library created by Facebook's AI Research (FAIR) lab, and it allows the training of both supervised and unsupervised representations of sentences. The implemented model utilizes Urdu word embeddings represented as vectors of dimension 300.

==5. Results==
The performance of the various models tested is presented in the tables below: Table 2 presents the results obtained during training and Table 3 the results on the test data. On the training dataset, a performance accuracy of around 60% was achieved by both the Classifier Chains model and the simple RNN model. On the test dataset, the Classifier Chains model gave the best overall results, with a performance accuracy of around 43%.
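A minimal sketch of this best-scoring combination, Classifier Chains over TF-IDF features, using scikit-learn; the logistic-regression base estimator and the toy data are illustrative assumptions, not the submitted configuration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import make_pipeline

# Placeholder English texts standing in for Urdu tweets; one toy example
# per emotion so that every label column contains both classes.
docs = ["angry rant", "disgusting mess", "scary night", "sad loss",
        "what a shock", "happy win", "plain note"]
Y = np.eye(7, dtype=int)  # columns: Anger, Disgust, Fear, Sadness, Surprise, Happiness, Neutral

# One binary classifier per label, chained so that later classifiers see
# the earlier labels' predictions as extra input features.
model = make_pipeline(
    TfidfVectorizer(),
    ClassifierChain(LogisticRegression(max_iter=1000)),
)
model.fit(docs, Y)
pred = model.predict(["angry rant tonight"])
print(pred.shape)  # one 0/1 decision per emotion label
```

The chaining is what distinguishes this from plain binary relevance: each link's prediction is appended to the feature vector of the next link, which is how label dependencies enter the model.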
The strong showing of Classifier Chains can possibly be attributed to the way it combines the computational efficiency of the widely known binary relevance method for multi-label classification while still taking label dependencies into account. It also retains the advantages of the binary method, including low memory and run-time complexity. Moreover, over the past decade, studies exploring the underlying theory of classifier chains have made many improvements to the training procedures, so the method remains among the state-of-the-art options for multi-label learning on text data [13].

{|
|+ Table 2: Cross-validation scores of the proposed systems
! Model !! Feature Extraction !! Accuracy !! Weighted F1
|-
| Classifier Chains || TF-IDF || 0.59 || 0.62
|-
| Simple RNN || Word2Vec || 0.597 || 0.56
|-
| MLkNN || CountVectorizer || 0.55 || 0.60
|-
| MLkNN || TF-IDF || 0.57 || 0.66
|-
| LSTM || fastText || 0.40 || 0.22
|}

{|
|+ Table 3: Performance of the proposed systems on test data
! Model !! Feature Extraction !! Accuracy !! Macro F1
|-
| Classifier Chains || TF-IDF || 0.426 || 0.381
|-
| Simple RNN || Word2Vec || 0.052 || 0.114
|-
| MLkNN || CountVectorizer || 0.354 || 0.371
|-
| MLkNN || TF-IDF || 0.052 || 0.114
|-
| LSTM || fastText || 0.189 || 0.24
|}

The least performing model here is MLkNN, though metrics improved visibly when different feature extraction algorithms and parameter tuning were applied. The reasons behind its underperformance appear inconclusive, yet we believe that the pre-processing of the text and the size of the training data play a much bigger role than expected. MLkNN is moreover known to be computationally expensive, because the algorithm stores all of the training data, and it is sensitive to irrelevant features. This could lead to errors during classification, which may explain its high performance on the training data and poor performance on the test data.

==6. Conclusions==
In this paper, we have presented our solutions to Task A of EmoThreat: Emotions and Threat Detection in Urdu, at FIRE 2022.
In this work, four different ML models along with four feature extraction methods were adopted and tested to identify the seven categories of emotion (the six basic emotions plus Neutral) and thereby classify various kinds of texts or tweets. Prediction accuracy and macro and micro F1 scores were used as evaluation metrics, with the F1 score as the key parameter. Although the results presented are adequate, we believe they could be improved with better training and cross-validation methods, and enhanced further by exploring the linguistic features of the Urdu language for better feature extraction. The possibility of other ML models outperforming the proposed systems remains open, and higher accuracy can be anticipated from fine-tuned parameters. In future work, transformer-based models and other deep learning approaches can be applied with fine-tuned parameters to improve emotion detection and classification.

==Acknowledgments==
Thanks to the Department of CSE, SSN College of Engineering, Kalavakkam, India, for providing us with the opportunity, awareness and guidance for this task. Special thanks to the organizers of FIRE 2022 for providing the necessary dataset and for bringing together individuals nationwide to work on this project.

==References==
[1] Ethnologue, What are the top 200 most spoken languages?, available at https://www.ethnologue.com/guides/ethnologue200 (accessed 2018/03/03).

[2] S. Butt, M. Amjad, F. Balouchzahi, N. Ashraf, R. Sharma, G. Sidorov, A. Gelbukh, Overview of EmoThreat: Emotions and Threat Detection in Urdu at FIRE 2022, in: CEUR Workshop Proceedings, 2022.

[3] S. Butt, M. Amjad, F. Balouchzahi, N. Ashraf, R. Sharma, G. Sidorov, A. Gelbukh, EmoThreat@FIRE2022: Shared Track on Emotions and Threat Detection in Urdu, in: Forum for Information Retrieval Evaluation, FIRE 2022, Association for Computing Machinery, New York, NY, USA, 2022.

[4] N. Ashraf, L. Khan, S. Butt, H.-T. Chang, G. Sidorov, A. Gelbukh, Multi-label emotion classification of Urdu tweets, PeerJ Computer Science 8 (2022) e896.

[5] A. Karthikraja, A. Suresh Kumar, et al., Abusive and threatening language detection in native Urdu script tweets exploring four conventional machine learning techniques and MLP, FIRE 2021 Working Notes (2021) 806-812.

[6] M. Amjad, N. Ashraf, A. Zhila, G. Sidorov, A. Zubiaga, A. Gelbukh, Threatening language detection and target identification in Urdu tweets, IEEE Access 9 (2021) 128302-128313.

[7] T. Davidson, D. Warmsley, M. Macy, I. Weber, Automated hate speech detection and the problem of offensive language, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 11, 2017, pp. 512-515.

[8] I. Ameer, N. Ashraf, G. Sidorov, H. Gómez Adorno, Multi-label emotion classification using content-based features in Twitter, Computación y Sistemas 24 (2020) 1159-1164.

[9] J. Read, B. Pfahringer, G. Holmes, E. Frank, Classifier chains for multi-label classification, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2009, pp. 254-269.

[10] L. Khan, A. Amjad, N. Ashraf, H.-T. Chang, A. Gelbukh, Urdu sentiment analysis with deep learning methods, IEEE Access 9 (2021) 97803-97812.

[11] L. Khan, A. Amjad, N. Ashraf, H.-T. Chang, Multi-class sentiment analysis of Urdu text using multilingual BERT, Scientific Reports 12 (2022) 1-17.

[12] J. Read, B. Pfahringer, G. Holmes, E. Frank, Classifier chains for multi-label classification, Machine Learning 85 (2011) 333-359.

[13] J. Read, B. Pfahringer, G. Holmes, E. Frank, Classifier chains: a review and perspectives, Journal of Artificial Intelligence Research 70 (2021) 683-718.