<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bharathi Raja Chakravarthi</string-name>
          <email>bharathi.raja@insight-centre.org</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasanna Kumar Kumaresan</string-name>
          <email>prasanna.mi20@iiitmk.ac.in</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ratnasingam Sakuntharaj</string-name>
          <email>sakuntharaj@esn.ac.lk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anand Kumar Madasamy</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sajeetha Thavareesan</string-name>
          <email>sajeethas@esn.ac.lk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B Premjith</string-name>
          <email>b_premjith@cb.amrita.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>K Sreelakshmi</string-name>
          <email>k_sreelakshmi@cb.students.amrita.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Subalalitha Chinnaudayar Navaneethakrishnan</string-name>
          <email>subalalitha@gmail.com</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John P. McCrae</string-name>
          <email>john.mccrae@insight-centre.org</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Mandl</string-name>
          <email>mandl@uni-hildesheim.de</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Computational Engineering and Networking (CEN), Amrita School of Engineering</institution>
          ,
          <addr-line>Coimbatore, Amrita Vishwa Vidyapeetham</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Eastern University</institution>
          ,
          <country country="LK">Sri Lanka</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Indian Institute of Information Technology and Management-Kerala</institution>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Insight Centre for Data Analytics, National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>National Institute of Technology Karnataka Surathkal</institution>
          ,
          <addr-line>Karnataka</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>SRM Institute of Science and Technology</institution>
          ,
          <addr-line>Chennai, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present the results of the HASOC-Dravidian-CodeMix shared task held at FIRE 2021, a track on offensive language identification for Dravidian languages in code-mixed text. We detail the task, its organisation, and the submitted systems. The identification of offensive language was viewed as a classification task. Sixteen teams participated in identifying offensive language in Tamil-English code-mixed data, 11 teams in Malayalam-English code-mixed data, and 14 teams in Tamil data. The teams detected offensive language using various machine learning and deep learning classification models. This paper analyses those benchmark systems to find out how well they accommodate a code-mixed scenario in Dravidian languages, focusing on Tamil and Malayalam.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment analysis</kwd>
        <kwd>Dravidian languages</kwd>
        <kwd>Tamil</kwd>
        <kwd>Malayalam</kwd>
        <kwd>Kannada</kwd>
        <kwd>Code-mixing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Advancements in technology aim to ease people's lives and have attracted many users towards digitization, particularly the younger generations [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. As a result, the number of people using social media to express their opinions and beliefs has increased dramatically [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, the lack of regulation gives individuals the freedom to post offensive content. There is also no mechanism to regulate the posting of hateful content in under-resourced languages [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        Tamil is a Dravidian language spoken primarily in Sri Lanka, India, Malaysia, and Singapore [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]. It is an agglutinative language with a rich morphological structure [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Tamil has 247 letters, comprising 12 vowels, 18 consonants, 216 composite letters combining each consonant with each vowel, and one special letter known as "Ayutha eluththu". Malayalam is also a Dravidian language, spoken in Kerala, India [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]. Malayalam also has its own script for writing; however, social media users use the Latin script or mix languages when commenting or posting online [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
      <p>
        The HASOC-DravidianCodeMix shared task 2021 aims to provide a new gold standard corpus for offensive language identification of code-mixed text in Dravidian languages (Tamil-English and Malayalam-English). Code-mixed content online results from people mixing multiple languages, especially their native language and another commonly spoken language, while expressing their views [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Offensive language often comprises hate speech, such as racism, ageism, homophobia, transphobia, ableism and any hate-promoting content against an individual or group [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It has been an active area of research in both academia and industry for the past two decades [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. There is an increasing demand for the identification of offensive language in code-mixed social media texts [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>Sixteen teams were involved in identifying offensive language in Tamil-English code-mixed data, 11 teams in Malayalam-English code-mixed data, and 14 teams in Tamil data. The teams used a variety of machine learning and deep learning classification models to identify offensive language. The purpose of this study is to examine these benchmark systems in order to determine how well they fit a code-mixed scenario in Dravidian languages, with a particular emphasis on Tamil and Malayalam.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description</title>
      <p>The task aims to identify offensive content in code-mixed comments/posts in Dravidian languages (Tamil, Tamil-English and Malayalam-English) collected from social media. A comment/post may contain more than one sentence, but the average sentence length in the corpora is one. Each comment/post is annotated at the comment/post level. The dataset also exhibits a class imbalance problem that mirrors real-world scenarios.</p>
      <p>• Task 1</p>
      <p>Task 1 focuses on offensive language identification from Tamil text. It is a coarse-grained binary classification task in which each participating system has to classify YouTube comments in Tamil into two classes: Offensive and Not-offensive.</p>
      <p>– Not-Offensive – The comments do not contain offensive language. Example:</p>
      <sec id="sec-2-1">
        <title>Text: ேபரைவ சார்பாக படம் ெவற்ற ெபற வாழ்துககள்</title>
        <p>– Offensive – The comments contain hate, offensive or profane content. Example:</p>
      </sec>
      <sec id="sec-2-2">
        <title>Text: ேபாடா ெவங்காயம் ஒனன்யலாம் அடுச் ெகாளள்மு்</title>
        <p>ெவைண்ண .</p>
        <p>Translation: You onion, we should beat you to death, butter – butter and onion are offensive words in Tamil.</p>
        <p>• Task 2</p>
        <p>Task 2 focuses on offensive language identification in code-mixed Malayalam-English and Tamil-English comments. Example: code-mixed Tamil</p>
        <p>– Not-Offensive – The comments do not contain offensive language.</p>
        <sec id="sec-2-2-1">
          <title>Text: iantha padam rumba nalla iruku</title>
          <p>Translation of code-mixed Tamil: This movie is very good</p>
          <p>– Offensive – The comments contain offensive language. Example: code-mixed Malayalam</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Text: i ammaye bhegikku</title>
          <p>Translation of code-mixed Malayalam: f..k this mother f..kers</p>
          <p>2.1. Dataset description</p>
          <p>The datasets for both Task 1 and Task 2 were prepared by collecting comments from YouTube. Table 1 shows the number of comments in each dataset.</p>
          <p>2.1.1. Task 1: Tamil Dataset</p>
          <p>We collected the data for Task 1 from YouTube comments, using the YouTube comment scraper1 to download the comments from particular videos. The comments were collected from movie trailers. We removed all comments that were not in Tamil. These comments were then used to create a dataset for the offensive language classification task. The dataset contains a total of 6,534 comments and is split into train and test sets: the training set consists of 5,880 comments and the test set of 654 comments.</p>
          <p>1https://pypi.org/project/youtube-comment-scraper-python/</p>
          <p>The teams that participated in the Task 1: Tamil track were: 1. SSN_NLP; 2. MUCIC [<xref ref-type="bibr" rid="ref20">20</xref>]; 3. SSN_NLP_MLRG [<xref ref-type="bibr" rid="ref21">21</xref>]; 4. IRLab [<xref ref-type="bibr" rid="ref22">22</xref>]; 5. BITS Pilani [<xref ref-type="bibr" rid="ref23">23</xref>]; 6. AIML [<xref ref-type="bibr" rid="ref24">24</xref>]; 7. Pegasus [<xref ref-type="bibr" rid="ref25">25</xref>]; 8. KonguCSE; 9. Jusgowithurs; 10. Gothainayaki.A; 11. MUM; 12. SSNCSE_NLP [<xref ref-type="bibr" rid="ref26">26</xref>]; 13. AI_ML NIT Patna; 14. Saahil Raj.</p>
          <p>2.1.2. Tamil and Malayalam Dataset</p>
          <p>Task 2 data was also taken from YouTube comments and posts. These comments were used to create datasets for offensive language classification in both languages. The datasets include different types of code-mixing, such as mixing Tamil and Latin characters in the Tamil dataset, code-mixed data in the Malayalam dataset, and mixing at the word level. The Tamil dataset contains a total of 5,941 comments, split into training, development and test sets: the training set consists of 4,000 comments, the development set of 940 comments, and the test set of 1,001 comments. The Malayalam dataset contains a total of 5,951 comments, split into training, development and test sets: the training set consists of 4,000 comments, the development set of 951 comments, and the test set of 1,000 comments. These datasets were also published in the same competition, HASOC-Dravidian-CodeMix, hosted on Codalab.</p>
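          <p>The train/development/test partitioning reported above can be reproduced, for example, with scikit-learn. This is only an illustrative sketch: the comments and labels below are placeholders, while the split sizes are those reported for the Task 2 Tamil dataset (4,000 / 940 / 1,001 of 5,941 comments).</p>

```python
# Sketch of a stratified train/dev/test split matching the reported sizes.
# The comments and labels are invented stand-ins, not the actual corpus.
from sklearn.model_selection import train_test_split

comments = [f"comment {i}" for i in range(5941)]
labels = ["OFF" if i % 4 == 0 else "NOT" for i in range(5941)]  # invented labels

# First carve out the 4,000-comment training set, then split the remainder
# into the 940-comment development set and the 1,001-comment test set.
train_x, rest_x, train_y, rest_y = train_test_split(
    comments, labels, train_size=4000, random_state=42, stratify=labels)
dev_x, test_x, dev_y, test_y = train_test_split(
    rest_x, rest_y, train_size=940, random_state=42, stratify=rest_y)

print(len(train_x), len(dev_x), len(test_x))  # → 4000 940 1001
```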
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>We have received fourteen, sixteen and eleven submissions for Task 1: Tamil track, Task 2:
Tamil track and Task 2: Malayalam track, respectively. The submissions were evaluated based
on weighted average F1-score, and rank lists were prepared accordingly. Table 2 shows the rank
list of teams that participated in Task 1: Tamil track. Tables 3 and 4 show the rank lists of the
teams that competed in Task 2: Tamil track and Task 2: Malayalam track, respectively. Tables 2,
3 and 4 show the precision, recall and weighted average F1-score of all the participating teams
on test data. In this section, we briefly describe the methodologies of teams that participated
in the three tasks.</p>
      <p>The teams that participated in the Task 2: Tamil track were: 1. MUCIC [<xref ref-type="bibr" rid="ref20">20</xref>]; 2. AIML [<xref ref-type="bibr" rid="ref24">24</xref>]; 3. SSN_IT_NLP [<xref ref-type="bibr" rid="ref27">27</xref>]; 4. ZYBank AI; 5. IRLab [<xref ref-type="bibr" rid="ref22">22</xref>]; 6. HSU [<xref ref-type="bibr" rid="ref28">28</xref>]; 7. IIITSurat [<xref ref-type="bibr" rid="ref29">29</xref>]; 8. Team Pegasus [<xref ref-type="bibr" rid="ref25">25</xref>]; 9. PSG [<xref ref-type="bibr" rid="ref30">30</xref>]; 10. SSNCSE_NLP [<xref ref-type="bibr" rid="ref26">26</xref>]; 11. IIITD-shanker [<xref ref-type="bibr" rid="ref31">31</xref>]; 12. CEN_NLP; 13. RameshKannan; 14. MUM; 15. AI_ML_NIT_Patna; 16. JBTTM. The teams that participated in the Task 2: Malayalam track were: 1. AIML [<xref ref-type="bibr" rid="ref24">24</xref>]; 2. MUCIC [<xref ref-type="bibr" rid="ref20">20</xref>]; 3. HSU [<xref ref-type="bibr" rid="ref28">28</xref>]; 4. IIIT Surat [<xref ref-type="bibr" rid="ref29">29</xref>]; 5. IRLab [<xref ref-type="bibr" rid="ref22">22</xref>]; 6. IIITD-ShankarB [<xref ref-type="bibr" rid="ref31">31</xref>]; 7. SSNCSE_NLP [<xref ref-type="bibr" rid="ref26">26</xref>]; 8. Pegasus [<xref ref-type="bibr" rid="ref25">25</xref>]; 9. CEN_NLP; 10. MUM; 11. JBTTM.</p>
      <p>• SSN_NLP_MLRG [<xref ref-type="bibr" rid="ref21">21</xref>]: Team SSN_NLP_MLRG participated in the Tamil-English subtask. The authors implemented both traditional machine learning and deep learning models for the classification. They experimented with Support Vector Machine (SVM) [<xref ref-type="bibr" rid="ref32">32</xref>], naive Bayes, random forest and extreme gradient boosting ensemble classifiers for categorizing the offensive content, using N-gram, character-level and word-level Term Frequency-Inverse Document Frequency (TF-IDF) and Bag-of-Words (BoW) features. The deep learning models used for the classification include a shallow Neural Network (NN), a Long Short-Term Memory (LSTM) network [<xref ref-type="bibr" rid="ref33">33</xref>] and a Convolutional Neural Network (CNN). The embeddings in the NN were initialized using fastText [<xref ref-type="bibr" rid="ref34">34</xref>] pre-trained word embeddings. The authors also followed a transfer learning approach using multilingual BERT (mBERT) [<xref ref-type="bibr" rid="ref35">35</xref>], ALBERT [<xref ref-type="bibr" rid="ref36">36</xref>] (A Lite BERT for self-supervised learning of language representations) and DistilBERT [37] (a distilled version of BERT [<xref ref-type="bibr" rid="ref35">35</xref>]) with ktrain, and ULMFiT [38] with fastai [39], to build the classification model.</p>
      <p>• HSU_TransEmb [<xref ref-type="bibr" rid="ref28">28</xref>]: Team HSU_TransEmb used a Transformer ensemble system to identify offensive content in Tamil-English and Malayalam-English code-mixed data. The ensemble consists of mBERT, DistilBERT and MuRIL [40] models. The preprocessed data were fed to the three BERT models, and the class probabilities were computed. The class label was identified from the sum of the class probabilities obtained from the three models.</p>
      <p>• MUCIC [<xref ref-type="bibr" rid="ref20">20</xref>]: Team MUCIC took part in both the Tamil-English and Malayalam-English shared tasks. They used word-level as well as character-level N-gram-based TF-IDF for extracting features from the texts. Furthermore, they identified the 40,000 most frequent features in each case and constructed a combined set of 80,000 features. They employed linear SVM, random forest, logistic regression and an ensemble of these three classifiers to train the model. The logistic regression model obtained the highest F1-score of 0.881 in the Tamil-English task, whereas random forest exhibited the best performance, with an F1-score of 0.783, in the Malayalam-English task.</p>
      <p>• IIITSurat [<xref ref-type="bibr" rid="ref29">29</xref>]: Team IIITSurat took part in both shared tasks and employed machine learning and deep learning models for classification. Machine learning classifiers such as logistic regression, random forest, naive Bayes, XGBoost and SVM were trained on TF-IDF features. In addition to the machine learning models, the authors applied a Deep Neural Network (DNN), a CNN, a BiLSTM and Transformer-based models such as BERT [<xref ref-type="bibr" rid="ref35">35</xref>], IndicBERT [41] and MuRIL [40] for classification. Among all the models, MuRIL achieved the highest F1-scores of 0.78 and 0.91 in the Malayalam-English and Tamil-English tasks, respectively.</p>
      <p>• Pegasus [<xref ref-type="bibr" rid="ref25">25</xref>]: Team Pegasus submitted results for Task 1 and Task 2. They utilized XLM-RoBERTa [42] and DistilBERT models for identifying offensive language in social media text. In Task 1, the authors concatenated the embeddings obtained from both BERT models and passed them to a BiLSTM network; this model attained an F1-score of 0.810. For Task 2, the authors performed transliteration and translation on the data and applied the XLM-RoBERTa model to extract the embeddings, which obtained F1-scores of 0.612 and 0.670 in the Tamil-English and Malayalam-English tasks, respectively.</p>
      <p>• IRLab [<xref ref-type="bibr" rid="ref22">22</xref>]: Team IRLab implemented a Deep Neural Network (DNN) with TF-IDF features for Tasks 1 and 2. The authors extracted unigram to six-gram TF-IDF features and retained the top 30,000 features. A DNN with four dense layers read these features and predicted the class label for each instance. They also performed hyperparameter tuning for each model to select the best model. Their models achieved F1-scores of 0.84, 0.65 and 0.71 in the Task 1, Tamil-English and Malayalam-English shared tasks, respectively.</p>
      <p>• AIML [<xref ref-type="bibr" rid="ref24">24</xref>]: Team AIML proposed an ensemble model which used character-N-gram-based TF-IDF features for the identification of offensive texts. The authors considered one- to six-character N-gram features and trained an ensemble of SVM, logistic regression and random forest. Their model attained an F1-score of 0.83 in Task 1, whereas it achieved F1-scores of 0.67 and 0.77 in the Tamil-English and Malayalam-English tasks of Task 2, respectively.</p>
      <p>• SSN_IT_NLP [<xref ref-type="bibr" rid="ref27">27</xref>]: Team SSN_IT_NLP presented an offensive language identification model for Tamil-English data. mBERT generates embeddings from the data, which are then fed to an ensemble of SVM, XGBoost and Linear Discriminant Analysis (LDA). The label predicted by the majority of the models was selected as the final output.</p>
      <p>• NLP_CSE: Team NLP_CSE employed machine learning and deep learning models for predicting the offensive data. A logistic regression classifier was trained on TF-IDF features. Furthermore, the authors used random oversampling algorithms to deal with the class imbalance problem in the data. The model obtained an F1-score of 0.5243. In addition to the logistic regression model, the authors implemented an LSTM-based encoder-decoder architecture and a transformer-based model. The encoder-decoder model was a deep multi-layer network that also incorporated an attention mechanism; it consisted of stacks of four encoders and four decoders. The transformer model, mBERT, was used to generate sentence embeddings, and the cosine similarity between sentences was used for classification.</p>
      <p>• BITS_Pilani [<xref ref-type="bibr" rid="ref23">23</xref>]: Team BITS_Pilani used a DNN which contains an embedding layer, a pooling layer, a dropout layer, a fully connected layer and an output layer for classifying the text into Offensive and Not-offensive in the Tamil-English subtask. The model achieved an F1-score of 0.835 in the competition.</p>
      <p>• M Subramanian et al.: Team M Subramanian et al. employed the multinomial naive Bayes model, KNN, logistic regression and an SVM classifier with BoW features for classifying the social media text into offensive or not-offensive categories. This team participated in the shared task for Tamil data only. The logistic regression model attained the highest performance among the classifiers.</p>
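      <p>The N-gram TF-IDF plus classifier-ensemble recipe described by several teams (e.g. MUCIC and AIML) can be sketched as follows. This is a minimal illustration, not any team's actual system: the toy comments, labels and hyperparameters are invented, and hard majority voting is used in place of whatever voting scheme each team chose.</p>

```python
# Sketch: character 1-6 gram TF-IDF features feeding a hard-voting ensemble
# of SVM, logistic regression and random forest. Toy data for illustration.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

comments = ["padam nalla iruku", "poda venkayam", "super movie", "poda loosu"]
labels = ["NOT", "OFF", "NOT", "OFF"]  # invented labels

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 6)),
    VotingClassifier(
        estimators=[
            ("svm", SVC()),
            ("lr", LogisticRegression()),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ],
        voting="hard",  # each classifier votes; the majority label wins
    ),
)
model.fit(comments, labels)
print(model.predict(["intha padam nalla iruku"])[0])
```

      <p>Hard voting takes the majority of the three predicted labels; soft voting (averaging class probabilities) is an alternative when all estimators expose calibrated probabilities.</p>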
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>The distribution of the offensive language classes is imbalanced in both datasets. We therefore report weighted-average metrics, which take into account the varying degrees of importance of each class in the dataset. We used the classification report tool from scikit-learn2.</p>
      <p>Precision = TP / (TP + FP) (1)</p>
      <p>Recall = TP / (TP + FN) (2)</p>
      <p>F-score = 2 × (Precision × Recall) / (Precision + Recall) (3)</p>
      <p>Weighted Precision = Σ (Precision of class i × Weight of class i), summed over the classes i = 1, …, N (4)</p>
      <p>Weighted Recall = Σ (Recall of class i × Weight of class i), summed over the classes i = 1, …, N (5)</p>
      <p>Weighted F-score = Σ (F-score of class i × Weight of class i), summed over the classes i = 1, …, N (6)</p>
      <p>Here TP, FP and FN denote true positives, false positives and false negatives, and the weight of a class is its share of the true samples (its support).</p>
      <p>2https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html</p>
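      <p>The weighted averaging above can be reproduced directly with scikit-learn's metrics; the labels below are invented for illustration, with "OFF" and "NOT" standing in for the Offensive and Not-offensive classes.</p>

```python
# Weighted-average precision, recall and F1: each class's score is weighted
# by its support, i.e. its share of the true samples. Toy labels below.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "OFF", "OFF", "NOT", "NOT"]

p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.889 0.833 0.838
```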
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>Shared tasks on offensive language detection in code-mixed Tamil and Malayalam data were organized as part of HASOC 2021. Fourteen submissions were received for Track 1: Tamil, sixteen for Track 2: Tamil, and eleven for Track 2: Malayalam. Table 5 shows the number of teams that participated in each shared task. Participating teams explored N-gram-based TF-IDF, BoW and different variants of BERT for representing the input text. None of the teams used language-specific features. They used various conventional machine learning classifiers such as SVM, naive Bayes, random forest, logistic regression, XGBoost and KNN, as well as ensembles of machine learning classifiers, for the identification of offensive language text. In addition, DNNs, LSTMs and their variants, and transformer-based classifiers were also studied for the classification. Team HSU_TransEmb explored an ensemble of mBERT, DistilBERT and MuRIL for detecting offensive texts in code-mixed Tamil and Malayalam data. NLP_CSE investigated the performance of oversampling algorithms to address the class imbalance problem in the data. Tables 2, 3 and 4 show the rank lists for the Task 1: Tamil, Task 2: Tamil and Task 2: Malayalam tracks, respectively. Figures 1, 2 and 3 show the precision, recall and F1-scores of submissions in Track 1: Tamil, Track 2: Tamil and Track 2: Malayalam. Figure 4 shows box-plots of the performance of the teams that participated in Track 1: Tamil, Track 2: Tamil and Track 2: Malayalam.</p>
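      <p>The probability-summing step of the HSU_TransEmb ensemble can be sketched as follows; the probability rows below are made-up stand-ins for the outputs of the three transformer models.</p>

```python
# Ensemble step: class probabilities from three models are summed per
# comment, and the argmax of the sums gives the final label.
import numpy as np

labels = ["NOT", "OFF"]
# One row per comment: [P(NOT), P(OFF)] from each model (invented values).
p_mbert = np.array([[0.70, 0.30], [0.40, 0.60]])
p_distilbert = np.array([[0.55, 0.45], [0.35, 0.65]])
p_muril = np.array([[0.60, 0.40], [0.55, 0.45]])

summed = p_mbert + p_distilbert + p_muril
final = [labels[i] for i in summed.argmax(axis=1)]
print(final)  # → ['NOT', 'OFF']
```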
      <p>Team SSN_NLP obtained the first rank in Track 1 with an F1-score of 0.859. MUCIC and SSN_NLP_MLRG took the second and third positions with F1-scores of 0.852 and 0.844, respectively. Among the 14 teams, seven scored F1-scores greater than 0.8. Looking at the models used by the teams, one can see that the teams that finished at the top used different kinds of feature extraction models and classifiers.</p>
      <p>Team MUCIC attained the first position in the Track 2: Tamil shared task with an F1-score of 0.678. MUCIC used word-level as well as character-level N-gram-based TF-IDF features for classification. They performed the predictions using SVM, random forest, logistic regression, and an ensemble of these three. The second-placed team, AIML, and the third-placed team, SSN_IT_NLP, scored F1-scores of 0.670 and 0.668, respectively. AIML also utilized N-gram-based TF-IDF features with SVM, logistic regression and random forest; they considered unigram to six-gram features for this analysis. SSN_IT_NLP made use of mBERT embeddings with SVM, XGBoost and LDA to identify the offensive language texts in the data. Among the 16 teams that participated, ten recorded F1-scores greater than 0.6.</p>
      <p>In Track 2: Malayalam, AIML reached the top position with an F1-score of 0.766. MUCIC and HSU were placed second and third with F1-scores of 0.762 and 0.735, respectively. AIML used unigram to six-gram TF-IDF features with SVM, logistic regression and random forest classifiers for the identification of offensive language texts. MUCIC followed a similar methodology, but used only the 40,000 most frequent N-gram-based TF-IDF features from each class for classification. Team HSU utilized an ensemble of mBERT, DistilBERT and MuRIL for the detection of offensive language content. In this task, 6 out of 11 teams obtained an F1-score greater than 0.7, and one team scored an F1-score of less than 0.6.</p>
      <p>It is interesting to note that teams that used TF-IDF features attained the top position in
both tasks in Track 2. A similar trend was visible in HASOC 2020 [43]. The teams that won
the HASOC 2020 shared tasks in CodeMix data used TF-IDF features with machine learning
classifiers.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper gives an overview of the HASOC-Dravidian-CodeMix shared task at FIRE 2021. The shared task consisted of three subtasks covering Tamil, code-mixed Tamil and Malayalam. Sixteen teams participated in the Tamil-English code-mixed task, 11 teams in the Malayalam-English code-mixed task and 14 teams in the Tamil task. Teams used methods ranging from Bag-of-Words and TF-IDF to BERT-based models to represent the data, and applied conventional machine learning algorithms, deep neural networks and transformer networks for prediction. One team employed oversampling algorithms to deal with the imbalance in the data by synthetically generating data points in the minority classes. The analysis of the teams' methods showed that both conventional and deep learning/transformer-based methods exhibit similar performance in terms of the evaluation metrics used for assessing the models.</p>
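      <p>The random-oversampling idea mentioned above can be sketched as follows. This is a generic sketch, not the participating team's actual pipeline: minority-class comments are simply duplicated at random until all classes reach the majority-class count.</p>

```python
# Random oversampling: resample minority-class rows (with replacement)
# until every class matches the majority-class count. Toy data below.
import random
from collections import Counter

random.seed(0)
data = [("comment a", "NOT"), ("comment b", "NOT"), ("comment c", "NOT"),
        ("comment d", "OFF")]

counts = Counter(label for _, label in data)
target = max(counts.values())
balanced = list(data)
for label, n in counts.items():
    pool = [row for row in data if row[1] == label]
    balanced.extend(random.choice(pool) for _ in range(target - n))

print(Counter(label for _, label in balanced))
```

      <p>Duplicating rows this way only rebalances the loss; libraries such as imbalanced-learn offer the same operation alongside synthetic alternatives like SMOTE.</p>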
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This publication is the outcome of research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2 (Insight_2), and Irish Research Council grant IRCLA/2017/129 (CARDAMOM - Comparative Deep Models of Language for Minority and Historical Languages). We also thank Ciara Oloughlin for her help with proofreading.</p>
      <p>[37] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).</p>
      <p>[38] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 328-339. URL: https://aclanthology.org/P18-1031. doi:10.18653/v1/P18-1031.</p>
      <p>[39] J. Howard, S. Gugger, fastai: a layered API for deep learning, Information 11 (2020) 108.</p>
      <p>[40] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal, R. T. Nagipogu, S. Dave, et al., MuRIL: Multilingual representations for Indian languages, arXiv preprint arXiv:2103.10730 (2021).</p>
      <p>[41] D. Kakwani, A. Kunchukuttan, S. Golla, N. Gokul, A. Bhattacharyya, M. M. Khapra, P. Kumar, IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, 2020, pp. 4948-4961.</p>
      <p>[42] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, arXiv preprint arXiv:1911.02116 (2019).</p>
      <p>[43] B. R. Chakravarthi, A. K. M, J. P. McCrae, B. Premjith, K. Soman, T. Mandl, Overview of the track on HASOC-Offensive Language Identification-DravidianCodeMix, in: FIRE (Working Notes), 2020, pp. 112-120.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Computational Modeling of People's Opinions</source>
          , Personality, and
          <article-title>Emotion's in Social Media, Association for Computational Linguistics</article-title>
          , Barcelona,
          <source>Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>53</lpage>
          . URL: https://aclanthology.org/2020.peoples-1.5.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>B. R.</given-names> <surname>Chakravarthi</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Muralidaran</surname></string-name>,
          <article-title>Findings of the shared task on hope speech detection for equality, diversity, and inclusion</article-title>,
          in: <source>Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion</source>,
          Association for Computational Linguistics, Kyiv,
          <year>2021</year>, pp. <fpage>61</fpage>-<lpage>72</lpage>.
          URL: https://aclanthology.org/2021.ltedi-1.8.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arcan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Little</surname>
          </string-name>
          , <string-name><given-names>P.</given-names> <surname>Buitelaar</surname></string-name>,
          <article-title>TrollsWithOpinion: A Dataset for Predicting Domain-specific Opinion Manipulation in Troll Memes</article-title>,
          <source>arXiv preprint arXiv:2109.03571</source> (<year>2021</year>).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Andrew</surname>
          </string-name>
          , <article-title>JudithJeyafreedaAndrew@DravidianLangTech-EACL2021: Offensive language detection for Dravidian code-mixed YouTube comments</article-title>,
          in: <source>Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>,
          Association for Computational Linguistics, Kyiv,
          <year>2021</year>, pp. <fpage>169</fpage>-<lpage>174</lpage>.
          URL: https://aclanthology.org/2021.dravidianlangtech-1.22.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>B.</given-names> <surname>Bharathi</surname></string-name>,
          <string-name>A. S. A</string-name>,
          <article-title>SSNCSE_NLP@DravidianLangTech-EACL2021: Offensive language identification on multilingual code mixing text</article-title>,
          in: <source>Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>,
          Association for Computational Linguistics, Kyiv,
          <year>2021</year>, pp. <fpage>313</fpage>-<lpage>318</lpage>.
          URL: https://aclanthology.org/2021.dravidianlangtech-1.45.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sampath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thenmozhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thangasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nallathambi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments</article-title>,
          <source>arXiv preprint arXiv:2109.00227</source> (<year>2021</year>).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahesan</surname>
          </string-name>
          ,
          <article-title>A novel hybrid approach to detect and correct spelling in Tamil text</article-title>,
          in: <source>2016 IEEE International Conference on Information and Automation for Sustainability (ICIAfS)</source>,
          IEEE, <year>2016</year>, pp. <fpage>1</fpage>-<lpage>6</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahesan</surname>
          </string-name>
          ,
          <article-title>Use of a novel hash-table for speeding-up suggestions for misspelt Tamil words</article-title>
          ,
          <source>in: 2017 IEEE International Conference on Industrial and Information Systems (ICIIS)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahesan</surname>
          </string-name>
          ,
          <article-title>Detecting and correcting real-word errors in Tamil sentences</article-title>
          ,
          <source>Ruhuna Journal of Science</source>
          <volume>9</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Nuhman</surname>
          </string-name>
          , Basic Tamil Grammar, Readers Association, Kalmunai, Department of Tamil, University of Peradeniya,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahesan</surname>
          </string-name>
          ,
          <article-title>Word embedding-based Part of Speech tagging in Tamil texts</article-title>
          ,
          <source>in: 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>478</fpage>
          -
          <lpage>482</lpage>
          .
          doi:10.1109/ICIIS51140.2020.9342640.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahesan</surname>
          </string-name>
          ,
          <article-title>Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts</article-title>
          , in: <source>2020 Moratuwa Engineering Research Conference (MERCon)</source>,
          <year>2020</year>, pp. <fpage>272</fpage>-<lpage>276</lpage>.
          doi:10.1109/MERCon50084.2020.9185369.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mahesan</surname>
          </string-name>
          ,
          <article-title>Sentiment Analysis in Tamil Texts: A Study on Machine Learning Techniques and Feature Representation</article-title>,
          in: <source>2019 14th Conference on Industrial and Information Systems (ICIIS)</source>,
          <year>2019</year>, pp. <fpage>320</fpage>-<lpage>325</lpage>.
          doi:10.1109/ICIIS47346.2019.9063341.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          , <string-name><given-names>N.</given-names> <surname>Jose</surname></string-name>,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text</article-title>
          ,
          <source>arXiv preprint arXiv:2106.09460</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Thamburaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          , et al.,
          <article-title>DravidianMultiModality: A Dataset for Multi-modal Sentiment Analysis in Tamil and Malayalam</article-title>,
          <source>arXiv preprint arXiv:2106.04853</source> (<year>2021</year>).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>A survey of current datasets for code-switching research</article-title>
          ,
          <source>in: 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>136</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          , <string-name><given-names>R.</given-names> <surname>Kumar</surname></string-name>,
          <article-title>SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>,
          in: <source>Proceedings of the 13th International Workshop on Semantic Evaluation</source>,
          Association for Computational Linguistics, Minneapolis, Minnesota, USA,
          <year>2019</year>, pp. <fpage>75</fpage>-<lpage>86</lpage>.
          URL: https://aclanthology.org/S19-2010. doi:10.18653/v1/S19-2010.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          , <string-name><given-names>E.</given-names> <surname>Sherly</surname></string-name>,
          <string-name><given-names>J. P.</given-names> <surname>McCrae</surname></string-name>,
          <article-title>Overview of the track on sentiment analysis for Dravidian languages in code-mixed text</article-title>,
          <source>Forum for Information Retrieval Evaluation</source> (<year>2020</year>).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Predicting the type and target of offensive posts in social media</article-title>,
          in: <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>,
          Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>, pp. <fpage>1415</fpage>-<lpage>1420</lpage>.
          URL: https://aclanthology.org/N19-1144. doi:10.18653/v1/N19-1144.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bashang</surname>
          </string-name>
          , <string-name><given-names>G.</given-names> <surname>Sidorov</surname></string-name>,
          <string-name><given-names>H. L.</given-names> <surname>Shashirekha</surname></string-name>,
          <article-title>CoMaTa OLI - Code-mixed Malayalam and Tamil Offensive Language Identification</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalaivani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thenmozhi</surname>
          </string-name>
          , <article-title>SSN_NLP_MLRG@Dravidian-CodeMix-FIRE2020: Sentiment Code-Mixed Text Classification in Tamil and Malayalam using ULMFiT</article-title>,
          in: <source>FIRE (Working Notes)</source>, <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Saroj</surname>
          </string-name>
          , <string-name><given-names>S.</given-names> <surname>Pal</surname></string-name>,
          <article-title>IRLab@IIT-BHU@Dravidian-CodeMix-FIRE2020: Sentiment Analysis on Multilingual Code Mixing Text Using BERT-BASE</article-title>,
          in: <source>FIRE (Working Notes)</source>, <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tripathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Offensive Language Classification of Code-Mixed Tamil with Keras</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Offensive Language Identification on Multilingual Code Mixing Text</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kalyan Jada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yasaswini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sampath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thangasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pal Thamburaj</surname>
          </string-name>
          ,
          <article-title>Analyzing Social Media Content for Detection of Offensive Text</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name><given-names>N. N. Appiah</given-names> <surname>Balaji</surname></string-name>,
          <string-name>B. B</string-name>,
          <string-name>B. J</string-name>,
          <article-title>SSNCSE_NLP@Dravidian-CodeMix-FIRE2020: Sentiment Analysis for Dravidian Languages in Code-Mixed Text</article-title>,
          in: <source>FIRE (Working Notes)</source>, <year>2020</year>.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Divya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sripriya</surname>
          </string-name>
          , <article-title>Offensive Content Recognition</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S. N. V. C.</given-names>
            <surname>Basava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Karri</surname>
          </string-name>
          ,
          <article-title>Transformer Ensemble System for Detection of Offensive Content in Dravidian Languages</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Offensive Language Identification on Multilingual Code Mixed Text using BERT</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name><given-names>S.</given-names> <surname>Benhur J</surname></string-name>,
          <string-name>K. S</string-name>,
          <article-title>Pretrained Transformers for Offensive Language Identification in Tanglish</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Biradar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saumya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <article-title>mBERT based model for identification of offensive content in south Indian languages</article-title>,
          in: <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>, CEUR, <year>2021</year>.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>Support-vector networks</article-title>
          ,
          <source>Machine Learning</source> <volume>20</volume>
          (
          <year>1995</year>
          )
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory</article-title>
          ,
          <source>Neural Computation</source> <volume>9</volume>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , <string-name><given-names>E.</given-names> <surname>Grave</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Bojanowski</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Mikolov</surname></string-name>,
          <article-title>Bag of tricks for efficient text classification</article-title>,
          in: <source>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers</source>,
          Association for Computational Linguistics, <year>2017</year>, pp. <fpage>427</fpage>-<lpage>431</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , <string-name><given-names>M.-W.</given-names> <surname>Chang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Toutanova</surname></string-name>,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>,
          in: <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>,
          Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>, pp. <fpage>4171</fpage>-<lpage>4186</lpage>.
          URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , <string-name><given-names>R.</given-names> <surname>Soricut</surname></string-name>,
          <article-title>ALBERT: A lite BERT for self-supervised learning of language representations</article-title>,
          <source>arXiv preprint arXiv:1909.11942</source> (<year>2019</year>).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>