<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transformer based Sentiment Analysis in Dravidian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pawan Kalyan Jada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D Sashidhar Reddy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konthala Yasaswini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arunaggiri Pandian K</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prabakaran Chandran</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anbukkarasi Sampath</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sathiyaraj Thangasamy</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Information Technology Tiruchirappalli</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kongu Engineering College</institution>
          ,
          <addr-line>Erode, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Mu Sigma Inc.</institution>
          ,
          <addr-line>Bengaluru, Karnataka</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sri Krishna Adithya College of Arts and Science</institution>
          ,
          <addr-line>Coimbatore, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The development of social media platforms has enabled users to express their thoughts and opinions about entities freely, without regard to any inadvertent implications this may have for a person or group. Given the volume of active social media users, the need for automated sentiment analysis systems for social media is becoming increasingly apparent. This paper describes our work on the task of Sentiment Analysis in Dravidian languages at DravidianCodeMix 2021. We propose a soft-voting classifier built from fine-tuned multilingual language models, achieving best weighted F1-scores of 0.752, 0.619, and 0.648 in Malayalam, Tamil, and Kannada respectively. Our approach achieved the best results in Tamil, securing 3rd rank in the language. The source code of our systems is published.</p>
      </abstract>
      <kwd-group>
<kwd>Sentiment Analysis</kwd>
        <kwd>Dravidian languages</kwd>
        <kwd>Transfer learning</kwd>
        <kwd>Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Social media is a powerful and robust tool that has had an inherent impact on its users [
        <xref ref-type="bibr" rid="ref1">1</xref>
]. Over
the years, the internet has gained more and more users, jumping from 738M in the year 2000 all
the way up to 3.6B in 2020. It has provided a medium for people of different ages, backgrounds, and
ethnicities to interact, accounting for more cultural exchanges as well as exposure to new ideologies
from different users. It has also enabled us to access information across the planet, to socialize,
to stay up-to-date with the latest technologies, and to share our ideas and thoughts with the world.
With millions of active users, content generated on social media is difficult for human beings
to moderate. Analysing the opinions expressed by the users is important to identify the
areas of disagreement and difference among the users [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
Users can openly share their ideas on social media sites such as YouTube, Facebook,
Instagram, and Twitter. Certain people’s perspectives can be harmful to a specific community,
gender, religion, or race. These unpleasant posts/comments can be detrimental to one’s
mental health. Sentiment analysis is the technique of categorising a statement based on its
polarity. Sentiment analysis aids in evaluating consumer satisfaction with the products and
services that many businesses offer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], as well as in understanding public opinion, which may
aid in making better decisions in the future. Consequently, it has become a prominent subject of
study in the Natural Language Processing research field. Because of the huge quantity of data
created on a daily basis, research into evaluating the sentiment of social media postings has
grown exponentially.
      </p>
      <p>
        The majority of data found on social media is frequently code-mixed. The combination of
two or more languages in a phrase is known as code-mixing [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
]. Because of variations in
syntax, vocabulary, and meaning, code-mixed writing is far more difficult to process than standard
language. As a result, achieving good results in tasks such as Sentiment Analysis, Named
Entity Recognition, POS tagging, and so on becomes extremely difficult.
      </p>
      <p>
Tamil evolved from the Proto-Dravidian language, which is estimated to have existed prior
to 500 BC. Tamil is the official language of the Indian state of Tamil Nadu, as well as of Singapore
and Sri Lanka [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. There are around 77 million Tamil speakers worldwide. The Tamil-Brahmi
script was the parent script from which the subsequent Vatteluttu and Tamil scripts evolved.
It consists of 12 vowels, 18 consonants, and 1 aytam (voiceless velar fricative).
      </p>
      <p>
Kannada and Malayalam are two more Dravidian languages that are widely spoken in
Karnataka and Kerala, respectively. Kannada can alternatively be spelled Kanarese or Canarese.
Kannada is spoken by about 40 million people and is recognised as a classical language. The
earliest Kannada inscription dates from around 450 CE. Kannada literature was influenced
by the Lingayat and Haridasa movements and began with Kavirajamarga and Pampa Bharata.
Ramacharitam is the oldest surviving literary text in Malayalam. Malayalam contains 15
vowels, 36 consonants, and a variety of additional symbols. The Vatteluttu script is incorporated
in contemporary Malayalam. These Dravidian languages generate a massive volume of
code-mixed data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
<p>The rest of the paper is organised as follows: Section 2 comprises the related work in
sentiment analysis. Section 3 details the dataset used and the task description, while Section 4
provides a detailed description of the architecture used for this task. Section 5 discusses
the results of our models in the shared task, and finally, Section 6 concludes our work and outlines
potential directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Sentiment analysis is essential in introspection [
        <xref ref-type="bibr" rid="ref9">9</xref>
]. The availability of code-mixed data from
social media has been critical for extracting data for sentiment analysis [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The topic of code-mixing
in Dravidian languages is explored in [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. Sentiment analysis tasks were completed in the
late 1990s by classifying text or phrases [
        <xref ref-type="bibr" rid="ref13">13</xref>
]. Finn Årup Nielsen, OpinionFinder, and General
Inquirer produced word lists in order to provide a score to each term [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The sentiment
of a sentence is then determined by the individual scores of the words it contains. Two typical
approaches to solving a sentiment analysis problem are machine learning approaches and
lexicon-based approaches [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. An opinion lexicon is employed in the lexicon-based technique to identify
sentence polarity [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]. Naive Bayes is one approach for dealing with sentiment analysis.
In previous years, N-grams were proposed to extract sentiments [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. These approaches were
ineffective due to the dynamic nature of the data. Many studies have been conducted in recent
years to integrate deep learning and machine learning approaches for effective sentiment
categorisation.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref19">19</xref>
], the authors developed a model using Conditional Random Fields for
part-of-speech tagging on mixed-script social media text containing two or three languages,
among which English is one language and the others are Hindi, Bengali and Tamil.
      </p>
      <p>
Several types of multilingual and cross-lingual embeddings were employed in order to
efficiently transfer knowledge from monolingual text to code-mixed language for sentiment
analysis of code-mixed text. These embeddings have been shown to improve the performance of
sentiment analysis on code-mixed text. [
        <xref ref-type="bibr" rid="ref20">20</xref>
] presented the Sentiment Analysis of Code-Mixed Text
(SACMT) model, which consists of twin bidirectional LSTM networks for sentiment analysis of
code-mixed text and addresses the problem by projecting the sentences onto a single sentiment
space using shared parameters. This method outperforms state-of-the-art sentiment
analysis methods on code-mixed data. For the sentiment analysis of the Dravidian code-mixed dataset,
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] presented meta embeddings with the transformer and GRU model. When sarcasm is
employed in negative polarity remarks, the system is unable to determine the sentiment. [
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ]
employed a hybrid model of bidirectional LSTM and CNN architectures which extracts
character features from each word. Crowdsourcing approaches were utilised in [
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
        ] to manually
rate polarity in Twitter posts. To classify a sentence into one of the sentiment classes, a
parallel ensemble of two models - a traditional machine learning model and an end-to-end deep
learning model was employed in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref27">27</xref>
] provide a unique technique for detecting sentiment polarity in Twitter
messages by extracting a vector of weighted nodes from the WordNet graph which presents a
domain-independent non-supervised solution. An end-to-end Cross-lingual sentiment
analysis (CSLA) model that eliminates the requirement for unsupervised cross-lingual word
embeddings (CLWE) by utilising unlabelled data in different languages and domains was introduced
by the authors of [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. A significant element of [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] is the development of a new multimodal
opinion database labelled at the utterance level. In [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], a benchmark dataset, a comprehensive
corpus of around 12,000 Bengali reviews, was introduced, and the performance of supervised
machine learning (ML) classifiers was evaluated on a machine-translated English dataset and
compared against the source Bengali dataset. Several researchers benchmarked multi-task learning
on auxiliary tasks on Dravidian languages [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>
A surge of information is created every day as a result of worldwide internet usage, which
poses huge risks, because online texts with high toxicity can lead to personal attacks, mental
health problems, online harassment, and bullying behaviours [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref33">33</xref>
], the authors
integrated the outcomes of three feature-based classifiers for identifying cyber hate speech on
Twitter and investigated the benefits of ensembles of different classifiers. The authors of [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]
investigated the challenge of hate speech identification in code-mixed texts and provided a
dataset of code-mixed Hindi-English tweets from Twitter. The authors of [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] suggested a
typology that encapsulates the key similarities and distinctions across subtasks, and addressed
the consequences for feature construction and data annotation, based on the previous work on
hate speech, cyberbullying, and online abuse.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset and Task description</title>
      <p>
        In this section, we describe the dataset provided by the organisers to the participants and the
task [
        <xref ref-type="bibr" rid="ref10 ref36 ref37">36, 37, 10</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>
The organisers of FIRE-2021 provided training and validation code-mixed data sets in
Tamil-English, Kannada-English and Malayalam-English [
          <xref ref-type="bibr" rid="ref38">38, 39, 40</xref>
]. The datasets consist of
comments collected from YouTube that are annotated with sentiment polarity. In the data sets,
there are three types of code-mixed sentences: inter-sentential code-mixing, intra-sentential
code-mixing, and tag switching. The training and validation data sets comprise sentences
in five classes:
1. Positive state - The comment provides an explicit or implicit indication that the speaker
is in a positive state.
2. Negative state - The comment provides an explicit or implicit indication that the speaker
is in a negative state.
3. Mixed feelings - The comment provides an explicit or implicit indication that the speaker
is experiencing both positive and negative feelings.
4. Neutral state - The comment provides no explicit or implicit indication of the speaker’s
emotional state.
5. Not in intended language - The comment is not in Tamil/Malayalam/Kannada.
        </p>
<p>The Tamil code-mix data set consists of 35,656 comments for the train set, 3,962 for the
validation set and 4,403 comments for testing the model. In the Kannada code-mix data set,
there are 6,212 comments for training, 691 for validating and 768 for testing the model. The
Malayalam code-mix data set comprises 15,888 comments in the training set, 1,766 comments
in the validation set and 1,963 comments in the test set.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Task description</title>
<p>The participants are required to produce labels indicating the sentiment polarity of a given
code-mixed comment. Each sentence should be classified into one of these labels: Positive,
Negative, Neutral, Mixed feelings, Not-in-intended-language. At the beginning of the task, the
training and development data sets were already made available to the participants. Only the
comments from the test split were eventually made accessible to participants via CodaLab. The
weighted-average F1 scores were considered for official ranking since the labels in the task
were not balanced.</p>
<p>Example comments with their language and sentiment class:
1. “Ithu yethu maathiri illama puthu maathiyaala irukku” - Tamil - Positive
2. “Pulikku pakaram patti odande vere mattam onnum ella.” - Malayalam - Mixed feelings
3. “ആദ്യ നൂറു േകാടി േവണ്ടവർ ... adei mwonoose like” - Malayalam - Neutral
4. “ರಂಗಿತರಂಗದ ಇತಿಹಾಸ ಮರುಕಳಿಸುವಂತಿದೆ!” - Kannada - Positive
5. “Are bhai Yek dum phel diya” - Kannada - Not-Kannada</p>
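Since the ranking metric is the support-weighted F1 score, it may help to spell out how it is computed. A minimal sketch in plain Python, using the standard per-class one-vs-rest definition; the gold and predicted labels below are purely hypothetical:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted F1: per-class F1 averaged with weights
    proportional to each class's count in the gold labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in support:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += f1 * support[c] / total
    return score

# Hypothetical gold/predicted labels over the task's classes
gold = ["Positive", "Positive", "Negative", "Neutral"]
pred = ["Positive", "Negative", "Negative", "Neutral"]
print(round(weighted_f1(gold, pred), 4))  # 0.75
```

Because each class's F1 is weighted by its frequency in the gold labels, a model that only does well on the majority class still cannot score highly on the rare classes for free, which is why this metric was chosen for the imbalanced task.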
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
<p>To determine the sentiment of a particular text, we employed pre-trained transformer models.
The models employed for this task are MuRIL [41], mBERT [42], DistilmBERT [43] and
XLM-Roberta [44]. These models are then fine-tuned for this particular task. For all three languages,
the same models were utilised. After obtaining the probability scores from the different models,
we soft vote [45] these scores to get our final result. Soft voting computes the weighted sum
of the probabilities for each class label and then predicts the class label with the highest
likelihood. Each individual classifier in soft voting offers a probability that a certain data
point belongs to a specified target class. The predictions are weighted by the significance of
the classifier and summed. The target label with the highest sum of weighted probabilities then
receives the vote.</p>
<sec id="sec-4-0">
<title>4.1. MuRIL</title>
<p>MuRIL [41] is an Indic language model that has been extensively trained and improved to
perform better on Indian languages. It supports around 17 languages: English and
16 Indian languages. MuRIL surpassed multilingual BERT on all benchmark data sets of
Indic languages. Masked Language Modeling (MLM) [46] and Translation Language Modeling
(TLM) are the two approaches used in MuRIL’s pre-training phase. TLM makes use of parallel
translation data: it takes a sequence of parallel sentences from the translation data and
randomly masks tokens from the source as well as from the target sentence, thereby establishing a
cross-lingual mapping among the tokens. MuRIL is the outcome of pre-training a BERT-based
encoder model with the MLM and TLM objectives. MuRIL was also pre-trained on the PMINDIA and
Dakshina datasets. It comprises 236M parameters.</p>
</sec>
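The soft-voting step described above can be sketched as follows; the probability matrices and the equal weights are illustrative assumptions, not the actual model outputs:

```python
import numpy as np

# Hypothetical class probabilities from two fine-tuned models for two
# comments over five classes (Positive, Negative, Mixed, Neutral, Other).
probs_model_a = np.array([[0.60, 0.10, 0.10, 0.10, 0.10],
                          [0.20, 0.50, 0.10, 0.10, 0.10]])
probs_model_b = np.array([[0.50, 0.20, 0.10, 0.10, 0.10],
                          [0.10, 0.60, 0.10, 0.10, 0.10]])

# Weight each classifier by its assumed significance (equal here), sum the
# weighted probabilities, and predict the class with the highest total.
weights = [0.5, 0.5]
summed = weights[0] * probs_model_a + weights[1] * probs_model_b
predictions = summed.argmax(axis=1)
print(predictions)  # [0 1]: Positive for the first comment, Negative for the second
```

With unequal weights, a stronger classifier's probabilities dominate the sum; with equal weights, as here, soft voting reduces to averaging the class probabilities.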
      <sec id="sec-4-1">
        <title>4.2. XLM-Roberta</title>
<p>XLM-Roberta [44] is a multilingual variant of RoBERTa. The hybrid model XLM-Roberta
was trained on 2.5 TB of CommonCrawl data and combines XLM and RoBERTa. It was trained
using the multilingual MLM loss on 100 different languages. XLM-R achieved
state-of-the-art results on multiple cross-lingual benchmarks. xlm-roberta-base is fine-tuned for our
sentiment analysis task; it contains 12 layers, a hidden size of 768, 8 heads and a parameter size
of 270M.</p>
</sec>
<sec id="sec-4-1-1">
<title>4.3. BERT</title>
<p>The encoder of a Transformer is utilised in the design of Bidirectional Encoder Representations
from Transformers (BERT). During its pre-training phase, BERT is trained on the whole
English Wikipedia and the BooksCorpus. It is trained with two language modelling objectives:
Masked Language Modeling (MLM), in which 15% of the tokens are randomly masked, and
Next Sentence Prediction (NSP), in which the model must predict whether the first sentence
precedes the second sentence or not. Here, we adopt a bert-base-multilingual-cased [47] model
trained on top of the largest Wikipedia corpora, covering 104 languages. This model
consists of 12 layers, 12 attention heads, and approximately 179M parameters.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.4. DistilBERT</title>
<p>DistilBERT [48] is a compressed version of the BERT model. It uses a triple-loss language
modelling objective that combines the MLM loss with a knowledge-distillation loss and a cosine-distance
loss. In comparison to the MLM loss, the two distillation losses in the triple loss have a
significant impact on model performance. The authors found it useful to include a cosine embedding
loss, which tends to align the directions of the hidden states in large-model distillation.
Knowledge distillation is a compression approach that involves training
a small model to mimic the behaviour of a bigger model. DistilBERT is not only 60% faster
than BERT, but it also has 40% fewer parameters. In this case, we use a cased multilingual
DistilBERT model with 6 layers, 768 dimensions, and 12 attention heads.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.5. Methodology</title>
<p>First, we preprocess the data by removing emojis and punctuation. The text is then tokenized
using the tokenizer of the corresponding language model, and all sequences are padded to
the same length. The sequence output is then retrieved and fed into two BiLSTM [49]
layers with 200 and 100 units, respectively. The generated output is fed into a global average
pooling layer and a global max pooling layer, and the two pooled outputs are concatenated. To acquire the
probability scores, this is then fed into several fully connected layers, followed by a softmax
activation function, as shown in Figure 3. Refer to Table 2 for the parameters used in the models.</p>
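The pooling-and-concatenation head described above can be illustrated with NumPy; the batch size, feature width, and random inputs are placeholder assumptions, while the sequence length of 200 follows the max length in Table 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the BiLSTM sequence output: (batch, max_length, features).
seq_out = rng.normal(size=(2, 200, 200))

avg_pool = seq_out.mean(axis=1)   # global average pooling -> (2, 200)
max_pool = seq_out.max(axis=1)    # global max pooling     -> (2, 200)
features = np.concatenate([avg_pool, max_pool], axis=1)  # -> (2, 400)

# A single (hypothetical) fully connected projection to the 5 classes,
# followed by a numerically stable softmax to obtain probability scores.
W = rng.normal(size=(400, 5))
b = np.zeros(5)
logits = features @ W + b
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
print(features.shape, probs.shape)  # (2, 400) (2, 5)
```

Concatenating average and max pooling keeps both the overall tendency of the sequence and its strongest activations, giving the fully connected layers a richer fixed-size summary than either pooling alone.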
<p>Table 2. Hyperparameters used in the models:
Optimizer: Adam; Dropout Rate: 0.5; Batch Size: 64; Max Length: 200; Learning Rate: 1e-3;
Activation Function: Softmax; Loss Function: cross-entropy.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
<p>We fine-tuned various transformer models: MuRIL, BERT, XLM-RoBERTa, and DistilBERT.
We used the TensorFlow implementations of the models provided by the Hugging Face library.
Based on the results released by the organisers, we secured third position with an
F1-score of 0.626 on the Tamil test set. We received F1-scores of 0.609 and 0.708 on the Kannada
and Malayalam test sets, respectively. We anticipated the models would provide similar results
based on the soft-voting scores we obtained on the validation sets. In comparison to the Tamil
models, the Kannada and Malayalam models performed relatively poorly on the Kannada and
Malayalam test sets, contrary to our expectations. The results of the soft-voting technique on the
test data sets are shown in Table 3.</p>
<p>We submitted the results obtained through the soft-voting technique because it produced the
best results for the three Dravidian languages. We also submitted the scores of DistilBERT for
Tamil and XLM-Roberta for Kannada and Malayalam. Soft-voting yielded F1-scores of 0.619,
0.752, and 0.648 for Tamil, Malayalam, and Kannada, respectively. Table 4 shows the weighted
average Precision, Recall, and F1-scores of the transformer models and the soft-voting approach
evaluated on the development data sets of the three Dravidian languages.</p>
<p>Among the transformer models, DistilBERT performed better on the Tamil validation set
with an F1-score of 0.607, while XLM-Roberta performed better on the Malayalam and Kannada
validation sets with F1-scores of 0.721 and 0.621, respectively. MuRIL gave a rather poor
performance on the Tamil and Kannada data sets despite being specifically built for
Indian languages.</p>
<p>Comparing the results of the soft-voting technique on the Tamil development and test sets,
the F1-score improved from 0.619 to 0.626. We also observed a decrease in the
performance of soft-voting for the Kannada and Malayalam languages. One of the causes is
the discrepancy in class distribution across the data sets. The majority of the texts fall into the positive
category, followed by the unknown-state and negative categories. Our models performed well
on the majority class and poorly on the minority classes. We also notice that the F1-score of the
Mixed feelings class is quite low compared to the not-Tamil class, despite the fact that
the number of sentences belonging to the former label is significantly higher than the latter in
the Tamil validation set. Another anomaly we noticed is that the F1-score of the not-Malayalam
label is the highest, even though the Malayalam data set has more samples belonging to the
positive class.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
<p>In this paper, we describe our work on sentiment analysis of the Dravidian languages Kannada,
Malayalam, and Tamil. We fine-tuned various pre-trained multilingual language
models such as BERT, XLM-R, DistilBERT, and MuRIL to classify each sequence into one of these
5 classes: Positive, Negative, Neutral, Mixed feelings, and Not-in-intended-language.
Overall, XLM-Roberta performed competently compared to the other models. The soft-voting technique is
applied to the individual models’ probabilities, enhancing the overall performance. The
problem of class imbalance had a serious impact on the performance of the models in the low-support
classes. The soft-voting technique achieved weighted-average F1-scores of 0.626, 0.708 and
0.609 for Tamil, Malayalam, and Kannada respectively. In the future, we intend to apply class
weighting techniques and semi-supervised approaches to further improve our performance.</p>
<p>[38] … dataset for dravidian languages in code-mixed text, CoRR abs/2106.09460 (2021). URL: https://arxiv.org/abs/2106.09460.
[39] B. R. Chakravarthi, N. Jose, S. Suryawanshi, E. Sherly, J. P. McCrae, A sentiment analysis dataset for code-mixed Malayalam-English, in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources Association, Marseille, France, 2020, pp. 177-184. URL: https://aclanthology.org/2020.sltu-1.25.
[40] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, J. P. McCrae, Corpus creation for sentiment analysis in code-mixed Tamil-English text, in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources Association, Marseille, France, 2020, pp. 202-210. URL: https://aclanthology.org/2020.sltu-1.28.
[41] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal, R. T. Nagipogu, S. Dave, S. Gupta, S. C. B. Gali, V. Subramanian, P. Talukdar, MuRIL: Multilingual representations for Indian languages, CoRR abs/2103.10730 (2021). URL: https://arxiv.org/abs/2103.10730.
[42] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171-4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
[43] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).
[44] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019).
[45] P. Kalyan, D. Reddy, A. Hande, R. Priyadharshini, R. Sakuntharaj, B. R. Chakravarthi, IIITT at CASE 2021 task 1: Leveraging pretrained language models for multilingual protest detection, in: Challenges and Applications of Automated Extraction of Socio-political Events from Text, Association for Computational Linguistics, Online, 2021, pp. 98-104. URL: https://aclanthology.org/2021.case-1.13.
[46] W. L. Taylor, “Cloze procedure”: A new tool for measuring readability, Journalism Quarterly 30 (1953) 415-433.
[47] T. Pires, E. Schlinger, D. Garrette, How multilingual is multilingual BERT?, arXiv preprint arXiv:1906.01502 (2019).
[48] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).
[49] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735-1780.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Aichner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grünfelder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Maurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jegeni</surname>
          </string-name>
          ,
          <article-title>Twenty-five years of social media: a review of social media applications and definitions from 1994 to 2019</article-title>
          , Cyberpsychology, Behavior, and
          <source>Social Networking</source>
          <volume>24</volume>
          (
          <year>2021</year>
          )
          <fpage>215</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sobkowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaschesky</surname>
          </string-name>
          , G. Bouchard,
          <article-title>Opinion mining in social media: Modeling, simulating, and forecasting political opinions in the web</article-title>
          ,
          <source>Government information quarterly 29</source>
          (
          <year>2012</year>
          )
          <fpage>470</fpage>
          -
          <lpage>479</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Anto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Antony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Muhsina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Johny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Wilson,
          <article-title>Product rating using sentiment analysis</article-title>
          ,
          <source>in: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>3458</fpage>
          -
          <lpage>3462</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U.</given-names>
            <surname>Barman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <article-title>Code mixing: A challenge for language identification in the language of social media</article-title>
          ,
          <source>in: Proceedings of the first workshop on computational approaches to code switching</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Muysken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Muysken</surname>
          </string-name>
          , et al.,
          <article-title>Bilingual speech: A typology of code-mixing</article-title>
          , Cambridge University Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Domain identification of scientific articles using transfer learning and ensembles</article-title>
          ,
          <source>in: Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2021 Workshops, WSPA, MLMEIN, SDPRA, DARAI, and AI4EPT, Delhi, India, May 11, 2021, Proceedings 25</source>
          , Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Christie</surname>
          </string-name>
          ,
          <article-title>The medieval tamil-language inscriptions in southeast asia and china</article-title>
          ,
          <source>Journal of Southeast Asian Studies</source>
          (
          <year>1998</year>
          )
          <fpage>239</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pratapa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Choudhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sitaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dandapat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bali</surname>
          </string-name>
          ,
          <article-title>Language modeling for code-mixing: The role of linguistic theory based synthetic data</article-title>
          ,
          <source>in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1543</fpage>
          -
          <lpage>1553</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>De Saa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ranathunga</surname>
          </string-name>
          ,
          <article-title>Self-reflective and introspective feature model for hate content detection in sinhala youtube videos</article-title>
          ,
          <source>in: 2020 From Innovation to Impact (FITI), volume 1</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          , KanCMD:
          <article-title>Kannada code-mixed dataset for sentiment analysis and offensive language detection</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krishnasamy</surname>
          </string-name>
          ,
          <article-title>Code mixing among tamil-english bilingual children</article-title>
          ,
          <source>International Journal of Social Science and Humanity</source>
          <volume>5</volume>
          (
          <year>2015</year>
          )
          <fpage>788</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Sridhar</surname>
          </string-name>
          ,
          <article-title>On the functions of code-mixing in kannada</article-title>
          (
          <year>1978</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hatzivassiloglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McKeown</surname>
          </string-name>
          ,
          <article-title>Predicting the semantic orientation of adjectives</article-title>
          ,
          <source>in: 35th annual meeting of the association for computational linguistics and 8th conference of the european chapter of the association for computational linguistics</source>
          ,
          <year>1997</year>
          , pp.
          <fpage>174</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F. Å.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <article-title>A new anew: Evaluation of a word list for sentiment analysis in microblogs</article-title>
          ,
          <source>arXiv preprint arXiv:1103.2903</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Habimana-Griffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Achilefu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Alitalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>McKee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Sheehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Musiek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          , et al.,
          <article-title>Dural lymphatics regulate clearance of extracellular tau from the cns</article-title>
          ,
          <source>Molecular neurodegeneration 14</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taboada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brooke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tofiloski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Voll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stede</surname>
          </string-name>
          ,
          <article-title>Lexicon-based methods for sentiment analysis</article-title>
          ,
          <source>Computational linguistics 37</source>
          (
          <year>2011</year>
          )
          <fpage>267</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sampath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Thamburaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021</article-title>
          , in:
          <source>Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages</source>
          , European Association for Machine Translation, Online,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Aisopos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tzannetos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Violos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Varvarigou</surname>
          </string-name>
          ,
          <article-title>Using n-gram graphs for sentiment analysis: an extended study on twitter</article-title>
          ,
          <source>in: 2016 IEEE Second International Conference on Big Data Computing Service and Applications (BigDataService)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Part-of-speech tagging of code-mixed social media text</article-title>
          ,
          <source>in: Proceedings of the second workshop on computational approaches to code switching</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>N.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bindlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis of code-mixed languages leveraging resource rich languages</article-title>
          ,
          <year>2018</year>
          . arXiv:1804.00806.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dowlagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mamidi</surname>
          </string-name>
          ,
          <article-title>Cmsaone@dravidian-codemix-fire2020: A meta embedding and transformer model for code-mixed sentiment analysis on social media text</article-title>
          ,
          <year>2021</year>
          . arXiv:2101.09004.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yasaswini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>IIITT@DravidianLangTech-EACL2021: Transfer learning for offensive language detection in Dravidian languages</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</source>
          , Kyiv,
          <year>2021</year>
          , pp.
          <fpage>187</fpage>
          -
          <lpage>194</lpage>
          . URL: https://aclanthology.org/2021.dravidianlangtech-1.25.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>IIITT@LTEDI-EACL2021-hope speech detection: There is always hope in transformers</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion</source>
          , Association for Computational Linguistics, Kyiv,
          <year>2021</year>
          , pp.
          <fpage>98</fpage>
          -
          <lpage>106</lpage>
          . URL: https://aclanthology.org/2021.ltedi-1.13.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Diakopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Shamma</surname>
          </string-name>
          ,
          <article-title>Characterizing Debate Performance via Aggregated Twitter Sentiment</article-title>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2010</year>
          , pp.
          <fpage>1195</fpage>
          -
          <lpage>1198</lpage>
          . URL: https://doi.org/10.1145/1753326.1753504.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thabasum Aara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Arunaggiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Sai Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prabalakshmi</surname>
          </string-name>
          ,
          <article-title>A novel convolutional neural network architecture to diagnose covid-19</article-title>
          ,
          <source>in: 2021 3rd International Conference on Signal Processing and Communication (ICPSC)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>595</fpage>
          -
          <lpage>599</lpage>
          .
          doi:10.1109/ICSPC51351.2021.9451701.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Jhanwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>An ensemble model for sentiment analysis of hindi-english codemixed data</article-title>
          ,
          <year>2018</year>
          . arXiv:1806.04450.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Martínez-Cámara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martin-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A. U.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <article-title>Random walk weighting over sentiwordnet for sentiment polarity detection on twitter</article-title>
          ,
          <source>in: Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>Towards a unified end-to-end approach for fully unsupervised crosslingual sentiment analysis</article-title>
          ,
          <source>in: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)</source>
          ,
          Association for Computational Linguistics
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>1035</fpage>
          -
          <lpage>1044</lpage>
          . URL: https://aclanthology.org/K19-1097.
          doi:10.18653/v1/K19-1097.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>V.</given-names>
            <surname>Pérez-Rosas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Morency</surname>
          </string-name>
          ,
          <article-title>Utterance-level multimodal sentiment analysis</article-title>
          ,
          <source>in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>973</fpage>
          -
          <lpage>982</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sazzed</surname>
          </string-name>
          ,
          <article-title>Cross-lingual sentiment classification in low-resource Bengali language</article-title>
          ,
          <source>in: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>60</lpage>
          . URL: https://aclanthology.org/2020.wnut-1.8. doi:10.18653/v1/2020.wnut-1.8.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sampath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Thamburaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Hope speech detection in under-resourced kannada language</article-title>
          ,
          <year>2021</year>
          . arXiv:2108.04616.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Georgakopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Tasoulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Vrahatis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. P.</given-names>
            <surname>Plagianakos</surname>
          </string-name>
          ,
          <article-title>Convolutional neural networks for toxic comment classification</article-title>
          ,
          <source>in: Proceedings of the 10th Hellenic Conference on Artificial Intelligence</source>
          , SETN '18, Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          . URL: https://doi.org/10.1145/3200947.3208069. doi:10.1145/3200947.3208069.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>P.</given-names>
            <surname>Burnap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making</article-title>
          ,
          <source>Policy &amp; Internet</source>
          <volume>7</volume>
          (
          <year>2015</year>
          )
          <fpage>223</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bohra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vijay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <article-title>A dataset of Hindi-English code-mixed social media text for hate speech detection</article-title>
          ,
          <source>in: Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media</source>
          , Association for Computational Linguistics, New Orleans, Louisiana, USA,
          <year>2018</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>41</lpage>
          . URL: https://aclanthology.org/W18-1105.
          doi:10.18653/v1/W18-1105.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Waseem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Understanding abuse: A typology of abusive language detection subtasks</article-title>
          ,
          <year>2017</year>
          . arXiv:1705.09899.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Madasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>B</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Chinnaudayar</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam</article-title>
          , in: Working Notes of FIRE 2021 -
          <article-title>Forum for Information Retrieval Evaluation</article-title>
          ,
          CEUR
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chinnappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vasantharajan</surname>
          </string-name>
          ,
          <source>Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text</source>
          , in: Working Notes of FIRE 2021 -
          <article-title>Forum for Information Retrieval Evaluation</article-title>
          ,
          CEUR
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>DravidianCodeMix: Sentiment analysis and offensive language identification</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>