<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>for Detection of Offensive Content in Dravidian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>B S N V</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chaitaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karri</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anjali</string-name>
          <email>anjalipoornima.k16@iiits.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Information Technology</institution>
          ,
          <addr-line>SriCity</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>3</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Hate speech is a form of oral, written, or physical activity that criticizes or uses derogatory language toward a person or a community based on their identity factors. Hate speech and offensive language can endanger democratic principles and societal stability. The growing use of social media is also increasing the number of people affected by hate speech. Moderation of online hate speech has been increasing significantly, especially on social media platforms such as Facebook, Twitter, YouTube, and Instagram. It is high time to curb the intensifying online hate speech by supporting the detection of hateful or offensive texts in social media. This paper describes the work submitted to Hate Speech and Offensive Content Identification in Dravidian-CodeMix (HASOC) 2021, a shared task under the Forum for Information Retrieval Evaluation (FIRE) 2021. We propose an ensemble of transformer models (mBERT, DistilBERT, and MuRIL) to classify code-mixed social media comments/posts in Dravidian languages (Malayalam-English and Tamil-English) as offensive or not-offensive. The motivation is to combine the power of transformers with ensembling to enhance prediction quality. For sub-task 2, the proposed ensemble method ranked 3rd and 6th for Malayalam and Tamil, respectively. The code is publicly available at https://github.com/chaitnayabasava/HSU_TransEmb.</p>
      </abstract>
      <kwd-group>
        <kwd>Dravidian Languages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Social media platforms offer users freedom of expression. At the same time, they raise
new challenges concerning freedom of expression, speech, and human dignity. Hate speech on
the internet expresses tensions between various groups and can have a detrimental
impact on society. Hate speech expressed through social media is not inherently different from
hate expressed offline, but it poses specific difficulties stemming from its indefiniteness,
durability, and anonymity. Hate speech in online venues may persist in many formats across
several platforms, and it can be linked repeatedly. Counteracting hate speech in the
internet world demands more thought and innovative strategies. Social media platforms such as
YouTube, Facebook, and Twitter each have algorithms for identifying hate speech. Nonetheless,
identifying and classifying hate speech remains a significant challenge for social media firms and
researchers alike.</p>
      <p>India being a diverse country, most Indians mix different languages with English
while communicating. In multilingual communities, code-mixing is common, and code-mixed
text is occasionally written in non-native scripts. Because it is convenient to use
local languages alongside English, code-mixed languages are becoming increasingly popular
on different social media platforms. However, spelling variations
and the absence of grammatical standards introduce ambiguity, making automated text
analysis more difficult. Demand for offensive language identification is growing, especially
for social media messages, which are mostly code-mixed. Many researchers have investigated
algorithms for detecting hate speech, but most studies have concentrated on
monolingual text data. Owing to the intricacy of code-mixing, models trained on monolingual
data commonly fail when applied to code-mixed data.</p>
      <p>
        Therefore, as part of HASOC 2021 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we developed a classification model to identify offensive
texts in code-mixed Dravidian languages. HASOC 2021 has two sub-tasks, and this paper provides
the working notes on sub-task 2, which involves categorizing a given code-mixed tweet as
offensive or not-offensive. The evaluation metric reported and used for model selection
in this paper is the weighted average F1-score. The competition page and reference document
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provide further information on the challenges. The rest of the paper is organized as follows:
Section 2 reviews related work, Section 3 details the proposed technique, Section 4 describes
the experiments and outcomes, and Section 5 concludes the article and summarises our findings.
      </p>
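      <p>As an illustration of the metric, the weighted average F1-score can be read as the mean of the per-class F1-scores, each weighted by the number of true examples of that class. The sketch below is a minimal pure-Python version; the function name and the toy labels are assumptions for illustration and are not the organizers' evaluation script.</p>
      <preformat>
```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1-scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        # count is positive for every label in y_true, so the denominator is too
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (count / total) * f1
    return score


# Toy example with the two labels used in this task (assumed data)
y_true = ["NOT", "NOT", "OFF", "OFF"]
y_pred = ["NOT", "OFF", "OFF", "OFF"]
print(round(weighted_f1(y_true, y_pred), 4))  # prints 0.7333
```
      </preformat>
      <p>In the toy example the per-class F1-scores are 2/3 for 'NOT' and 0.8 for 'OFF'; with equal support, the weighted average is their plain mean.</p>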
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The task of hate-speech detection is often treated as a text classification task. Using machine
learning or deep learning approaches to detect offense, hostility, and hate speech in
user-generated content is one of the most effective strategies for combating this problem. As
recent articles indicate, this topic has attracted considerable attention. Several survey articles
describe the significant areas that have been investigated for this task. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
surveys the important areas investigated for employing natural
language processing to automatically recognize various types of utterances. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] looked at
strategies for detecting hate speech in social media and separating it from ordinary obscenities.
The findings showed that the most difficult part is distinguishing between profanity and hate
speech. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] examined the complexities of the concept of hate speech, which is defined differently
across platforms and settings, and offered a unified definition.
      </p>
      <p>
        In the literature, a number of distinct classifiers have been used in various works. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] was
one of the earliest studies on hate speech detection. The author developed
a prototype for detecting abusive messages using a decision-tree generator with 47 features
corresponding to syntax and semantics. Later, machine learning classification methods
such as SVM and logistic regression were used to tackle hate speech detection. For
instance, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] used logistic regression to detect obscenity-related offensive tweets.
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] constructed machine learning models, including a Support Vector Machine with a linear kernel
and a Random Forest with 100 trees, to identify cyber hate on Twitter across a range of protected
traits such as race, disability, and sexual orientation. Their feature set included Bag of Words,
features obtained by identifying hostile words and phrases, and typed dependencies. Although
bag-of-words methods have a high recall rate, they also have a high incidence of false positives.
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] developed convolutional network-based models for hate-speech detection.
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] trained different CNN models with different feature sets: 4-grams, word2vec word
vectors, randomly generated word vectors, and a combination of word vectors with
n-grams. The CNN model with word2vec feature vectors performed best, with an F-score of 78.3%. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
experimented with a dataset of 16k annotated tweets and built features by coupling the embeddings
learned by deep neural network models with gradient boosted decision trees.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] employed recurrent neural networks, namely LSTM and BiLSTM. The former
authors implemented SVM and LSTM for hate-speech detection in Italian, using
morpho-syntactic and syntactic features, sentiment polarity, and lexical text features. The
latter experimented with convolutional networks, BiLSTM, and convolutional networks combined with
BiLSTM to identify posts indicating a user’s medication intake. Frequently, a single solution
to a complicated problem does not apply to all possible circumstances. As a result, researchers
employ ensemble methods to solve such issues. Thus, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] addressed this classification
task using stacked deep learning CNN ensembles and an ensemble of recurrent
neural network classifiers, respectively. Taking inspiration from these deep learning
techniques and ensembles, in this paper we propose an ensemble of transformers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The
power of pre-trained transformers was harnessed by BERT, a model pre-trained on an
unlabelled text corpus that can be further fine-tuned for specific tasks such as classification. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
presented an overview of the methods and results for Offensive Language Identification
in Dravidian Languages-EACL 2021. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] provided an overview of the task of hate speech
recognition in Tamil, Malayalam, Hindi, English, and German as part of the HASOC track at
FIRE 2020. The authors of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] compared different pre-trained text embeddings for
classifying hate speech in Indian code-mixed sentences.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        To classify a code-mixed tweet as offensive or not-offensive,
we proposed an ensemble model of transformers. This section describes the dataset and
its pre-processing steps, and then explains the ensemble setting. Figure 1 depicts the
architecture of the proposed Transformer Ensemble model for classifying offensive tweets using
the dataset given by the organizers of HASOC 2021 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3.1. Pre-Processing</title>
        <p>Pre-processing is crucial, especially when working with tweets. Unprocessed
tweets are unstructured and often contain redundant information and noise that could mislead
predictions. We converted the tweets to lower case and then
tokenized each tweet. Tokenization splits a tweet into words, punctuation marks, numeric
digits, and other symbols. We then removed punctuation marks from the tokenized tweets,
since they add little information to the underlying content. Tweets often
contain # and @ handles, which do not help much in modelling and may introduce
biases that ultimately hamper the predictions. We removed digits, URLs, and # and @ handles using
regular expressions. Emojis and emoticons have become an integral part of our everyday lives
and frequently appear in social media texts. We also removed these symbols and characters
during pre-processing. The label categories of the given dataset are also not uniform, so we dropped
data points labelled not-Tamil and not-Malayalam. Finally, we trained different models
using the cleaned tweets with two labels, namely ’NOT’ and ’OFF’.</p>
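        <p>The cleaning steps described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the code used for the submission: the regular expressions, the emoji code-point ranges, and the names clean_tweet and filter_rows are all hypothetical, and ASCII punctuation stripping stands in for emoticon removal.</p>
        <preformat>
```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"[@#]\w+")  # @mentions and #hashtags
DIGIT_RE = re.compile(r"\d+")
# A few common emoji and pictograph ranges; an assumed, non-exhaustive list.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+"
)
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, handles, digits, emojis, punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # URLs first, before punctuation is removed
    text = HANDLE_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)  # also removes ASCII emoticons like :)
    return " ".join(text.split())     # collapse repeated whitespace


def filter_rows(rows, dropped=("not-Tamil", "not-Malayalam")):
    """Drop rows with out-of-language labels and clean the remaining tweets."""
    return [(clean_tweet(text), label) for text, label in rows if label not in dropped]


print(clean_tweet("@user Padam SUPER da!!! #mass https://t.co/abc 100"))
# prints: padam super da
```
        </preformat>
        <p>URLs are removed before punctuation so that the URL pattern still sees the intact "://" separator.</p>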
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Models</title>
        <p>
          To build our ensemble model, we mainly worked with three different transformer-based models,
namely multilingual BERT (mBERT), Multilingual Representations for Indian Languages (MuRIL),
and Distilled BERT (DistilBERT). [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] introduced transformer models with
encoder-decoder blocks that use attention maps for long-sequence tasks. The goal of transformers is to
model the dependencies between input and output entirely through attention, without
recurrent networks. Bidirectional Encoder Representations from Transformers (BERT), introduced by Google
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], has paved the way for a new era of transfer learning in NLP. This language model
is built with a multi-layer bidirectional Transformer encoder with bidirectional
self-attention layers. It enables users to fine-tune the pre-trained language model to achieve
state-of-the-art performance in many NLP tasks such as question answering, translation,
classification, etc. BERT’s pre-training objectives, Masked Language Modelling (MLM) and
Next Sentence Prediction, are straightforward yet effective. MLM masks tokens in the
input at random, and the goal of the model is to predict the masked tokens. Next-sentence
prediction ensures that the model learns the relationship between consecutive sentences.
These unsupervised pre-training objectives made BERT a powerful pre-trained model for
language representations.
        </p>
        <p>
          The original pre-trained BERT models from Google have been trained on lower-cased English
text. Since our task was the classification of tweets in code-mixed Dravidian languages, we
used other BERT models from HuggingFace [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] that were pre-trained on different languages.
mBERT is the original BERT base model pre-trained on the top 102 languages, including
Tamil and Malayalam. The model is pre-trained on the Wikipedia corpus with the same MLM
objective as BERT. mBERT develops complex cross-lingual representations that enable language
transfer to code-mixed tweets more efficiently. [19] proposed a lighter version of BERT that
reduced the number of parameters by 40% while preserving 97% of the language representation
knowledge and increasing the computation speed by 60%. DistilBERT is a lighter and faster
transformer model trained with a triple loss combining language modelling, distillation from the
BERT base model, and cosine distance. For the proposed ensemble model, we used the multilingual
DistilBERT model, a distilled version of mBERT with 6 transformer layers, 12 attention heads,
and 134M parameters in total. [20] proposed MuRIL, a multilingual language
model trained specifically on a large corpus of 17 Indian languages. This model was
designed to be fine-tuned for a range of NLP tasks in Indian languages. It is also
trained on transliterated data, which is common in the Indian context and can
help improve the performance of the classification task in Dravidian languages.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Training</title>
        <p>We used Hugging Face pre-trained transformer models as the basis for fine-tuning
on the tweet classification task. The outputs of the last hidden layer of each model
are averaged and used as the final feature representation of the tweet. This representation is
finally passed through an output layer whose output dimension equals the number of classes,
two. We used a batch size of 32 with a maximum sequence length of 256 and trained the classifier
using the cross-entropy loss, which increases as the prediction diverges from the
ground truth.</p>
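        <p>The pooling and output layer described above can be sketched in plain Python. The text states only that the last-layer outputs are averaged; skipping padded positions through the attention mask is an assumption made here, and the function names, toy vectors, and weights are hypothetical. A real implementation would operate on tensors from the fine-tuned model.</p>
        <preformat>
```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, skipping positions whose mask is 0 (assumed)."""
    dim = len(hidden_states[0])
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m == 1]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[d] for vec in kept) / count for d in range(dim)]


def linear(features, weights, bias):
    """Output layer: one logit per class (two classes in this task)."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Toy last-layer output: three token positions, hidden size 2; last one is padding
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
features = mean_pool(hidden, mask)
logits = linear(features, [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5])
print(features, logits)  # prints [2.0, 3.0] [2.5, 2.5]
```
        </preformat>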
        <p>
          The dataset provided by Chakravarthi et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] has a slight imbalance between the
two available classes (’OFFENSIVE’, ’NOT-OFFENSIVE’). The Malayalam dataset has 2047
not-offensive and 1953 offensive tweets, whereas the Tamil dataset has 2020 not-offensive and
1980 offensive tweets. To address this imbalance, we used inverse class weighting to penalize
incorrect predictions of the less-represented class more heavily in the cross-entropy loss function.
Finally, we trained the models with a learning rate of 1e-5 for 30 epochs.
        </p>
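        <p>One common way to realise inverse class weighting is to give each class the weight total / (number of classes × class count), so that the rarer class contributes more to the loss. The sketch below applies that formula to the Malayalam counts reported above; the exact weighting formula is an assumption, since the text states only that inverse weighting was used, and the function names are hypothetical.</p>
        <preformat>
```python
import math


def inverse_class_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Loss for one example: -w[label] * log(p[label])."""
    return -weights[label] * math.log(probs[label])


malayalam = {"NOT": 2047, "OFF": 1953}  # class counts reported in this section
weights = inverse_class_weights(malayalam)
print({k: round(v, 4) for k, v in weights.items()})
# prints {'NOT': 0.977, 'OFF': 1.0241}
```
        </preformat>
        <p>With these weights, an equally confident mistake on an offensive tweet costs slightly more than the same mistake on a not-offensive tweet, which matches the mild imbalance in the data.</p>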
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Ensemble of Transformers</title>
        <p>We employed a soft-voting ensemble model to obtain the final predictions. As discussed in
Section 3.3, we fine-tuned each considered model on the provided dataset. The motivation
behind using an ensemble voting mechanism is to have a system that combines the outputs of
various BERT-based models to give the final predictions. The base models were trained using
varying amounts of data and numbers of transformer layers, so each model identifies different
patterns in the text. The ensemble setting lets us capture and use these multiple
patterns to give the final prediction. This setting helped us improve performance beyond
the F1-score of the best-performing model among those considered.</p>
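        <p>In the proposed soft-voting ensemble setting, the prediction probabilities of the models are
averaged as shown in Eq. 1. The final prediction then follows from Eq. 2, where <inline-formula><tex-math>p_i</tex-math></inline-formula> is the
probability of the comment being ’NOT-OFFENSIVE’ predicted by model <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>n</tex-math></inline-formula> is the total
number of models considered for the ensemble setting.</p>
        <disp-formula id="eq-1"><label>(1)</label><tex-math>\bar{p} = \frac{1}{n} \sum_{i=0}^{n-1} p_i</tex-math></disp-formula>
        <disp-formula id="eq-2"><label>(2)</label><tex-math>\hat{y} = \begin{cases} \text{NOT-OFFENSIVE} \quad \text{if } \bar{p} \geq 1 - \bar{p} \\ \text{OFFENSIVE} \quad \text{otherwise} \end{cases}</tex-math></disp-formula>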
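        <p>A minimal sketch of two-class soft voting follows: the per-model probabilities of ’NOT-OFFENSIVE’ are averaged and the class with the larger averaged probability is returned. The function name and the example probabilities are hypothetical, and resolving an exact tie in favour of ’NOT-OFFENSIVE’ is an assumption.</p>
        <preformat>
```python
def soft_vote(not_offensive_probs):
    """Average per-model P(NOT-OFFENSIVE) and pick the likelier class."""
    n = len(not_offensive_probs)
    mean_p = sum(not_offensive_probs) / n
    # For two classes, the averaged P(OFFENSIVE) is 1 - mean_p.
    label = "NOT-OFFENSIVE" if mean_p >= 1.0 - mean_p else "OFFENSIVE"
    return mean_p, label


# Hypothetical outputs of three fine-tuned models for one comment
mean_p, label = soft_vote([0.9, 0.4, 0.3])
print(round(mean_p, 4), label)  # prints 0.5333 NOT-OFFENSIVE
```
        </preformat>
        <p>In this example a hard majority vote would return ’OFFENSIVE’, because two of the three models fall below 0.5, whereas soft voting lets the single confident model dominate.</p>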
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>We considered various transformer-based multilingual models trained on datasets containing
the two languages in focus (Tamil and Malayalam) and fine-tuned them on the provided
dataset using the training setup described in Section 3.3. To apply the pre-trained
BERT models, we first need to tokenize the input using the BERT tokenizers. These tokenizers split
the input text into tokens and add special tokens such as [CLS] and [SEP], which mark the start and
end of sentences. We set the maximum length to 256, so input sentences are padded
or truncated to this length. Lastly, the attention mask is created and returned along with the
tokenized input. The classifier is fed the average of the features from the last hidden layer of the
BERT model and fine-tuned using the Adam optimizer with weight decay and a learning rate of
1e-5. We trained the models with a batch size of 32 for 30 epochs each.</p>
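      <p>The padding, truncation, and attention-mask step can be illustrated without any library. The sketch below assumes the tweet has already been mapped to token ids; the function name encode and the special-token ids (101 for [CLS], 102 for [SEP], 0 for padding) are placeholders for illustration, since the actual ids depend on each model's vocabulary. In practice the tokenizer shipped with each pre-trained model performs this step.</p>
      <preformat>
```python
def encode(token_ids, max_length=256, cls_id=101, sep_id=102, pad_id=0):
    """Add [CLS]/[SEP], then truncate or pad to max_length.

    The special-token ids are placeholders; real values come from the
    vocabulary of the chosen pre-trained model.
    """
    body = token_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    input_ids = [cls_id] + body + [sep_id]
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens
    padding = max_length - len(input_ids)
    input_ids += [pad_id] * padding
    attention_mask += [0] * padding        # 0 marks padding
    return input_ids, attention_mask


ids, mask = encode([5, 6, 7], max_length=8)
print(ids)   # prints [101, 5, 6, 7, 102, 0, 0, 0]
print(mask)  # prints [1, 1, 1, 1, 1, 0, 0, 0]
```
      </preformat>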
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We proposed an ensemble transformer model that utilized various transformers trained on
multilingual data to identify hate speech and offensive language in the Dravidian languages
Tamil and Malayalam. The proposed ensemble model was able to outperform the standalone
models on the dev set. Yet, the F1-scores of all the models are very low on the provided test
set. The poor performance may mainly be attributed to a change in distribution relative to the
train and dev sets. In future work, we will consider using multiple open-source code-mixed hate speech
recognition datasets along with the provided dataset to cover various possible data
patterns. We will also explore the effects of using language-specific LSTM-based models like
ULMFiT [21].</p>
      <p>Language Processing: System Demonstrations, Association for Computational Linguistics,
Online, 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.</p>
      <p>[19] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller,
faster, cheaper and lighter, 2020. arXiv:1910.01108.</p>
      <p>[20] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal,
R. T. Nagipogu, S. Dave, S. Gupta, S. C. B. Gali, V. Subramanian, P. Talukdar, MuRIL:
Multilingual Representations for Indian Languages, 2021. arXiv:2103.10730.</p>
      <p>[21] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, arXiv
preprint arXiv:1801.06146 (2018).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Madasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          , P. B,
          <string-name>
            <given-names>S. Chinnaudayar</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam</article-title>
          ,
          <source>in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>
          , CEUR,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <article-title>A Survey on Hate Speech Detection using Natural Language Processing</article-title>
          ,
          <source>in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source>
          , Association for Computational Linguistics, Valencia, Spain,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://aclanthology.org/W17-1101. doi:10.18653/v1/W17-1101.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          , Detecting Hate Speech in Social Media,
          <source>in: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP</source>
          <year>2017</year>
          ,
          INCOMA Ltd., Varna, Bulgaria,
          <year>2017</year>
          , pp.
          <fpage>467</fpage>
          -
          <lpage>472</lpage>
          . URL: https://doi.org/10.26615/978-954-452-049-6_062. doi:10.26615/978-954-452-049-6_062.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>A Survey on Automatic Detection of Hate Speech in Text</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Spertus</surname>
          </string-name>
          ,
          <article-title>Smokey: Automatic Recognition of Hostile Messages</article-title>
          ,
          <source>in: Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence</source>
          , AAAI'97/IAAI'97, AAAI Press,
          <year>1997</year>
          , pp.
          <fpage>1058</fpage>
          -
          <lpage>1065</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <article-title>Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus</article-title>
          ,
          <source>in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management</source>
          , CIKM '12,
          Association for Computing Machinery, New York, NY, USA,
          <year>2012</year>
          , pp.
          <fpage>1980</fpage>
          -
          <lpage>1984</lpage>
          . URL: https://doi.org/10.1145/2396761.2398556. doi:10.1145/2396761.2398556.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Burnap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Us and them: identifying cyber hate on Twitter across multiple protected characteristics</article-title>
          ,
          <source>EPJ Data Science</source>
          <volume>5</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Gambäck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. K.</given-names>
            <surname>Sikdar</surname>
          </string-name>
          ,
          <article-title>Using Convolutional Neural Networks to Classify Hate-Speech</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Abusive Language Online</source>
          , Association for Computational Linguistics, Vancouver, BC, Canada,
          <year>2017</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>90</lpage>
          . URL: https://aclanthology.org/W17-3013. doi:10.18653/v1/W17-3013.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Badjatiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Varma</surname>
          </string-name>
          ,
          <article-title>Deep learning for hate speech detection in tweets</article-title>
          ,
          <source>in: Proceedings of the 26th international conference on World Wide Web companion</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>759</fpage>
          -
          <lpage>760</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Del Vigna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Petrocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tesconi</surname>
          </string-name>
          ,
          <article-title>Hate me, hate me not: Hate speech detection on Facebook</article-title>
          ,
          <source>in: Proceedings of the First Italian Conference on Cybersecurity (ITASEC17)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>86</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mahata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedrichs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Shah</surname>
          </string-name>
          , et al.,
          <article-title>#phramacovigilance - Exploring Deep Learning Techniques for Identifying Mentions of Medication Intake from Twitter</article-title>
          , arXiv preprint arXiv:1805.06375 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Pitsilis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ramampiaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Langseth</surname>
          </string-name>
          ,
          <article-title>Detecting offensive language in tweets using deep learning</article-title>
          , arXiv preprint arXiv:1801.04433 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar M</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>R L</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <article-title>Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>
          , Association for Computational Linguistics, Kyiv,
          <year>2021</year>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>145</lpage>
          . URL: https://aclanthology.org/2021.dravidianlangtech-1.17.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar M</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Comparison of Pretrained Embeddings to Identify Hate Speech in Indian Code-Mixed Text</article-title>
          ,
          <source>in: 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>25</lpage>
          .
          doi:10.1109/ICACCCN51052.2020.9362731.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
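          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>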
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
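          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Platen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,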
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-Art Natural Language Processing</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>