=Paper=
{{Paper
|id=Vol-3159/T3-15
|storemode=property
|title=Detection Offensive Tamil Texts using Machine Learning and Multilingual Transformers Models
|pdfUrl=https://ceur-ws.org/Vol-3159/T3-15.pdf
|volume=Vol-3159
|authors=Malliga Subramanian,Kogilavani Shanmuga Vadivel,Antonette Shibani,Adhithiya G.J,Deepti R,Gowthamkrishnan S
|dblpUrl=https://dblp.org/rec/conf/fire/SubramanianKSJR21
}}
==Detection Offensive Tamil Texts using Machine Learning and Multilingual Transformers Models==
Malliga Subramanian¹, Kogilavani Shanmuga Vadivel¹, Antonette Shibani², Adhithiya G J¹, Deepti R¹, Gowtham Krishnan S¹

¹ Department of Computer Science and Engineering, Kongu Engineering College, Erode, Tamil Nadu, India
² University of Technology Sydney, Ultimo, Australia
Abstract
Social media has facilitated an exponential increase in the distribution of hostile and toxic content
in today's world. To tackle this issue, automated detection of such offensive content, including hate speech and inflammatory or abusive language, has recently piqued the interest of many in the
Natural Language Processing community. In this paper, we present machine learning and
multilingual transformer models to automatically classify Tamil language comments as Offensive
or Not-Offensive texts. The models make use of the dataset supplied for the Hate Speech and
Offensive Content Identification (HASOC) challenge in Dravidian Languages under FIRE 2021.
Among the proposed offensive language identification models, RoBERTa outperforms the others
with an accuracy of 91.39%.
Keywords
Hate Speech, Offensive, Toxicity, Social Media, YouTube, Tamil, Machine Learning, RoBERTa,
HASOC
1. Introduction
People around the world increasingly use social media platforms to share information and communicate
with each other as part of their daily interactions. The widespread popularity of social media and micro-
blogging platforms enhances connectivity among people, but it can also negatively affect their lives.
Hateful and offensive comments have proliferated in social media, often by toxic users who hide under
anonymity features of these platforms bypassing editorial control [1, 2]. If not curbed early, such toxic
behavior can have a ripple effect and dissuade other people from being a part of the community [3]. The
hostile environment can stop them from expressing themselves freely due to the fear of being abused or
harassed. Offensive language, hate speech and other objectionable content on the internet hence pose a
threat to the well-being of our society [4].
The adverse societal impact of the spread of offensive language in social media is well recognized.
However, the sheer scale of the issue and the lack of regulation make it a hard problem to tackle. Many
countries outlaw hate speech on social media, particularly speech that targets a specific group or
incites criminal behavior. Because hate speech shapes public opinion, platforms including YouTube,
Facebook, and Twitter, have policies and tools in place to moderate hate speech content and related
offensive behaviour to curb its ill effects on society [5]. One common approach is to employ automated
methods that identify offensive language in order to remove or hide it from other users [6].
Systems for the automatic identification of hate and offensive language in Natural Language Processing
(NLP) usually fall under one of the following categories: i) feature-based linear classifiers [7, 8], ii) neural
network architectures such as CNN, RNN [9-11] and, iii) fine-tuned pre-trained language models, such as
BERT (Bidirectional Encoder Representation from Transformer) and RoBERTa (Robustly Optimized
BERT Pretraining Approach), among others [12, 13]. Building on this work, we present machine
learning and multilingual transformer models to automatically detect YouTube user comments as offensive
or Not-Offensive texts for one of the tasks assigned under HASOC-Offensive Language Identification in
Dravidian Code-Mix FIRE 2021 (http://fire.irsi.res.in/fire/2021/home). We specifically focus on Tamil, a classical language originating in India
and spoken in Tamil Nadu, India's southernmost state, as well as Sri Lanka, Malaysia, and Singapore.
The remainder of the paper is structured as follows: Section 2 reviews background work on hate speech
and offensive language detection by surveying existing research on the topic. Section 3 describes the task,
the dataset used in the study, and our proposed models. Section 4 presents the results and discussion.
Finally, Section 5 concludes the paper by summarizing the findings and implications of the work.
2. Literature Survey
The sheer volume of content and the growing number of people who use social media make manual
detection and removal of objectionable content by moderators infeasible. There is a high demand for
tools and techniques that automatically detect offensive content on the internet to curb its spread quickly
and effectively. Recent years have witnessed a surge in the development of automated systems that filter
offensive language on social media platforms, with known challenges in classifying nuances due to the
subjective and context-dependent nature of texts [6, 14]. Additional challenges emerge when working with
texts containing more than one language (henceforth referred to as code-mixed data), which arise from
multilingual users. Many previous systems eliminate data in languages other than English [14, 15],
leaving a scarcity of research in this area for low-resource languages.
As user-generated content in social media is typically code-mixed and not well studied for under-
resourced languages, recent works have developed systems for offensive text detection in such
languages [16-18]. The findings of the first shared task on Offensive Language Identification in Tamil,
Malayalam, and Kannada, which relied on a thoroughly annotated dataset based on human judgments,
were presented in [16]. The task setup allowed for testing models in multilingual contexts as well as on
the code-mixing phenomenon.
A novel attempt in [17] framed a multimodal classification problem in the shared task "Troll Meme
Classification in Tamil", introducing an enhanced TamilMeme dataset that includes Tamil text extracted
from memes. Besides presenting a multimodal classification problem, this work discussed the difficulties
of natural language processing for a low-resourced language and a code-mixed dataset. Moreover, the
work in [18] describes the shared task on machine translation for Dravidian (Tamil) languages presented
at the first Workshop on Speech and Language Technologies for Dravidian Languages, where the best
performing systems achieved a high Bilingual Evaluation Understudy (BLEU) score despite a shortage of
training data. Further work on the detection of offensive language in the under-resourced Tamil,
Malayalam and Kannada languages is essential to add to this growing area.
Next, we highlight the approaches based on machine learning and natural language processing that have
made significant progress in automatically detecting offensive language on web platforms. Traditional
machine learning approaches employ features such as word-level and character-level n-grams, amongst
others. Using a multi-class classifier, Davidson et al. [19] classified tweets as hate speech, offensive, or
neither, using Naive Bayes, Decision Trees, Random Forests and other algorithms with 5-fold cross
validation, and reported a precision of 0.91, recall of 0.90, and F1 score of 0.90. Gibert et al. [20] created
a manually labelled hate speech dataset from Stormfront, a white supremacist online forum, containing
hateful and non-hateful sentences. Support Vector Machines (SVM), Convolutional Neural Networks
(CNN), and Recurrent Neural Networks were used to annotate the test data based on the hand-annotated
training data.
When comparing linear classifiers to neural networks, the results were inconsistent across datasets and
architectures, with linear classifiers proving to be very competitive, if not superior. Systems based on pre-
trained language models, on the other hand, have proven to have the best performance in this area,
achieving new state-of-the-art results. However, one limitation in the case of these pre-trained models is
that, because of the language variety they are trained on, they are suited mainly to general-purpose
language comprehension tasks. To support the testing of machine learning models for hate speech
classification, the authors of [21] provided hate speech benchmark datasets. In addition, the advantages
and disadvantages of single and hybrid
machine learning methods in the classification of hate speech were discussed in [21]. Besides the existing
approaches and datasets for hate speech detection, MacAvaney et al. [22] investigated the challenges of
automatic approaches for hate speech detection online. This study proposed a multi-view SVM approach
that outperforms neural methods while being simpler and more interpretable. The experiments use datasets
such as HatebaseTwitter, Stormfront, and TRAC [22]. Using a bidirectional LSTM, Garain and Basu [23]
described their system for detecting offensive language on Twitter.
Although numerous feature extraction methods have been utilised in machine learning-based
approaches, they all require a well-defined feature extraction strategy. To increase the performance of
hate speech and offensive content detection models, neural network models now use text representation
and deep learning methodologies such as CNNs, Bi-directional Long Short-Term Memory Networks
(LSTMs), and BERT [24]. A study in [24] employed pre-trained BERT and multilingual BERT to detect
hate speech and offensive content in English, German, and Hindi. The authors of [12] experimented with
the following classifiers: a linear model with features from word unigrams, word2vec, and Hatebase; a
word-based LSTM; and a fine-tuned BERT. For that work, the Offensive Language Identification Dataset
(OLID) was gathered via the Twitter API by searching for a specific set of terms. Another study proposed
a transfer learning approach based on BERT, an existing pre-trained bidirectional encoder representation
language model [25]. In particular, this work studied BERT's ability to capture
hateful contexts in social media content using new fine-tuning strategies based on transfer learning, and
employed two publicly available datasets annotated for racism, sexism, hate, or offensive content on
Twitter to assess the proposed approach.
While multilingual transformer models were seen to be successful for the shared task provided in [16],
a few machine learning models also performed well using hybrid approaches for feature
selection/extraction from texts rather than single feature selection/extraction algorithms. In comparison to
utilising a transformer, the advantage of employing machine learning models with feature selection
techniques is that the model remains simple and easier to interpret, which forms the basis of the work we
describe in this paper. In addition to RoBERTa, we demonstrate four different machine learning algorithms
to categorize YouTube comments as Offensive or Not-Offensive. Based on the no free lunch theorem [26]
and our review of current works with disparate datasets, we note that a method that works well for one
dataset might not be appropriate for another. This means that there might not be a single classifier that
performs best on all kinds of datasets; hence, we apply several different classifiers on
a feature vector to observe which one provides better results. We provide more details of our approach in
the next section.
3. Materials and Methods
3.1 Task and Dataset Description
Offensive texts in this study adopt the common definition for offensive language often expressed as
‘flames’, which refers to “offensive messages or remarks that in some circumstances are inappropriate,
exhibit a lack of respect towards certain groups of people or are just rude in general”. The dataset used to
identify offensive language content consists of code-mixed data of Tamil comments collected from
YouTube media, made available through HASOC-Offensive Language Identification track in Dravidian
Code-Mix FIRE 2021. The comments are in code-mixed form (a mixture of native and Roman script),
comprising Tamil, English and Malayalam [27].
The dataset contains comments in Tamil, English, and Malayalam; none of them are entirely in English
or Malayalam, but are mixed with Tamil. Some comments contain more than one sentence, but the average
number of sentences per comment in the corpus is one [28]. Each comment in the dataset was annotated
as either Offensive (OFF), Not-Offensive (NOT) or Non-Tamil. The dataset contained 5880 Tamil texts
for training and 654 Tamil texts for testing, with class labels Offensive, Not-Offensive and Not-Tamil.
The training set has 1153 Offensive texts and 4724 Not-Offensive texts; three more texts mixed with Tamil
and English/Malayalam were omitted. Since the dataset was imbalanced, we augmented the available data
by shuffling the words in each sentence to generate new texts [29]. We also reduced the imbalance by
applying the Synthetic Minority Oversampling TEchnique (SMOTE) [30], an oversampling approach that
creates synthetic samples from the minority class by selecting instances that are close together in the
feature space. SMOTE is used to generate a synthetically, nearly class-balanced training set, which is
subsequently used to train the classifiers. Sample Offensive and Not-Offensive texts from the dataset are
presented in Table 1, and a sketch of this rebalancing step follows the table.
Table 1
Sample training texts from the dataset
Documents Texts Label
Document[10] படம் அழகாக இருக்குங் க அத விட எதார்த்தமாயிருக்குங் க NOT
Document[11] இந்த படம் மாபபரும் பெற் றி பபற் று தமிழ் சினிமாவில்
ஆதிக்கம் பெலுத்தும் ,,,இது பபான்ற பமலும் பல படங் களை
இயக்க ொழ் த்துக்கை் NOT
Document[13] தமிழக மக்கை் ொர்பாக ொழ் த்துக்கை் ... ொதிகை்
இல் ளலயடி பாப் பா. ெலுளககை் நிளறய பெண்டுமடி
((ஓட்டுக்காக)) பாப் பா.. NOT
Document[352] இப் புடி படிெ்ெ பெங் க எல் லாம் பெனானு ஓடி ஓடி
தான்யா நாபட நாெமா பபாயிருெ்சு.... OFF
Document[62] குரூமா குருப் பு குண்டி பெந்பத ொகபபாரானுக
ொழ் த்துக்கை் பமாகன் ொர்.... OFF
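To make the rebalancing step concrete, the following is a minimal sketch rather than the authors' released code: the file name, column names, random seed, and the choice to apply SMOTE to the count-vector features of Section 3.2 are assumptions made for illustration.

```python
# Minimal sketch of the rebalancing described in Section 3.1 (hypothetical file
# and column names; the authors' exact pipeline may differ).
import random

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.feature_extraction.text import CountVectorizer


def shuffle_augment(text: str) -> str:
    """Generate a new training text by shuffling the words of an existing one."""
    words = text.split()
    random.shuffle(words)
    return " ".join(words)


train = pd.read_csv("tamil_train.tsv", sep="\t")      # assumed columns: text, label
minority = train[train["label"] == "OFF"]
augmented = minority.assign(text=minority["text"].map(shuffle_augment))
train = pd.concat([train, augmented], ignore_index=True)

# SMOTE needs numeric features, so the texts are vectorised first (Section 3.2);
# synthetic minority-class samples are then created in that feature space.
X = CountVectorizer(max_features=1050).fit_transform(train["text"])
y = train["label"]
X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
```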
3.2 Pre-processing and Feature Extraction
Preprocessing the corpus (or data cleaning) is the first step to build any classifier. As the corpus
contained emojis, punctuation characters and texts that are not in Tamil, these were removed by
preprocessing them. The emojis were converted into text using the Emoji for Python library and
CountVectorizer was used to convert the text corpus to a vector of term/token counts. The CountVectorizer
from Python scikit-learn library looks after controlling n-gram size, custom preprocessing, tokenization,
stop words, and vocabulary size, making it a versatile feature representation module for texts. The result
of the vectorizer is an encoded vector with the vocabulary length and an integer count of each word's
appearances in the document. Sample results from CountVectorizer for the dataset take is shown in Table
2. Here, we took the top 1050 features with word length more than or equal to 3. Note that CountVectorizer
does not store these words as strings. Rather, they are assigned an index value. In this case, ‘படம் ' has
index 0, ‘அழகாக' has index 1, and so on. Using the fit() and transform() methods, each vector's value is
calculated for each comment in the training set before inputting them to the classifiers; a sketch of this
step follows Table 2.
Table 2
Results from Count Vectorizer
Text படம் அழகாக இருக்குங் க எதார்த்தமாயிருக்குங் க பாப் பா
Document[10] 1 1 1 1 0
Document[11] 1 0 0 0 0
Document[13] 0 0 0 0 2
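As a concrete illustration of this feature extraction step, the sketch below builds count vectors with the settings described above (top 1050 features, tokens of at least three characters); the token pattern and the abridged sample comments are illustrative assumptions, not the authors' exact configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumed settings matching Section 3.2: top 1050 features and tokens of at
# least three characters (token_pattern is an illustrative choice).
vectorizer = CountVectorizer(max_features=1050, token_pattern=r"(?u)\b\w{3,}\b")

corpus = [
    "படம் அழகாக இருக்குங்க",     # abridged version of Document[10] in Table 1
    "இந்த படம் தமிழ் சினிமா",    # another illustrative comment
]

X_train = vectorizer.fit_transform(corpus)   # learn the vocabulary, then encode
print(vectorizer.vocabulary_)                # token -> column index mapping
print(X_train.toarray())                     # integer counts per document

# Unseen comments (e.g. the test set) are encoded with transform() only.
X_test = vectorizer.transform(["படம் நல்லா இருக்கு"])
```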
3.3. Proposed Classifiers
Having described the text to feature transformation, we present the classification algorithms used for
offensive text detection in our study. We propose five classifiers: four machine learning models, namely a
Multinomial Naïve Bayes classifier, SVM, Logistic Regression and KNN, and a transformer model called
RoBERTa. The output of the count vectorizer is fed as input to these classifiers, and the output is the class
predicted for each text. The scikit-learn library in Python has been used to implement the
proposed models.
We now explain the rationale behind the use of the above models. The Naive Bayes Multinomial
Classifier is a probabilistic model and a specialized version of the Naive Bayes algorithm. Simple Naive
Bayes models a document for the presence or absence of specific words, whereas Multinomial classifier
explicitly models the word counts and works well on small amounts of training data and trains relatively
quickly when compared to other models [31]. Support-vector machines (SVM) are supervised
classification algorithms that learn from training data to construct an optimal hyperplane that separates the
categories while classifying new data. It handles a large number of samples well as SVMs are designed to
find a hyperplane that maximizes the marginal distance between classes. In binary classification, the
support vectors generate a hyperplane that divides the cases into two non-overlapping classes. SVM
classifiers perform admirably in text classification tasks [31]. SVC() from the scikit-learn library has been
used in our experiments with Radial Basis Function (RBF) as the kernel function.
Logistic regression uses a sigmoid function to model the relationship between a binary dependent variable
and one or more independent variables. Here, we employed L2 regularization for this model to deal with
data that contains a binary dependent variable, such as offensive or not offensive. It calculates the
probability of an output by combining the independent (predictor) variables in a linear
fashion. KNN is a simple text classification algorithm that categorizes new data by comparing it to all
available data using some similarity measure. While KNN is a simple algorithm that applies to both
classification and regression tasks and makes no assumptions about the data, it is memory and time
intensive because all training data must be stored. All the classifiers give different values for performance
metrics, as they perform differently based on their features discussed above.
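The sketch below illustrates how the four machine learning classifiers can be trained on count-vector features and compared. Apart from the settings stated above (RBF kernel for the SVM, L2 regularization for logistic regression), the hyperparameters and the toy stand-in data are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy stand-in data; in the experiments the features come from the
# CountVectorizer of Section 3.2 and the labels are OFF / NOT.
train_texts = ["sample not offensive comment"] * 4 + ["sample offensive comment"] * 4
train_labels = ["NOT"] * 4 + ["OFF"] * 4
test_texts = ["another not offensive comment", "another offensive comment"]
test_labels = ["NOT", "OFF"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

classifiers = {
    "Multinomial NB": MultinomialNB(),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Logistic Regression (L2)": LogisticRegression(penalty="l2", max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=3),   # assumed k
}

for name, clf in classifiers.items():
    clf.fit(X_train, train_labels)
    predictions = clf.predict(X_test)
    print(f"{name}: accuracy = {accuracy_score(test_labels, predictions):.3f}")
```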
RoBERTa is one of the most intriguing architectures derived from the BERT revolution. According to
the authors of [32], while BERT exhibited a substantial performance improvement across various tasks, it
was undertrained. RoBERTa improves the training technique by removing the next sentence prediction
task from BERT's pre-training and introducing dynamic masking, which causes the masked token to change
during the training epochs. Larger training batch sizes have also been found to be beneficial in the
training operation. This enables RoBERTa to outperform BERT on the masked language modelling
objective, resulting in higher downstream task performance.
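A fine-tuning sketch in the style of the Hugging Face transformers library is given below. The paper does not report the exact checkpoint or hyperparameters, so the multilingual checkpoint name, maximum sequence length, batch size, and number of epochs shown here are assumptions.

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINT = "xlm-roberta-base"   # assumed multilingual RoBERTa checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)


class CommentDataset(Dataset):
    """Wraps tokenised comments with integer labels (0 = NOT, 1 = OFF)."""

    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Placeholder examples; the HASOC training comments and labels go here.
train_dataset = CommentDataset(["sample comment", "another comment"], [0, 1])

args = TrainingArguments(
    output_dir="roberta-tamil-offensive",
    num_train_epochs=3,                 # assumed
    per_device_train_batch_size=16,     # assumed
)
Trainer(model=model, args=args, train_dataset=train_dataset).train()
```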
3.4 Model implementation
The Python programming language and its scikit-learn library have been used to implement the
classification models. Google Colaboratory (Colab) was used to train and test the proposed models. Colab
is a fully cloud-based Jupyter notebook environment that requires no local setup and can run in any
browser. The code-base is released open-source on GitHub.
4. Results and Discussion
In this section, we discuss the results of the classification models built for automatic detection of
Offensive and Not-Offensive Tamil texts using the Dravidian Code-Mix FIRE 2021 dataset. We train each
classifier using the features extracted from the training set and test the models using the test dataset
provided.
4.1 Performance Metrics
The performance of the different models used for the classification problem have been evaluated using
the following metrics: Accuracy, Precision, Recall and F1-Score [21]. These metrics commonly used for
the evaluation of classifiers are defined as follows.
Accuracy is defined as the number of texts correctly classified divided by the total number of texts,
and is calculated by Equation (1).
Accuracy = (TP + TN)/(TP + TN + FP + FN) (1)
where TP = True Positive (the number of texts of a class correctly classified as that class), TN = True
Negative (the number of texts correctly classified as not belonging to that class), FP = False Positive
(the number of texts incorrectly classified as belonging to that class) and FN = False Negative (the number
of texts of that class incorrectly classified as not belonging to it), as read from the confusion matrix.
The number of texts correctly categorized as a certain class out of the total number of actual texts in
that class is defined as Recall (also known as Sensitivity or True Positive Rate) and is computed using
Equation (2).
Recall = TP /(TP + FN) (2)
Precision (Positive Predictive Value) is defined as the number of texts accurately categorized as a
specific class out of the total number of texts categorized as that class, and is given by Equation (3).
Precision = TP /(TP + FP) (3)
F1-Score is defined as the harmonic average of the Precision and Recall, that is, the weighted average
of Precision and Recall. It is calculated as in Equation (4).
F1-Score = (2*Precision*Recall)/(Precision + Recall) (4)
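In practice, these per-class metrics can be computed directly with scikit-learn rather than by hand. The sketch below uses illustrative labels; the real inputs are the test-set labels and each classifier's predictions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Illustrative true and predicted labels.
y_true = ["NOT", "NOT", "NOT", "OFF", "OFF", "OFF"]
y_pred = ["NOT", "OFF", "NOT", "OFF", "NOT", "OFF"]

print("Accuracy:", accuracy_score(y_true, y_pred))            # Equation (1)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=["NOT", "OFF"])                     # Equations (2)-(4)
print("Precision:", precision, "Recall:", recall, "F1:", f1)

# Rows are actual classes and columns are predicted classes, as in Figure 1.
print(confusion_matrix(y_true, y_pred, labels=["NOT", "OFF"]))
```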
The values of the performance metrics obtained for the test dataset for each of the classifiers are
presented in Table 3. We also present the confusion matrix, which compares the actual target values with
those predicted by a model and helps us perform error analysis of the proposed models. The confusion
matrices for the proposed models are shown in Figure
1. The diagonal elements of the confusion matrix represent correct classifications. The predicted classes
are represented by X axis, and the actual classes are represented by Y axis. For instance, Figure 1(a) shows
that the naïve bayes model classified 50 offensive texts correctly as offensive and 83 offensive texts
incorrectly as Not-Offensive.
Table 3 presents the results of the classifiers in terms of the performance measures. RoBERTa provides the
highest accuracy in comparison to the other classifiers. This high accuracy from RoBERTa results from
several improvements to the model: training it longer with larger batches over more data; removing the
next sentence prediction objective; training on longer sequences (complete sentences are taken as input to
the model, as opposed to the base BERT model); dynamically changing the masking pattern applied to the
training data (RoBERTa can produce several distinct masking patterns for the same sequence, which
reduces the need to significantly increase the number of training instances); and creating a large vocabulary
using Byte-Pair Encoding. However, this model is much more complex and requires a large amount of
time to run. But [32] recognizes that RoBERTa has a bigger vocabulary, which allows the model to
represent any word at the cost of more parameters, and the increase in complexity is justified by the gain
in performance.
Figure 1. Confusion matrices: (a) Multinomial NB, (b) Logistic Regression, (c) SVM, (d) KNN
Table 3
Performance of classifiers

Classifier                Class Label      Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
Naïve Bayes Multinomial   Not-Offensive    76.911         84.515          86.948       85.714
                          Offensive                       42.373          37.594       39.841
SVM                       Not-Offensive    80.733         94.216          84.167       88.908
                          Offensive                       19.492          42.593       26.744
KNN                       Not-Offensive    78.746         87.500          86.691       87.094
                          Offensive                       38.983          40.708       39.827
Logistic Regression       Not-Offensive    80.733         84.515          87.791       86.122
                          Offensive                       46.610          39.855       42.969
RoBERTa                   Not-Offensive    91.390         96.455          84.339       89.991
                          Offensive                       18.644          53.659       27.673
For the Offensive class, the Recall and F1-score values were low compared to the Not-Offensive class. We
understand that this is due to the smaller number of texts under this class. Although we augmented texts
for this class using SMOTE, the low values for the Offensive class persisted. We used
count vectors to extract features from the texts in this work - one issue with simple counts is that frequently
used words, such as "an" and "the," make their high counts meaningless in the encoded vectors. An alternate
option is to use TF-IDF to calculate word frequencies, which might be superior to Count Vectorizers
because it not only considers the frequency of words in the corpus but also their importance. Future work
can remove the words that are less important for analysis, reducing the input dimensions and making the
model building less complex.
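The TF-IDF alternative mentioned above is available as a drop-in replacement for the count vectorizer. A minimal sketch follows; the settings mirror the assumed count-vector configuration and are illustrative rather than part of the reported experiments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Drop-in replacement for CountVectorizer: each term is weighted by how
# informative it is across the corpus, down-weighting very frequent words.
tfidf = TfidfVectorizer(max_features=1050, token_pattern=r"(?u)\b\w{3,}\b")

corpus = ["sample comment one", "another sample comment"]   # training comments
X_train = tfidf.fit_transform(corpus)    # same fit/transform interface as before
```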
5. Conclusion
This paper presented experimental work and respective results of the task to detect offensive content in
code-mixed dataset of Dravidian languages to tackle the problem of offensive language in social media.
We provided an in-depth review of past techniques in offensive text classification, and new models to
automatically detect offensive texts in Tamil. Count vector was used to extract features from the dataset
and different classifiers such as Naïve Bayes Multinomial model, SVM, KNN, Logistic Regression and
RoBERTa were built for this task. Of these models, RoBERTa exhibited higher accuracy than other
models. Our work could be further extended using other numerical/vectorial representations of texts and
new classifiers based on neural networks with advanced linguistic features. The study contributes to
research in offensive text detection for under-resourced languages like Tamil, which we hope
can inform future work in this area.
References
[1] J. Blair, "New breed of bullies torment their peers on the Internet," Education Week, 22 (2003): 6.
[2] S.-H. Lee and H.-W. Kim, "Why people post benevolent and malicious comments online,"
Communications of the ACM, 58 (2015): 74-79.
[3] A. M. Obadimu, "Assessing the Role of Social Media Platforms in the Propagation of Toxicity,"
University of Arkansas at Little Rock, 2020.
[4] T. De Smedt, S. Jaki, E. Kotzé, L. Saoud, M. Gwóźdź, G. De Pauw, et al., "Multilingual cross-
domain perspectives on online hate speech," arXiv preprint arXiv:1809.03944, 2018.
[5] N. Alkiviadou, "Hate speech on social media networks: towards a regulatory framework?,"
Information & Communications Technology Law, 28 (2019): 19-35.
[6] B. Vandersmissen, "Automated detection of offensive language behavior on social networking
sites," IEEE Transaction, 2012, http://lib.ugent.be/catalog/rug01:001887239
[7] Z. Waseem and D. Hovy, Hateful symbols or hateful people? predictive features for hate speech
detection on twitter, in: Proceedings of the NAACL student research workshop, (2016):88-93.
[8] M. H. Ribeiro, P. H. Calais, Y. A. Santos, V. A. Almeida, and W. Meira Jr, Characterizing and
detecting hateful users on twitter, in: Twelfth International AAAI Conference on Web and Social
Media, 2018.
[9] R. Kshirsagar, T. Cukuvac, K. McKeown, and S. McGregor, "Predictive embeddings for hate
speech detection on twitter," arXiv preprint arXiv:1809.10644, (2018)
[10] P. Mishra, H. Yannakoudakis, and E. Shutova, "Neural character-based composition models for
abuse detection," arXiv preprint arXiv:1809.00378,(2018).
[11] J. Mitrović, B. Birkeneder, and M. Granitzer, nlpUP at SemEval-2019 task 6: A deep neural
language model for offensive language detection, in: Proceedings of the 13th International
Workshop on Semantic Evaluation, 2019.
[12] P. Liu, W. Li, and L. Zou, NULI at SemEval-2019 task 6: Transfer learning for offensive
language detection using bidirectional transformers, in: Proceedings of the 13th international
workshop on semantic evaluation, (2019): 87-91.
[13] S. D. Swamy, A. Jamatia, and B. Gambäck, "Studying generalisability across abusive language
detection datasets," in Proceedings of the 23rd conference on computational natural language
learning (CoNLL), 2019, pp. 940-950.
[14] A. Schmidt and M. Wiegand, A survey on hate speech detection using natural language
processing, in: Proceedings of the fifth international workshop on natural language processing
for social media, 2017, pp. 1-10.
[15] C. Cieri, M. Maxwell, S. Strassel, and J. Tracey, Selection criteria for low resource language
programs, in: Proceedings of the Tenth International Conference on Language Resources and
Evaluation (LREC'16), 2016, pp. 4543-4549.
[16] B. R. Chakravarthi, R. Priyadharshini, N. Jose, T. Mandl, P. K. Kumaresan, R. Ponnusamy, et
al., Findings of the shared task on offensive language identification in Tamil, Malayalam, and
Kannada,in: Proceedings of the First Workshop on Speech and Language Technologies for
Dravidian Languages, 2021, pp. 133-145.
[17] S. Suryawanshi and B. R. Chakravarthi, Findings of the shared task on Troll Meme
Classification in Tamil, in: Proceedings of the First Workshop on Speech and Language
Technologies for Dravidian Languages, 2021, pp. 126-132.
[18] B. R. Chakravarthi, R. Priyadharshini, S. Banerjee, R. Saldanha, J. P. McCrae, P.
Krishnamurthy, et al., Findings of the Shared Task on Machine Translation in Dravidian
languages, in: Proceedings of the First Workshop on Speech and Language Technologies for
Dravidian Languages, 2021, pp. 119-125.
[19] T. Davidson, D. Warmsley, M. Macy, and I. Weber, Automated hate speech detection and the
problem of offensive language, in : Proceedings of the International AAAI Conference on Web
and Social Media, 2017.
[20] A. Gaydhani, V. Doma, S. Kendre, and L. Bhagwat, "Detecting hate speech and offensive
language on twitter using machine learning: An n-gram and tfidf based approach," arXiv
preprint arXiv:1809.08651, 2018.
[21] F. E. Ayo, O. Folorunso, F. T. Ibharalu, and I. A. Osinuga, "Machine learning techniques for
hate speech classification of twitter data: State-of-the-art, future challenges and research
directions," Computer Science Review, vol. 38, p. 100311, 2020.
[22] S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, and O. Frieder, "Hate speech
detection: Challenges and solutions," PloS one, vol. 14, p. e0221152, 2019.
[23] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, and R. Kumar, "Semeval-2019 task
6: Identifying and categorizing offensive language in social media (offenseval)," arXiv preprint
arXiv:1903.08983, 2019.
[24] S. Dowlagar and R. Mamidi, "HASOCOne@ FIRE-HASOC2020: Using BERT and
Multilingual BERT models for Hate Speech Detection," arXiv preprint arXiv:2101.09007,
2021.
[25] M. Mozafari, R. Farahbakhsh, and N. Crespi, A BERT-based transfer learning approach for hate
speech detection in online social media, in: International Conference on Complex Networks and
Their Applications, 2019, pp. 928-940.
[26] Y.-C. Ho and D. L. Pepyne, "Simple explanation of the no-free-lunch theorem and its
implications," Journal of optimization theory and applications, 115(2002): 549-570.
[27] B. R. Chakravarthi, R. Sakuntharaj, A. K. Madasamy, S. Thavareesan, S. C. N. P. B, J. P. McCrae,
and T. Mandl, "Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language
Detection in Tamil and Malayalam," Working Notes of FIRE 2021 - Forum for Information
Retrieval Evaluation, CEUR, 2021.
[28] B. R. Chakravarthi and V. Muralidaran, Findings of the shared task on Hope Speech Detection
for Equality, Diversity, and Inclusion, in: Proceedings of the First Workshop on Language
Technology for Equality, Diversity and Inclusion, 2021, pp. 61-72.
[29] Data Augmentation in NLP. 2021 URL: https://neptune.ai/blog/data-augmentation-nlp
[30] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority
over-sampling technique," Journal of artificial intelligence research, 16 (2002): 321-357.
[31] S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, "Supervised machine learning: A review of
classification techniques," Emerging artificial intelligence applications in computer
engineering, 160(2007): 3-24.
[32] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, et al., "Roberta: A robustly optimized bert
pretraining approach," arXiv preprint arXiv:1907.11692, 2019.