Hate Speech and Offensive Content Identification in
Hindi and Marathi Language Tweets using Ensemble
Techniques
Ratnavel Rajalakshmi1 , Faerie Mattins1 , S Srivarshan1 , L Preethi Reddy1 and
Anand Kumar M.2
1 School of Computer Science and Engineering, Vellore Institute of Technology, Chennai
2 Department of Information Technology, National Institute of Technology Karnataka (NITK), Surathkal


Abstract
Hate speech is described as any form of speech in which speakers attempt to ridicule, humiliate, or incite hatred in someone else's mind based on characteristics such as religion, the colour of skin, race, or sexual preference. In recent years, social networking sites have been a major source of excessive amounts of hate speech. If unaddressed, such content can cause anxiety and despair in the affected individuals or groups. As a result, these social networks utilise an assortment of algorithms to identify hate speech. Detecting hate speech in English texts has been one of the hottest topics in recent years, with many studies published. In regional and indigenous languages, however, hate speech detection is a recent area with comparatively little research, largely because sufficiently large training data and domain resources are lacking. The HASOC 2021 Hate Speech Detection Task [1] addresses this problem: it provides datasets containing tweets in the English, Hindi [2] and Marathi [3] languages. The main task comprised two subtasks. Subtask A was to classify Hindi and Marathi tweets as containing hate or offensive content (HOF) or not (NOT); Subtask B was to further classify the offensive Hindi tweets as Hate Speech (HATE), Offensive (OFFN) or Profane (PRF). This work compares the performance of different models on both subtasks and identifies the best performing model for each. The Random Forest classifier achieves the best performance on the first subtask, with macro F1 scores of 75.19% and 73.12% on the Marathi and Hindi tweet datasets, respectively. The XGBoost algorithm is the best performing algorithm on the second subtask, with a 46.58% macro F1 score. Overall, any of these models can produce satisfactory results when dealing with hate speech detection in regional languages. This work was submitted to the FIRE 2021 shared task, Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC-2021), by team DLRG.

Keywords
Hate speech detection, Multilingual tweets, Machine learning, XGBoost, Majority voting, Random forest




Forum for Information Retrieval Evaluation, December 13-17, 2021, India
rajalakshmi.r@vit.ac.in (R. Rajalakshmi); faeriemattins.r2019@vitstudent.ac.in (F. Mattins); srivarshan.2019@vitstudent.ac.in (S. Srivarshan); preethireddi17@gmail.com (L. P. Reddy); m1_anandkumar@nitk.edu.in (A. K. M.)
ORCID: 0000-0002-6570-483X (R. Rajalakshmi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.


1. Introduction
Hate speech is defined as any communication, whether spoken, written, or physical, that criticises or uses discriminating or disparaging terminology about a person (or group) based on who they are, such as colour, religion, nationality, ethnicity, race, sexuality, descent, or another identifying feature.
Nowadays, social media has evolved into a popular forum for ordinary people to share their feelings on any topic. It has its own set of advantages and disadvantages. When people express themselves, everyone understands their point of view and may perceive the world in a new light. However, this also means that people feel entitled to share whatever they wish, including offensive content that is harsh and upsetting to others. Such content impacts not only the feelings of the targeted group but also their self-esteem and dignity. As a result, maintaining a courteous demeanour while presenting one's viewpoint on any social media site is essential. Unfortunately, preventing people from posting or tweeting hate speech on social media is quite tricky. Hence, every social media company globally is attempting to build a revolutionary approach for detecting and preventing hate speech. This has been done to some degree for English, as English is a widely known and spoken language worldwide. However, this is not the case for Indian regional languages such as Hindi, Marathi, Tamil, and others. Therefore, it is critical to have a solution in place.

Machine learning is an application of data analysis that facilitates the creation of analytical models. It is predicated on the idea that computers can recognise patterns, learn from data, and make decisions with minimal human intervention. Machine learning has been utilised to solve problems in many science and technology domains, including social media. Many researchers are now working on machine learning models that detect hate speech with high accuracy, and on optimising such models. However, there has not been much advancement in detecting hate speech in regional languages, because it is challenging to train a model for a language that is not widely spoken and hence lacks the large datasets and established procedures needed to train a model effectively. In this study, data embedding and multiple classifier models were employed to determine whether or not a tweet contains hateful and offensive content in two Indian regional languages, namely Hindi and Marathi. This research is split into two parts: Subtask A and Subtask B. For Subtask A, binary classification models were used to determine whether a tweet contained hate speech, for both Hindi and Marathi. For Subtask B, multiclass classification was used to further classify offensive Hindi content as hate speech, profane or offensive. Our team DLRG participated in Subtask A for Marathi and in Subtasks A and B for Hindi.

In section 2 of this paper, the related works have been analysed and discussed. In section
3, the dataset utilised in both subtasks is discussed. Section 4 includes an explanation of the
proposed methodology. In section 5, the findings are presented and discussed, and in section 6,
the conclusion and future work are presented.


2. Related Work
In earlier works, different term weighting methods have been applied for web page categorization
using TF-IDF based methods [4]. In [5], to detect whether a tweet is offensive or not (a binary classification), the authors used unigrams together with patterns automatically collected from the training set. These patterns and unigrams are later used, among other features, to train a machine learning algorithm; however, this method does not consider platforms like Facebook. In [6], traditional machine learning techniques were applied with SVM for URL-based web page classification. In many recent works, deep learning algorithms such as Convolutional Neural Networks and Recurrent Neural Networks have been explored for various applications [7, 8]. Sentiment analysis in English is a well-studied problem, and many approaches have been suggested by various researchers applying both machine learning and deep learning techniques [9, 10]. An overview of the various works submitted as part of the HASOC FIRE '20 track is given in [11]. Ensemble-based
approaches were also reported in [12] for offensive speech detection. A novel relevance factor
has been proposed in [13] to address the challenges in code-mixed Hindi-English tweets. In
[14], Hindi-English code-mixed hate speech detection is performed using character-level embeddings. The authors experimented with different deep learning models, among which an attention model with GRU hybridisation performed best. Notably, instead of word embeddings, the authors used character-level embeddings, which made the model adaptable to the spelling variations common in such text. In [15], the authors applied a multi-input, multi-channel transfer learning based model to detect hateful or abusive Hinglish tweets from the Hinglish Offensive Tweet (HOT) dataset. The authors used transfer learning with multiple feature inputs and achieved peak performance using the SVM model. The work in [16] advances the state of the art in hate speech detection for Hindi-English code-mixed tweets; the authors compared three typical deep learning models using domain-specific embeddings. However, the proposed method does not take code-mixed data fully into consideration. In [17], the authors performed hate speech detection on datasets of two Dravidian languages, Tamil and Malayalam. Their best performing models obtained an F1 score of 0.77 for Malayalam and 0.87 for Tamil.


3. Dataset
The dataset was obtained as part of the HASOC ’21 competition at FIRE 2021. These datasets
consist of tweets sampled from the Twitter social media platform. The data set consisted of
English, Hindi and Marathi tweets. For this work, the English tweets were disregarded. The
hashtags, URLs and keywords prevalent in regular tweets are also present in this data. For
Subtask A, the dataset was broadly divided into two categories: NOT, tweets containing non-offensive content, and HOF, tweets containing hate, offensive or profane content. The HOF tweets in the Hindi tweet data were further divided into Hate Speech (HATE), Offensive (OFFN)
and Profane (PRF) as part of Subtask B. The dataset was in a CSV file with each entry containing
the ID, Tweet ID, Text and corresponding labels. The Hindi tweet data consisted of 4594 entries,
while the Marathi data consisted of 1874 entries.
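For illustration, a minimal sketch of loading such a file with pandas is shown below; the file name and column names are hypothetical placeholders based on the description above, not the official HASOC naming.

    import pandas as pd

    # Hypothetical file and column names; each entry holds the ID,
    # Tweet ID, Text and the corresponding labels.
    df = pd.read_csv("hasoc2021_hindi_train.csv")
    texts, labels = df["text"], df["task_1"]
    print(labels.value_counts())  # e.g. distribution of HOF vs NOT tweets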


4. Methodology
The proposed methodology for all the subtasks is detailed in the following sections.
4.1. Preprocessing
In preprocessing the data, the first step performed was noise removal. URL links and Twitter handle names were the primary sources of noise in the dataset. Twitter handle names were removed by using regular expressions to delete every word following an @ symbol, and URL links were removed by using regular expressions to delete anything following http/https. Then stop words (such as a, an, the, is), followed by punctuation marks and special characters, were removed. Stop words are not necessary for finding a statement's sentiment, as they do not carry any significant meaning, and punctuation and special characters likewise cannot be used to identify sentiment.

We removed these as they are unnecessary, so that we can work solely on the characters and words essential for better performance. Further, stemming is performed to reduce the inflection in words, bringing them to their respective base forms. Stemming helps deal with misspelt words and removes unwanted suffixes, both of which are very common in social media text. We used the Snowball stemmer from the NLTK package, which is based on the string-processing language "Snowball". Finally, we removed high-frequency words: based on the term frequency method, the 1000 most frequently occurring words were removed.
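A minimal sketch of this preprocessing pipeline is given below. The stop-word list is a placeholder, and since NLTK's Snowball stemmer has no Hindi or Marathi variant, an English stemmer is shown purely for illustration.

    import re
    from collections import Counter
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("english")      # illustrative; no Hindi/Marathi variant exists
    STOP_WORDS = {"a", "an", "the", "is"}     # placeholder stop-word list

    def clean_tweet(text):
        text = re.sub(r"@\w+", " ", text)           # drop Twitter handles
        text = re.sub(r"https?://\S+", " ", text)   # drop URL links
        text = re.sub(r"[^\w\s]", " ", text)        # drop punctuation/special characters
        tokens = [stemmer.stem(t) for t in text.lower().split() if t not in STOP_WORDS]
        return " ".join(tokens)

    def drop_frequent_words(tweets, k=1000):
        # Remove the k most frequently occurring words, by term frequency.
        counts = Counter(w for t in tweets for w in t.split())
        frequent = {w for w, _ in counts.most_common(k)}
        return [" ".join(w for w in t.split() if w not in frequent) for t in tweets]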

4.2. Feature Extraction
The Scikit-Learn [18] library has several vectorizers that process and tokenize text in a single function, converting words into integer features. After preprocessing, the clean tweets are obtained, and by treating each tweet as a sequence of words, the features were extracted. We used the CountVectorizer for feature extraction with the machine learning models.

CountVectorizer: We looped over the n-gram size to create unigrams, bigrams, trigrams and so on, up to n = 5. The CountVectorizer tokenizes the text by breaking each sentence into words, and the resulting vocabulary is used to encode new texts. For n-grams of size 1 to 5, the texts are broken into terms, and the encoded vector returned contains the integer count of the number of times each term appeared.
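A minimal sketch of this feature extraction step is given below, assuming clean_tweets is the list of preprocessed tweets from Section 4.1.

    from sklearn.feature_extraction.text import CountVectorizer

    # n-grams of size 1 to 5, as described above
    vectorizer = CountVectorizer(ngram_range=(1, 5))
    X_train = vectorizer.fit_transform(clean_tweets)   # learn vocabulary and encode
    X_val = vectorizer.transform(clean_val_tweets)     # encode with the same vocabulary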

4.3. Classifiers
The process by which a group of data is divided into a set of categories is known as classification.
This process can be performed on both unstructured and structured data. The most crucial step
in this process is predicting the category by performing some calculations on the data points
provided. The proposed approach uses the following classifiers: Logistic Regression, Support Vector Machine, Stochastic Gradient Descent, Random Forest, a majority-voting Ensemble, and XGBoost.

4.3.1. Binary Classification
Classifying the labels of a collection into two groups using a classification model is known as
binary classification. In most binary classification problems, one class represents the normal
state, and the other represents the aberrant state. The regular state class is assigned the class
label 0, while the aberrant state class is labelled 1. Logistic Regression, k-Nearest Neighbours,
Decision Trees, Support Vector Machines, and Naive Bayes are the most prominent binary
classification techniques. In this research, Subtask A consists of binary classification with the labels hate and offensive speech (HOF) or not (NOT).

4.3.2. Multiclass Classification
Classifying the labels of a collection into three or more groups using a classification algorithm
is known as multiclass classification. In most multiclass classification problems, the different
classes represent the various labels that are to be assigned to each set of input variables. Generally, when label encoding is done for multiclass problems, each class is converted into a binary indicator; having one binary output variable per class instead of a single multi-valued output simplifies the process of training. Logistic Regression, Support Vector Machines, K Nearest
Neighbours, Naive Bayes and Decision Trees are some of the most prominent classification
techniques that can also be extended to multiclass problems. In this research, subtask B consists
of multiclass classification with the labels Hate Speech (HATE), Offensive (OFFN) and Profane
(PRF).
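As a concrete illustration of the label handling described above (a sketch only; the encoding code is not shown in this paper), the Subtask B string labels can be mapped to integers or to per-class binary indicator columns as follows.

    from sklearn.preprocessing import LabelEncoder, label_binarize

    labels = ["HATE", "OFFN", "PRF", "HATE"]  # example Subtask B labels

    # Integer encoding: HATE -> 0, OFFN -> 1, PRF -> 2
    y = LabelEncoder().fit_transform(labels)

    # One binary indicator column per class
    y_onehot = label_binarize(labels, classes=["HATE", "OFFN", "PRF"])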

4.4. Stochastic Gradient Descent
Gradient descent is an iterative procedure that begins at a random position on the slope of a
function and gradually falls until it reaches its lowest point. Stochastic Gradient Descent is
an optimisation approach that takes one data point at random from the entire data set at each
iteration to minimise computations drastically. SGD is an easy algorithm to implement as well
as an efficient method for finding the minimum error point. Because of the randomness of SGD's descent, it usually takes more iterations to reach the minima than normal Gradient Descent, but each iteration is computationally considerably less expensive. Stochastic Gradient Descent has been used in both Subtasks A and B. This model was chosen because it takes very little time to train; that, together with the fact that it is more likely than other models to reach the local minimum of the error function, makes it an interesting prospect. The SGD model is loaded from Python's Scikit-learn package, trained on the training data, and the results are verified and documented.
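A minimal sketch of this step is given below, assuming the vectorized training and validation splits X_train, y_train, X_val, y_val from the previous sections; the hinge loss is an assumption, not a reported setting.

    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import f1_score

    sgd = SGDClassifier(loss="hinge", random_state=42)  # linear model trained by SGD
    sgd.fit(X_train, y_train)
    print("Macro F1:", f1_score(y_val, sgd.predict(X_val), average="macro"))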

4.5. Logistic Regression
This is a popular machine learning algorithm that has seen its fair share of usage as a classifier.
The base for Logistic Regression can be traced to its conception from statistical mathematics.
The model is trained on data where the dependent variables are categorical. It is derived from
the Linear Regression model. The output obtained from the Linear Regression Model is fitted
into a Logistic Function, which predicts the target variable. This model makes use of a decision
boundary: it sets a threshold that differentiates one class of variables from another. The linearity or non-linearity of the decision boundary depends on the input variables. The activation function used is the sigmoid function, which constrains the output to values between 0 and 1. Hence, the output becomes an estimated probability, which gives the final class prediction when subjected to the decision boundary.

This model was chosen because it is one of the simplest classifiers that can be implemented, and it provides a good baseline for comparison against the other approaches. The Scikit-Learn [19] library for Python provides a class for initializing a Logistic
Regression Model. It also provides functions for training the model and obtaining predictions
from it. The vectorized input variables and the corresponding labels constitute the training data
on which the Logistic Regression model is loaded and trained. After training the model, it is
validated using the validation data set, and the results are recorded.
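For illustration, a minimal sketch of this baseline under the same assumptions is:

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    lr = LogisticRegression(max_iter=1000)  # max_iter raised for sparse text features
    lr.fit(X_train, y_train)
    print(classification_report(y_val, lr.predict(X_val)))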

4.6. Support Vector Machine
A support vector machine (SVM) is a machine learning technique that evaluates data for categorization and regression analysis. It is a supervised learning algorithm that sorts data into two groups; the algorithm's job is to figure out which category a new data point belongs to, making the basic SVM a binary linear classifier. An SVM's output is a map of the sorted data with a separating margin between the two sides. The SVM method represents each piece of data as a point in n-dimensional space, where each feature is the value of a particular coordinate. Classification is then completed by selecting the hyperplane that best differentiates the two classes. When the classes are linearly separable, creating such a hyperplane is easy; otherwise, the SVM kernel transforms the low-dimensional input space into a higher-dimensional space, converting a non-separable problem into a separable one. SVM is efficient in high-dimensional spaces and works well when there is a clear separation margin. In this research, SVM is used in both subtasks. This model was chosen because it works extremely well when there are many features relative to the size of the training set, which is the case for count- or TF-IDF-vectorized text, and it is also highly memory efficient. The SVM model is loaded from Python's Scikit-learn package, trained on the training data, and the results are verified and documented.
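A minimal sketch is given below; LinearSVC is an assumption, as the specific SVM variant and kernel used are not stated here.

    from sklearn.svm import LinearSVC
    from sklearn.metrics import f1_score

    svm = LinearSVC()  # linear kernel; efficient for high-dimensional sparse text
    svm.fit(X_train, y_train)
    print("Macro F1:", f1_score(y_val, svm.predict(X_val), average="macro"))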

4.7. Ensemble - Majority Voting
When individual classifiers do not perform well on a given dataset, it is common practice to combine the models in a process called ensembling. In most cases, the combined model performs as well as or better than the individual classifiers. One such ensembling technique explored for this task was majority voting. It is an ensemble technique in which the predictions from various weak classifiers for a given set of input variables are treated as votes. These votes are tallied together, and the final outputs are obtained from the tallied votes. In the current binary classification problem, the same process occurs: the predictions of all the weak learners are considered votes, the number of votes obtained by every output class is tallied, and the output class that gathers the highest number of votes is taken as the output for that particular set of input variables.

The combining classifiers used for this task are Support Vector Machine, Logistic Regression,
Stochastic Gradient Descent, K Nearest neighbours, Naive Bayes, Random Forest and Decision
Tree. Instances of each of the above models are initially trained on the Vectorized input data
and their corresponding labels. Then the validation data is passed to all seven models, and the
predictions are obtained. Majority voting is performed on these predicted labels, and the final
predicted labels from Ensembling are obtained. These predicted labels are compared with the
actual labels to obtain the performance metrics of the model. A similar process is also carried
out for the test dataset.
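A minimal sketch of this majority-voting ensemble over the seven constituent classifiers named above, using Scikit-learn's VotingClassifier, is given below; the individual hyperparameters are assumptions.

    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    ensemble = VotingClassifier(
        estimators=[
            ("svm", SVC()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("sgd", SGDClassifier()),
            ("knn", KNeighborsClassifier()),
            ("nb", MultinomialNB()),
            ("rf", RandomForestClassifier()),
            ("dt", DecisionTreeClassifier()),
        ],
        voting="hard",  # hard voting = majority vote over predicted labels
    )
    ensemble.fit(X_train, y_train)
    val_pred = ensemble.predict(X_val)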

4.8. XGBoost
XGBoost is a popular machine learning algorithm that works as a decision-tree-based ensemble using a gradient boosting framework. The XGBoost algorithm takes a text string as input and loads an XGBoost model trained to predict its label. Bagging and boosting are the most commonly used ensemble learning techniques. The XGBoost classifier is robust, and even in distributed environments it yields efficient results. In the Scikit-learn framework, the algorithm provides a wrapper class that allows XGBoost models to be treated like classifiers or regressors, so the entire Scikit-learn library can be used with XGBoost models. Ensemble tree methods such as XGBoost and Gradient Boosting Machines use a gradient descent architecture to boost weak learners. This model was chosen because it works well with a large number of texts in the training dataset. As an added advantage, the XGBoost algorithm has built-in cross-validation at each iteration, which removes the need to specify the exact number of boosting iterations required in a single run.
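For illustration, a minimal sketch of XGBoost through its Scikit-learn wrapper class is given below; the hyperparameters are assumptions, and y_train is assumed to be integer-encoded (e.g. via LabelEncoder), as recent XGBoost versions require.

    from xgboost import XGBClassifier
    from sklearn.metrics import f1_score

    xgb = XGBClassifier(n_estimators=200, learning_rate=0.1)
    xgb.fit(X_train, y_train)
    print("Macro F1:", f1_score(y_val, xgb.predict(X_val), average="macro"))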

4.9. Random Forest
Random Forest is an advanced machine learning technique that may be applied to various tasks, including regression and classification. It is an ensemble method: a random forest is a combination of many small decision trees, called estimators, each of which makes its own predictions. To provide a more accurate prediction, the random forest model integrates the estimators' predictions. The problem with standard decision tree classifiers is that they are prone to overfitting the training set. The ensemble design of the random forest compensates for this and allows it to generalise effectively to unseen data, including data with missing values. Random forests can also handle enormous datasets with high dimensionality and several feature types. While growing the trees, the random forest adds extra randomness to the model: when splitting a node, it looks for the best feature from a random subset of features rather than the single most important feature overall. As a result, there is much diversity, leading to a better model. Random Forest is used in both subtasks in this study. The Random Forest model is loaded using Python's Scikit-learn package, trained using the training data, and the results are validated and reported.
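A minimal sketch of this step under the same assumptions is given below; the number of trees is an assumption.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score

    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    rf.fit(X_train, y_train)
    print("Macro F1:", f1_score(y_val, rf.predict(X_val), average="macro"))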
5. Results and Discussion
5.1. Marathi - Subtask - A
A comparison was performed on the various models trained on the Marathi dataset. The classifiers used after cleaning and vectorising the Marathi tweet dataset are Logistic Regression, XGBoost, Support Vector Machine, Stochastic Gradient Descent, Majority Voting and Random Forest. A comparison of the performance metrics of all the different classifiers used can be viewed in Table 2. As the table shows, the highest performance was obtained with the Random Forest classifier model, which attained a macro F1 score of 0.7519. Out of all the classification models implemented, the Logistic Regression model achieved the lowest performance, with a macro F1 score of 0.6924. This can be explained by the fact that the Logistic Regression algorithm does not work well with non-linear data and does not handle outliers in the training data well. Since the data considered here is mostly text, it is reasonable to assume that it contains many outliers.

The Logistic Regression algorithm also finds it challenging to accommodate widely distributed data, which might explain its diminished performance. As expected, the Majority Voting technique performs better than its weak constituent learners, for the reasons explained in Section 4.7. However, it can be observed that the Random Forest improves upon even this technique, which can be explained by the fact that Random Forest is itself an ensembling algorithm. Tree-type algorithms are generally better than probabilistic models at handling outliers in the input data and are less impacted by noise. A particular feature of tree-type models, and especially Random Forests, is the innate ability to handle missing data independently. The embedded data is also non-linear, which negatively affects the probabilistic models, since linearly separable data is where they thrive; Random Forests are not impacted by non-linear data as much as the probabilistic models. The comparison between all the different models used and the F1 scores they procured is illustrated in Figure 1.

5.2. Hindi - Subtask - A
A comparison of several models trained on the Hindi dataset was carried out. After cleaning
and vectorising the Hindi tweet dataset, the following classifiers were used: XGBoost, Logis-
tic Regression, Stochastic Gradient Descent, Support Vector Machine, Majority Voting, and
Random Forest. Table 2 contains a comparison of the performance metrics of all the various
classifiers utilised. The Random Forest classifier model produced the best results, as seen in the table, achieving a macro F1 score of 0.7312, while the XGBoost model performed worst out of all the classification models tested, with a macro F1 score of 0.6628. This low performance could be explained by the fact that the boosting algorithm does not work well with distributed data and handles anomalies in the training data poorly. Given that the data in question is largely text, it is reasonable to presume that it contains several outliers.

As predicted, the Majority Voting approach outperforms its weak learners, for the reasons stated in Section 4.7. However, using Random Forests improves on this method, which may be explained by the fact that Random Forest is an ensembling algorithm in and of itself. In general, tree-type algorithms outperform probabilistic models in dealing with outliers and noise in input data, and they are less influenced by noise. The capacity to manage missing data on its own is a specific property of tree-type models, particularly Random Forests. The embedded data is also non-linear, which harms the probabilistic models because linearly separable data is the domain in which they thrive. Random Forests are less affected by non-linear data than probabilistic models. Figure 2 depicts a comparison of all the different models employed and the F1 scores obtained while using those classifiers.




Figure 1: Performance of different classifiers for subtask A : Marathi




Figure 2: Performance of different classifiers for subtask A : Hindi
Figure 3: Performance of different classifiers for subtask B : Hindi


Table 1
The Submissions made for HASOC’21
             Submission name             SubTask           Classifier   Macro F1-Score
                DLRG_M_S1           Subtask A : Marathi     XGBoost         73.38%
                   HTA1              Subtask A : Hindi      XGBoost         66.28%
               DLRG_HT2_S1           Subtask B : Hindi      XGBoost         46.58%


5.3. Hindi - Subtask - B
Several machine learning models were trained on the Hindi dataset and then compared. The
input data was obtained after cleaning the tweets and vectorising them. The various machine
learning algorithms used are XGBoost, Logistic Regression, Stochastic Gradient Descent, Sup-
port Vector Machine, Majority Voting and Random Forest. A comparison of performance metrics
for all the above models is shown in Table 2. For this subtask, XGBoost performed best with the highest macro F1 score of 0.4658, followed by the Ensemble model with a macro F1 score of 0.4425. The model with the lowest observed performance was the Random Forest, with a macro F1 score of 0.4050. This can be attributed to the fact that, as the number of features increases, the Random Forest algorithm becomes more prone to overfitting. After Random Forest, the next lowest performance was shown by the Support Vector Machine, with a macro F1 score of 0.4225. This can be explained by the fact that SVM does not perform well for large datasets or datasets with much noise, which is especially true of text data, as it is prone to containing noise.

For multiclass classification, XGBoost performs well. At each iteration of the boosting process, the XGBoost algorithm runs cross-validation, so the exact optimum number of boosting iterations is obtained in a single run, which leads to good results.
Table 2
Performance Analysis - A comparative study of different classifiers for all the subtasks

  SubTask               Classifier            Macro Precision    Macro Recall    Macro F1-Score
  Subtask A : Marathi   SGD                   70.43%             69.04%          69.58%
                        Logistic Regression   78.17%             67.77%          69.24%
                        SVM                   77.33%             70.44%          72.02%
                        Ensemble              74.69%             73.01%          73.68%
                        XGBoost               74.84%             71.98%          73.38%
                        Random Forest         76.55%             74.33%          75.19%
  Subtask A : Hindi     SGD                   72.40%             69.36%          70.32%
                        Logistic Regression   76.08%             67.68%          68.97%
                        SVM                   76.37%             67.52%          68.92%
                        Ensemble              74.90%             70.98%          72.17%
                        XGBoost               67.96%             64.68%          66.28%
                        Random Forest         76.98%             71.65%          73.12%
  Subtask B : Hindi     SGD                   45.25%             42.35%          43.75%
                        Logistic Regression   52.00%             38.13%          44.00%
                        SVM                   43.50%             41.25%          42.25%
                        Ensemble              51.75%             38.65%          44.25%
                        Random Forest         52.25%             33.06%          40.50%
                        XGBoost               54.92%             40.44%          46.58%


The Majority Voting ensemble performed well after XGBoost, for the reasons explained in Section 4.7. Therefore, for the Hindi Subtask B dataset, XGBoost produced the best performance and Random Forest the weakest. Figure 3 illustrates the macro F1 scores obtained by the different models in Subtask B. Table 1 shows the various submissions made to the FIRE 2021 shared task, Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC-2021), along with their subtask names, classifiers and macro F1-scores. We extended the experiments by applying other classifiers to improve the performance; the final results are tabulated in Table 2.


6. Conclusion and Future Scope
This work was submitted to the FIRE2021 shared task, Hate Speech and Offensive Content Iden-
tification in English and Indo-Aryan Languages (HASOC-2021). The challenge of recognising
conversational hate-speech in Indian regional languages has been empirically studied in this
research utilising binary and multiclass classifiers for Marathi and Hindi tweets. This research
has projected a complete method from analysing the tweets to comprehending the results. In
subtask A, Random forest showed the best Macro F1-score of 75.19% and 73.12% respectively for
both Marathi and Hindi tweets. With a Macro F1-score of 46.58% in subtask B for Hindi tweets,
it is evident that XGBoost produced the most significant results. Overall, it can be inferred
that Random Forest works well for binary classification, whereas XGBoost is a better classifier
model for multiclass classification. In this work, only one type of feature extraction method has
been implemented. In the future, multiple other types of embeddings can be implemented to
get a better model by combining it with deep learning algorithms.


References
 [1] S. Modha, T. Mandl, G. K. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri,
     Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content
     Identification in English and Indo-Aryan Languages and Conversational Hate Speech, in:
     FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December
     2021, ACM, 2021.
 [2] T. Mandl, S. Modha, G. K. Shahi, H. Madhu, S. Satapara, P. Majumder, J. Schäfer, T. Ranas-
     inghe, M. Zampieri, D. Nandini, A. K. Jaiswal, Overview of the HASOC subtrack at FIRE
     2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Lan-
     guages, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation,
     CEUR, 2021. URL: http://ceur-ws.org/.
 [3] S. Gaikwad, T. Ranasinghe, M. Zampieri, C. M. Homan, Cross-lingual offensive language
     identification for low resource languages: The case of marathi, in: Proceedings of RANLP,
     2021.
 [4] R. Rajalakshmi, Supervised term weighting methods for url classification, Journal of
     Computer Science 10 (2014) 1969–1976. doi:10.3844/jcssp.2014.1969.1976.
 [5] H. Watanabe, M. Bouazizi, T. Ohtsuki, Hate speech on twitter: A pragmatic approach to
     collect hateful and offensive expressions and perform hate speech detection, IEEE access 6
     (2018) 13825–13835.
 [6] R. Rajalakshmi, C. Aravindan, An effective and discriminative feature learning for url
     based web page classification, 2018. doi:10.1109/SMC.2018.00240.
 [7] R. Rajalakshmi, H. Tiwari, J. Patel, A. Kumar, R. Karthik., Design of kids-specific url
     classifier using recurrent convolutional neural network, Procedia Computer Science 167
     (2020) 2124–2131. doi:https://doi.org/10.1016/j.procs.2020.03.260.
 [8] V. Swaminathan, S. Arora, R. Bansal, R. Rajalakshmi, Autonomous driving system with road
     sign recognition using convolutional neural networks, in: 2019 International Conference
     on Computational Intelligence in Data Science (ICCIDS), 2019, pp. 1–4. doi:10.1109/
     ICCIDS.2019.8862152.
 [9] V. Ganganwar, R. Rajalakshmi, Implicit aspect extraction for sentiment analysis: A
     survey of recent approaches, Procedia Computer Science 165 (2019) 485–491. URL: https:
     //www.sciencedirect.com/science/article/pii/S1877050920300181. doi:https://doi.org/
     10.1016/j.procs.2020.01.010.
[10] S. Sivakumar, R. R, Analysis of sentiment on movie reviews using word embedding self-
     attentive lstm, 2021. doi:10.4018/IJACI.2021040103.
[11] T. Mandl, S. Modha, A. Kumar M, B. R. Chakravarthi, Overview of the hasoc track at fire
     2020: Hate speech and offensive language identification in tamil, malayalam, hindi, english
     and german, in: Forum for Information Retrieval Evaluation, FIRE 2020, Association for
     Computing Machinery, New York, NY, USA, 2020, p. 29–32. URL: https://doi.org/10.1145/
     3441501.3441517. doi:10.1145/3441501.3441517.
[12] R. Rajalakshmi, B. Y. Reddy, Dlrg@hasoc 2019: An enhanced ensemble classifier for hate
     and offensive content identification, in: FIRE, 2019.
[13] R. Rajalakshmi, R. Agrawal, Borrowing likeliness ranking based on relevance factor,
     in: Proceedings of the Fourth ACM IKDD Conferences on Data Sciences, CODS ’17,
     Association for Computing Machinery, New York, NY, USA, 2017. URL: https://doi.org/10.
     1145/3041823.3067694. doi:10.1145/3041823.3067694.
[14] Rahul, V. Gupta, V. Sehra, Y. R. Vardhan, Hindi-english code mixed hate speech detection
     using character level embeddings, in: 2021 5th International Conference on Comput-
     ing Methodologies and Communication (ICCMC), 2021, pp. 1112–1118. doi:10.1109/
     ICCMC51019.2021.9418261.
[15] P. Mathur, R. Sawhney, M. Ayyar, R. Shah, Did you offend me? classification of offensive
     tweets in hinglish language, in: Proceedings of the 2nd Workshop on Abusive Language
     Online (ALW2), 2018, pp. 138–148.
[16] S. Kamble, A. Joshi, Hate speech detection from code-mixed hindi-english tweets using
     deep learning models, arXiv preprint arXiv:1811.05145 (2018).
[17] V. Pathak, M. Joshi, P. Joshi, M. Mundada, T. Joshi, Kbcnmujal@ hasoc-dravidian-codemix-
     fire2020: Using machine learning for detection of hate speech and offensive code-mixed
     social media text, arXiv preprint arXiv:2102.09866 (2021).
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
     P. Prettenhofer, R. Weiss, V. Dubourg, J. VanderPlas, A. Passos, D. Cournapeau, M. Brucher,
     M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in python, CoRR abs/1201.0490
     (2012). URL: http://arxiv.org/abs/1201.0490. arXiv:1201.0490.
[19] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
     P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,
     M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine
     Learning Research 12 (2011) 2825–2830.