<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>India. * Corresponding author. Email: kirti@iiitranchi.ac.in (K. Kumari); jps@nitp.ac.in (J. P. Singh)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Machine Learning Approach for Hate Speech and Offensive Content Identification in English and Indo-Aryan Code-Mixed Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kirti Kumari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jyoti Prakash Singh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Information Technology Ranchi</institution>
          ,
          <addr-line>Ranchi, Jharkhand</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Technology Patna</institution>
          ,
          <addr-line>Patna, Bihar</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Social media is currently one of the most widely used platforms, and everyone has the right to express their opinions, ideas and thoughts on it. As a consequence, hate speech and offensive content often spread like wildfire, with a detrimental impact on the world. It is important to identify and remove such offensive content from social media. This paper is a contribution to the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC) 2022 shared task by the _ _ ℎ team. We experimented with machine learning models to detect hate speech and offensive content in all three code-mixed languages provided: English, German and Marathi. Our experimental results show that Logistic Regression, Support Vector Machine and Random Forest classifiers can achieve good results for multilingual hate speech and offensive content identification. Overall, our team participated in all the tasks and ranked 3rd, 5th and 7th on the Marathi C, Marathi B and Marathi A tasks, respectively. Our team ranked 8th and 9th on the ICHCL-Multiclass and ICHCL-Binary shared tasks, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>HASOC</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Logistic Regression</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>Random Forest</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>People voice their opinions through social media sites such as Twitter and Facebook, which
are user-friendly and easily accessible. People of all ages use these sites to share
every detail of their daily lives, filling them with personal data and creating
a huge pool of data. Every technology has advantages and disadvantages, and social media
platforms are no exception. The prevalence of hate speech and other offensive and objectionable
content on the web poses an enormous threat to society. Derogatory, hurtful, insulting
or obscene language directed from one person at another, and openly visible to
others, impairs the objectivity of conversations. As this kind of communication becomes more
prevalent online, disputes become more extreme. Objectionable content may even threaten
the democratic process. Open societies must therefore find an appropriate remedy for such
content that avoids enforcing strict censorship laws.</p>
      <p>
        The study of hate speech and abusive language identification is gradually gaining momentum,
mostly as a result of numerous shared tasks [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. The Hate Speech
and Offensive Content Identification in English and Indo-Aryan Languages (HASOC) 2022 shared task
continues and extends the previous editions. The growth of such content has prompted many social media
networks to scrutinize what people are posting. As a result, techniques to detect questionable
posts automatically have become essential.
      </p>
      <p>The HASOC 2019, HASOC 2020, HASOC 2021 and HASOC 2022 shared tasks have given
researchers the opportunity to deal with code-mixing and script-mixing in different
multilingual Indian languages.</p>
      <p>In this work, we applied a machine learning approach to all the HASOC 2022 shared tasks.
We attempted all the given tasks, which involve code-mixed English, Hindi and Marathi text,
and achieved good results as the team _ _ ℎ.</p>
      <p>The rest of the paper is organized as follows: Section 2 gives a brief overview of related work
in the area of hate and offensive language identification. Section 3 describes the dataset and
the tasks. Section 4 details the proposed approach. Section 5 presents
the results and findings of our work. Finally, we conclude in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Automatic hate speech and offensive language identification is an active research area in the Natural
Language Processing (NLP) community [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. A wide range of related
work has been done in this field [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], but early work mainly addressed
monolingual English text. Recently, some shared tasks have focused on code-mixed
and multilingual regional languages [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref7 ref8 ref9">1, 2, 3, 7, 8, 9</xref>
        ]. These shared tasks
tried to address the multilingual problem of automated identification of hate speech,
aggression and offensive language. The HASOC 2019 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and HASOC 2020 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] shared tasks
focused on three languages, English, Hindi and German, with tasks similar to the current ones.
Next, HASOC 2021 [
        <xref ref-type="bibr" rid="ref10 ref3">3, 10</xref>
        ] added Marathi, a language that is similar to Hindi and spoken
by millions of Indian people, as well as one more task: conversational hate speech
detection [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The TRAC 2018 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], TRAC 2020 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and TRAC 2022 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] shared tasks focused on different types of aggression detection in multilingual scenarios.
      </p>
      <p>
        Some notable works on hate speech and aggression identification [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
        ] applied deep learning approaches with different embedding techniques and achieved good
results. A recent work on aggression identification [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] used a machine learning approach
and ranked first in the TRAC 2022 shared task. Motivated by this work [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], we applied a
machine learning approach to the HASOC 2022 shared tasks, as discussed
in the subsequent sections.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>The HASOC 2022 shared task<sup>1</sup> has two main tasks:</p>
      <list list-type="bullet">
        <list-item><p>Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL)</p></list-item>
        <list-item><p>Offensive Language Identification in Marathi</p></list-item>
      </list>
      <p>Identification of Conversational Hate-Speech in Code-Mixed Languages: this task has
two subtasks, Subtask 1 and Subtask 2. Subtask 1 is a binary classification problem with
two categories: Hate Speech and Offensive (HOF) and Not Hate Speech and Offensive (NOT).
The comments contain Hindi-English code-mixed (Hinglish) text as well as German
text. Subtask 2 is a multiclass problem with three categories: Non-Hate (NONE), Contextual Hate
(CHOF) and Standalone Hate (SHOF). The dataset for this subtask contains only Hinglish text.</p>
      <p>Offensive Language Identification in Marathi: this task has three subtasks, Task 3A, Task 3B
and Task 3C. Task 3A has the classes NOT and OFF. Task 3B has the classes TIN (targeted
insult) and UNT (untargeted insult). Task 3C has the classes IND (individual), GRP (group)
and OTH (others).</p>
      <p>The detailed distribution of samples for each task is given in Table 1.</p>
      <p>
        More details about all the shared tasks and the datasets used can be found in [
        <xref ref-type="bibr" rid="ref10 ref11 ref16 ref17 ref18 ref19 ref20 ref4">16, 4, 17, 18, 10, 11, 19, 20</xref>
        ].
      </p>
      <p><sup>1</sup> https://hasocfire.github.io/hasoc/2022/dataset.html</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section describes the methods used in this work for the given HASOC 2022 shared tasks<sup>2</sup>.
The following subsections describe the approach used to classify hate
speech into the different categories explained in the previous section. We begin by explaining
each preprocessing step applied to the dataset for each of the three languages, followed by the machine learning
models used.</p>
      <sec id="sec-4-1">
        <title>4.1. Preprocessing</title>
        <p>The text data for the three languages was preprocessed as follows. For
the Hinglish text, we first converted the texts to lowercase and removed elements such as URLs and
punctuation symbols. Stemming was then applied to the dataset using ‘SnowballStemmer’. Every tweet
had comments and replies, and a label has to be predicted for every tweet and comment, so the comments
and replies were padded with the original tweet to preserve the correct meaning of the tweets and
comments. All the sentences in Marathi were lemmatized. Lemmatization is
related to stemming: stemming truncates words harshly, whereas lemmatization keeps the resulting word
meaningful. All emojis were removed from the sentences using regular expressions.</p>
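        <p>The cleaning and context-padding steps described above can be illustrated with a minimal Python sketch. It is not the code used for the submissions: the regular expressions, the emoji code-point ranges, the <monospace>[SEP]</monospace> separator, the function names and the sample texts are all assumptions made for illustration, and stemming and lemmatization are left out because they depend on external libraries.</p>
        <preformat>
```python
import re
import string

# Hypothetical patterns; the paper does not list the exact expressions used.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# A few common emoji and pictograph code-point ranges (not exhaustive).
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+"
)
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase the text and strip URLs, emojis and ASCII punctuation."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = EMOJI_PATTERN.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


def add_context(parent_tweet: str, text: str, sep: str = " [SEP] ") -> str:
    """Prefix a comment or reply with the tweet it belongs to (assumed scheme)."""
    return parent_tweet + sep + text


# Made-up example thread: one tweet and one reply.
tweet = clean_text("Naya rule aa gaya https://example.com")
reply = clean_text("Yeh dekho!!! 😂 Kya BAKWAS hai...")
print(add_context(tweet, reply))
```
        </preformat>
        <p>Cleaning each text before joining it with its parent tweet keeps the separator token intact; joining first would let the punctuation filter strip the brackets around it.</p>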
        <p>The remaining preprocessing steps, namely stopword removal, stemming and tweet processing,
are discussed in the following subsections.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Removal of Stopwords</title>
          <p>Stopwords of the English and Hindi languages were removed from the dataset. We
observed on this dataset that stopwords do not play any
important role in offensive language detection, so we removed them.</p>
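          <p>A minimal sketch of this filtering step is given below. The two stopword sets are tiny, hypothetical samples chosen for illustration (including the assumption that Hindi stopwords appear in romanized form); the lists actually used are not specified in this paper and would be much longer.</p>
          <preformat>
```python
# Tiny illustrative stopword sets; these are assumptions, not the lists used in the paper.
ENGLISH_STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}
HINGLISH_STOPWORDS = {"hai", "ka", "ki", "ke", "ko", "se", "aur", "ye"}
STOPWORDS = ENGLISH_STOPWORDS | HINGLISH_STOPWORDS


def remove_stopwords(text: str) -> str:
    """Drop whitespace-separated tokens that appear in the stopword set."""
    kept = [tok for tok in text.split() if tok.lower() not in STOPWORDS]
    return " ".join(kept)


print(remove_stopwords("ye movie ka trailer is the worst"))
```
          </preformat>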
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Stemming</title>
          <p>We used the stemmer to reduce each word to its root form, which greatly increased the efficiency of the model.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Tweets Processing</title>
          <p>Every tweet had comments and replies, so the comments and replies were padded with the
original tweet to preserve the correct meaning of each sentence. We used the TF-IDF vectorizer
to represent the texts as feature vectors and to keep the running time of the code low. We used a train-test split to divide the training data in an
80:20 ratio for our validation phase.</p>
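          <p>To make the feature extraction and the validation split concrete, the sketch below computes TF-IDF weights and an 80:20 split in plain Python. It is an illustration under stated assumptions: the smoothed inverse document frequency and L2 normalisation shown here are one common TF-IDF variant (the default of scikit-learn's <monospace>TfidfVectorizer</monospace>, to our understanding), while the settings used in the experiments, the function names, the random seed and the sample documents are not taken from the paper.</p>
          <preformat>
```python
import math
import random
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [doc.split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def train_validation_split(samples, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split them 80:20 by default."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up documents for illustration.
docs = ["bad idea", "good idea", "very bad bad policy", "good policy", "no comment"]
vectors = tfidf_vectors(docs)
train, validation = train_validation_split(docs)
print(len(train), len(validation))
print(vectors[0])
```
          </preformat>
          <p>In the first document, the rarer term "bad" receives a higher weight than "idea" only if it occurs in fewer documents; terms shared by many documents are down-weighted by the logarithmic factor.</p>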
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Models Used</title>
        <p>We tried different types of machine learning classifiers: Support Vector Machine, Logistic
Regression, Multinomial Naïve Bayes, Decision Tree and Random Forest. We found that Random
Forest, Logistic Regression and Support Vector Machine performed best, depending on the task.</p>
        <p><sup>2</sup> https://hasocfire.github.io/hasoc/2022/ichcl.html</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Model selection</title>
        <p>For the tasks posed as binary classification problems, Logistic Regression gave the better results,
since it uses the sigmoid function to map a score to a value between zero and one, which suits a two-class decision.
The dataset also appeared to be linearly separable into two classes, which is another reason why Logistic
Regression performed so well in our experiments. For the other two datasets, Random Forest
worked better, as the data was high-dimensional and Random Forest works with subsets of the data.
Because each tree works on only a subset of the features, it trains quickly and
can easily handle hundreds of features. For the tasks with more than
two classes, Support Vector Machine performed very well, as there was a clear
separation between the classes and the dataset was sufficiently large to train the model.</p>
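        <p>The role of the sigmoid function in a binary decision can be sketched as follows. The weights, bias, feature values and the 0.5 threshold are made-up illustrative numbers, not parameters learned in our experiments, and the function names are hypothetical.</p>
        <preformat>
```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, mapping any score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_label(weights, bias, features, threshold=0.5):
    """Linear score, then sigmoid, then a threshold to pick HOF or NOT."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(score)
    label = "HOF" if probability >= threshold else "NOT"
    return label, probability


# Made-up weights and TF-IDF-like feature values.
print(predict_label([2.0, -1.0], 0.0, [1.0, 0.5]))
print(predict_label([2.0, -1.0], -3.0, [1.0, 0.5]))
```
        </preformat>
        <p>A linear score of zero corresponds to a probability of exactly 0.5, so the decision boundary is a hyperplane in the feature space, which is why linear separability favours this model.</p>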
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>In this section, we present our experimental results as well as an analysis of our models.</p>
      <p>Before the organisers released the test set, we assessed the performance of our
proposed models using validation data (a random 20% of the training set of each
shared task). When the unlabelled test data was released, we used the models developed with this
validation data to make the final predictions.</p>
      <p>Our experimental results for Task 1 (ICHCL-Binary Task) and Task 2 (ICHCL-Multiclass Task)
are shown in Table 2 and Table 3, respectively. Among the models we experimented with, Random Forest performed best for
Task 1 and Support Vector Machine for Task 2.</p>
      <p>Our further experimental results for Task 3A, Task 3B and Task 3C are shown in Table 4,
Table 5 and Table 6, respectively. We found that Logistic Regression performed best for Task 3A
and Task 3B, and Support Vector Machine for Task 3C.</p>
      <p>For each task, we present the evaluation results of the models we experimented with and the final
model. The observations were analysed and compared in detail, and
the best model for each task was then submitted based on this comparison. The
results for each task are evaluated using the macro-averaged F1-score.
The results of the three best models on the validation data and the test
data are given in Table 7 and Table 8, respectively. In Table 8, a blank entry indicates that we missed the submission on the test data
for that classifier due to lack of time. From Table 7 and Table 8, we observe that, among the models we experimented with, the
Random Forest classifier performed best for the ICHCL-Binary task, Support Vector Machine
for the ICHCL-Multiclass and 3C Marathi tasks, and Logistic Regression for the 3A Marathi and 3B
Marathi tasks.</p>
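      <p>For reference, the macro-averaged F1-score is the unweighted mean of the per-class F1-scores, so every class counts equally regardless of its size. The sketch below computes it in plain Python; the function name and the labels and predictions in the example are made up for illustration and are not results from our experiments.</p>
      <preformat>
```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Made-up gold labels and predictions for a three-class task.
gold = ["IND", "IND", "GRP", "OTH"]
predicted = ["IND", "GRP", "GRP", "OTH"]
print(round(macro_f1(gold, predicted), 4))
```
      </preformat>
      <p>In this example the per-class scores are 2/3 (GRP), 2/3 (IND) and 1 (OTH), giving a macro F1 of 7/9.</p>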
      <p>A likely reason why the Random Forest classifier outperformed the other algorithms on the binary class
problem is that it provides relative feature importances, which allow us to select the most
contributing features. One difficulty during experimentation was that we could not
properly preprocess the German data of the ICHCL tasks due to lack of resources and time.
For the Marathi tasks, we were not able to apply some preprocessing steps, such as
stopword removal, which led to low F1 scores.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, our team _ _ ℎ participated in all the shared tasks, which very few
teams did. We have presented our machine learning
approach to all five shared tasks of HASOC 2022. We found that the Logistic
Regression, Support Vector Machine and Random Forest classifiers performed best in
our experiments. Overall, our top models ranked 3rd, 5th and 7th on the Marathi C, Marathi
B and Marathi A tasks, respectively. Our team ranked 8th and 9th on the ICHCL-Multiclass and
ICHCL-Binary shared tasks, respectively.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We are thankful to our undergraduate students Ayush Kumar Singh and Mrinmoy Mahato
for their help with the preprocessing steps. Special thanks go to the academics and management of the
Indian Institute of Information Technology Ranchi for providing the necessary resources and
encouragement.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mandlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages</article-title>
          ,
          <source>in: Proceedings of the 11th Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mandalia</surname>
          </string-name>
          ,
          <article-title>Detecting and visualizing hate speech in social media: A cyber watchdog for surveillance</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>161</volume>
          (
          <year>2020</year>
          )
          <fpage>113725</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages and conversational hate speech</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>North</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Premasiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC subtrack at FIRE 2022: Offensive Language Identification in Marathi</article-title>
          ,
          <source>in: Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation</source>
          , CEUR
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <article-title>A survey on hate speech detection using natural language processing</article-title>
          ,
          <source>in: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source>
          , Association for Computational Linguistics, Valencia, Spain,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://aclanthology.org/W17-1101. doi:10.18653/v1/W17-1101.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>A survey on automatic detection of hate speech in text</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Reganti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Maheshwari</surname>
          </string-name>
          ,
          <article-title>Aggression-annotated Corpus of Hindi-English Code-mixed Data</article-title>
          , in: N. Calzolari (Conference chair), K. Choukri,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hasida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Isahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Piperidis</surname>
          </string-name>
          , T. Tokunaga (Eds.),
          <source>Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>
          , European Language Resources Association (ELRA), Miyazaki, Japan,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ojha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Evaluating aggression identification in social media</article-title>
          ,
          <source>in: Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, European Language Resources Association (ELRA)</source>
          , Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: https://aclanthology.org/2020.trac-1.1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ratan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Nandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Devi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhagat</surname>
          </string-name>
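          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dawer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lahiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ojha</surname>
          </string-name>
          ,
          <article-title>The ComMA dataset v0.2: Annotating aggression and bias in multilingual social media discourse</article-title>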
          ,
          <source>in: Proceedings of the Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>4149</fpage>
          -
          <lpage>4161</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.441.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
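          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech</article-title>
          ,
          <source>in: FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December 2021</source>
          , CEUR,
          <year>2021</year>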
          , pp.
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtrack at FIRE 2021: Conversational Hate Speech Detection in Code-mixed language</article-title>
          , in:
          <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>
          ,
          <publisher-name>CEUR</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>AI ML NIT Patna at HASOC 2019: Deep learning approach for identification of abusive content</article-title>
          ,
          <source>FIRE (Working Notes)</source>
          <volume>2517</volume>
          (
          <year>2019</year>
          )
          <fpage>328</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>AI_ML_NIT_Patna @ HASOC 2020: BERT models for hate speech identification in Indo-European languages</article-title>
          , in:
          <source>FIRE (Working Notes)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>319</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>AI_ML_NIT_Patna @ TRAC - 2: Deep learning approach for multilingual aggression identification</article-title>
          , in:
          <source>Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying</source>
          , European Language Resources Association (ELRA), Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>119</lpage>
          . URL: https://aclanthology.org/2020.trac-1.18.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Srivastav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Suman</surname>
          </string-name>
          ,
          <article-title>Bias, threat and aggression identification using machine learning techniques on multilingual comments</article-title>
          , in:
          <source>Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>36</lpage>
          . URL: https://aclanthology.org/2022.trac-1.4.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Satapara</surname>
          </string-name>
          , Shrey and Majumder, Prasenjit and Mandl, Thomas and Modha, Sandip and Madhu, Hiren and Ranasinghe, Tharindu and Zampieri, Marcos and North, Kai and Premasiri, Damith,
          <article-title>Overview of the HASOC Subtrack at FIRE 2022: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages</article-title>
          , in: FIRE 2022:
          <source>Forum for Information Retrieval Evaluation, Virtual Event, 9th-13th December 2022</source>
          , ACM,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtrack at FIRE 2022: Identification of Conversational Hate-Speech in Hindi-English Code-Mixed and German Language</article-title>
          , in:
          <source>Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation</source>
          ,
          <publisher-name>CEUR</publisher-name>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chaudhari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaikwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paygude</surname>
          </string-name>
          ,
          <article-title>Predicting the type and target of offensive social media posts in Marathi</article-title>
          ,
          <source>Social Network Analysis and Mining</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <elocation-id>77</elocation-id>
          . URL: https://doi.org/10.1007/s13278-022-00906-8. doi: 10.1007/s13278-022-00906-8.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages</article-title>
          , in:
          <source>Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>
          ,
          <publisher-name>CEUR</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Gaikwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Homan</surname>
          </string-name>
          ,
          <article-title>Cross-lingual offensive language identification for low resource languages: The case of Marathi</article-title>
          , in:
          <source>Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)</source>
          , INCOMA Ltd., Held Online
          ,
          <year>2021</year>
          , pp.
          <fpage>437</fpage>
          -
          <lpage>443</lpage>
          . URL: https://aclanthology.org/2021.ranlp-1.50.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>