<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of the HASOC Subtrack at FIRE 2023: Hate-Speech Identification in Sinhala and Gujarati</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shrey Satapara</string-name>
          <email>shreysatapara@gmail.com</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiren Madhu</string-name>
          <email>hirenmadhu16@gmail.com</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tharindu Ranasinghe</string-name>
          <email>t.ranasinghe@aston.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alphaeus Eric Dmonte</string-name>
          <email>admonte@gmu.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Zampieri</string-name>
          <email>marcos.zampieri@rit.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavan Pandya</string-name>
          <email>pavanpandya1311@gmail.com</email>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nisarg Shah</string-name>
          <email>nisarg0606@gmail.com</email>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandip Modha</string-name>
          <email>sjmodha@gmail.com</email>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasenjit Majumder</string-name>
          <email>p_majumder@daiict.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Mandl</string-name>
          <email>mandl@uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff8">8</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aston University</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DA-IICT</institution>
          ,
          <addr-line>Gandhinagar</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>George Mason University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Indian Institute of Science</institution>
          ,
          <addr-line>Bangalore</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Hyderabad</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Indiana Bloomington University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>LDRP-ITR</institution>
          ,
          <addr-line>Gandhinagar</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff8">
          <label>8</label>
          <institution>University of Hildesheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>Detecting offensive and hateful content in low-resource languages poses a significant challenge due to the limited availability of benchmark datasets. It is crucial to address this gap by creating benchmark datasets tailored to these languages. This not only enhances the accuracy of detection but also provides valuable insights into the efficacy of identifying problematic content in comparison to high-resource languages. In line with this commitment to advancing research on low-resource languages, the Hate Speech and Offensive Content Identification (HASOC) shared task introduced a dedicated subtrack for Hate Speech Identification in Sinhala and Gujarati in 2023. This paper outlines the objectives of the task, discusses the characteristics of the data involved, and presents an analysis of the participants' submissions. For Task 1a, we utilized an existing Sinhala dataset (SOLD) consisting of 10,000 tweets. Meanwhile, for Task 1b, focused on Gujarati, we curated a new dataset comprising 1,020 tweets. A total of 16 teams submitted experiments for Sinhala, with the leading team achieving an impressive F1 score of 0.83. In the case of the Gujarati task, 17 teams participated, and the highest-performing team achieved an F1 score of 0.84. These results highlight the significance of tailored datasets in facilitating the effective detection of offensive content in low-resource languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech</kwd>
        <kwd>Social NLP</kwd>
        <kwd>Social Media</kwd>
        <kwd>Language Resource</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Low-Resource Language</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Benchmark</kwd>
        <kwd>Gujarati</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Hate speech is a global problem that plagues social media platforms in many countries [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Hate speech can ultimately also lead to violent hate crimes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Consequently, detection and
moderation are necessary to maintain a rational discourse that allows an exchange of arguments.
Reduced efforts in content moderation can lead to the proliferation of hate speech, as the case
of Twitter has shown recently [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The initiative Hate Speech and Offensive Content Identification (HASOC) has organized
shared tasks since 2019 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and created resources for several languages. Efforts to create language resources for low-resource languages are of special importance. Research needs to analyze which resources are most beneficial for such languages: is it better to develop language-specific resources, or to reuse resources from high-resource languages like English and exploit that knowledge in a low-resource context (e.g., by translating content or by transfer learning between languages) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]?
      </p>
      <p>
        Many offensive language detection benchmarks are available for English and other high-resource languages. However, in the last few years, the NLP community has focused on creating more datasets for low-resource languages such as Marathi [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Oromo [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Swahili [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Greek [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Danish [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and Albanian [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Supporting this, the last two editions of HASOC contained
shared tasks on identifying offensive language in Marathi.
      </p>
      <p>In 2023, HASOC subtask 1 focused on identifying hate speech, offensive language, and profanity in Sinhala and Gujarati. Sinhala is a low-resource Indo-Aryan language spoken by around 16 million people, mainly in Sri Lanka. Gujarati is also a low-resource Indo-Aryan language, spoken by approximately 50 million people, mainly in north-western India.</p>
      <p>
        Task 1A deals with identifying hate and offensive content in Sinhala. The task involves classifying tweets as Hate and Offensive (HOF) or Non-Hate and Offensive (NOT). The dataset for this task is based on the Sinhala Offensive Language Dataset (SOLD) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Task 1B focuses on identifying hate and offensive content in Gujarati; similar to Task 1A, participants needed to classify tweets into the HOF or NOT categories. We created a new dataset for Gujarati with 1,020 annotated tweets. More details about the datasets are available in Section 2.
      </p>
      <p>Overall, both tasks were highly successful and gained the attention of the NLP community. The interest demonstrated last year continued this year, with 16 teams participating in the Sinhala task and 17 teams participating in the Gujarati task. Furthermore, it should be highlighted that this is the first-ever shared task organized for Sinhala. We believe that this shared task will open many research avenues for low-resource languages like Sinhala and Gujarati.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Data</title>
      <sec id="sec-3-1">
        <title>2.1. Sinhala dataset</title>
        <p>
          The data used for subtask 1A come from the Sinhala Offensive Language Dataset (SOLD, available at https://huggingface.co/datasets/sinhala-nlp/SOLD) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The dataset consists of 10,000 annotated Twitter posts collected for detecting offensive Sinhala text. It has two splits, a training set and a test set, containing 7,500 and 2,500 tweets, respectively. The original dataset provides two levels of annotation: sentence-level and token-level. The sentence-level annotations follow the OLID [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] Task A scheme, which we used for subtask 1A. The original paper reports a Fleiss' kappa inter-annotator agreement of 0.7-0.8 for this dataset. The class distribution of the dataset is shown in Table 1.
        </p>
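        <p>Fleiss' kappa, the agreement statistic reported above, can be computed from per-item category counts. The following is a minimal illustrative sketch, not the SOLD authors' code; the example ratings are hypothetical:</p>
        <preformat>
```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of per-category rater counts."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])  # raters per item (assumed constant)
    n_cats = len(ratings[0])
    # overall proportion of assignments to each category
    p_j = [sum(row[j] for row in ratings) / (n_items * n_raters) for j in range(n_cats)]
    # per-item observed agreement
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in ratings]
    p_bar = sum(p_i) / n_items     # mean observed agreement
    p_e = sum(p * p for p in p_j)  # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# three hypothetical annotators labelling two items (HOF vs. NOT) in perfect agreement
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```
        </preformat>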
        <sec id="sec-3-1-1">
          <title>Table 1: Class distribution of the Sinhala (SOLD) dataset</title>
          <p>Train: HOF 3,176; NOT 4,324 (7,500 total).</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Gujarati dataset</title>
        <p>We created a new Gujarati offensive language detection dataset for subtask 1B. To build it, we used the Sinhala dataset from subtask 1A: we first collected all the unique offensive tokens from the Sinhala dataset, automatically translated them to Gujarati, and manually selected 45 tokens from the translations. We also collected offensive tokens from various websites (e.g., https://www.youswear.com/index.asp?language=Gujarati) and manually selected the ones that were appropriate for our problem statement. We then used an in-house web scraper to collect tweets containing those keywords.</p>
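        <p>The keyword-based collection step described above can be sketched as a simple filter. This is a hypothetical sketch: the keyword list and tweets below are placeholders, since the actual Gujarati tokens are offensive terms not reproduced here:</p>
        <preformat>
```python
# Hypothetical sketch: keep only tweets containing at least one selected keyword.
OFFENSIVE_KEYWORDS = ["keyword1", "keyword2"]  # placeholder for the manually selected Gujarati tokens

def matches_keywords(tweet, keywords):
    text = tweet.lower()
    return any(kw in text for kw in keywords)

def filter_candidate_tweets(tweets, keywords=OFFENSIVE_KEYWORDS):
    return [t for t in tweets if matches_keywords(t, keywords)]

candidates = filter_candidate_tweets(["contains keyword1 here", "a harmless tweet"])
print(candidates)  # ['contains keyword1 here']
```
        </preformat>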
        <p>We present the dataset statistics for the Gujarati dataset in Table 2. As we can see, we
only provide the participants with 200 labelled text samples. This was done to encourage
participants to develop innovative techniques in Zero-Shot and Few-Shot learning that make
use of high-resource datasets. For the annotations, the inter-annotator agreement was 0.7474.</p>
        <sec id="sec-3-2-1">
          <title>Table 2: Class distribution of the Gujarati dataset</title>
          <p>Train: HOF 100; NOT 100; Total 200.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Results</title>
      <p>The results for Subtask 1A are presented in Table 3. A total of 52 systems were submitted by 16 teams. The best-performing system of each team is displayed in Table 3, ranked by F1 score. The performance of the top five teams was very similar. Most of the top teams utilized pre-trained transformer models that support Sinhala, such as XLM-R. Several teams used sentence embeddings such as SBERT and LaBSE in their experiments. Interestingly, some teams used mBERT, which is not trained on Sinhala text, but could still achieve mid-table finishes. Team FiRC-NLP had the best-performing system, with an F1 score of 0.8382, followed by "Krispy Mango" and "AiAlchemist", with F1 scores of 0.8371 and 0.8355, respectively. The last-ranked team had an F1 score of 0.5574.</p>
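      <p>For readers reimplementing the evaluation, the ranking metric can be sketched as follows. This assumes macro-averaged F1 over the two classes, as is common in HASOC evaluations; the gold/predicted labels in the example are hypothetical:</p>
      <preformat>
```python
def f1_for_label(gold, pred, label):
    """Precision/recall/F1 treating one class as positive."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def macro_f1(gold, pred, labels=("HOF", "NOT")):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)

gold = ["HOF", "NOT", "HOF", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))  # 0.7333
```
      </preformat>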
      <p>Table 4 presents the results of Subtask 1B. A total of 54 submissions were received from 17 different teams. The team "FiRC-NLP" achieved the highest F1 score (0.8488) with their submission named "no-kfold", demonstrating excellent precision (0.8392) and recall (0.8638) by fine-tuning an XLM-RoBERTa large checkpoint. Following closely, "SATLab" secured second position with their submission "HasocT1bR4", earning an F1 score of 0.8383 by training classical machine learning classifiers on character-level n-grams. The team "Krispy Mango", also fine-tuning XLM-RoBERTa, ranked third with an F1 score of 0.7956. The table provides an insightful overview of the competition results, highlighting the strong performance of many teams and their respective submissions.</p>
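      <p>The character-level n-gram features behind SATLab's approach can be illustrated with a minimal sketch; the n-gram range and downstream classifier here are assumptions for illustration, not the team's actual configuration:</p>
      <preformat>
```python
def char_ngrams(text, n_min=1, n_max=4):
    """Count character n-grams of lengths n_min..n_max (a simple bag-of-ngrams)."""
    counts = {}
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            counts[gram] = counts.get(gram, 0) + 1
    return counts

# such dictionaries would then be vectorized and fed to a classical classifier
print(char_ngrams("abc", 1, 2))  # {'a': 1, 'b': 1, 'c': 1, 'ab': 1, 'bc': 1}
```
      </preformat>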
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion and Future Work</title>
      <p>We presented the results of HASOC 2023 Task 1, which featured datasets in two low-resource Indo-Aryan languages: Sinhala and Gujarati. A total of 16 teams submitted experiments for Sinhala, and 17 teams participated in the Gujarati task. The wide participation in the task allowed us to compare a number of approaches. We observed that the best systems for both languages used pre-trained transformers that support Sinhala and Gujarati, such as XLM-R and mBERT. Furthermore, since the Gujarati dataset contained only a limited number of training instances, several teams utilized cross-lingual transfer learning to improve their performance. Despite these being low-resource languages, the top teams produced competitive results comparable to those reported for high-resource languages.</p>
      <p>We plan to extend the task in several ways. First, we plan to organize an offensive span detection task for these two languages, which will improve the explainability of offensive language detection models. Secondly, we hope to add more Indo-Aryan languages that are less researched in the NLP community. HASOC 2023 is the first-ever shared task organized for Sinhala and one of the few shared tasks organized for Gujarati. We believe that, in light of HASOC 2023, many shared tasks will be created for these languages in the future, improving the involvement of NLP researchers in these low-resource languages.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by a grant from the Artificial Intelligence Journal (AIJ) for sponsoring AI research (28th call for sponsorships).</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[15] M. K. Sathya, K. Gopalakrishnan, M. PA, P. Balasundaram, Sinhala and Gujarati hate speech detection, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[16] C. Muhammad Awais, J. Raj, Breaking Barriers: Multilingual Toxicity Analysis for Hate Speech and Offensive Language in Low-Resource Indo-Aryan Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[17] Y. Bestgen, Using Only Character Ngrams for Hate Speech and Offensive Content Identification in Five Low-Resource Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[18] N. Narayan, M. Biswal, P. Goyal, A. Panigrahi, Hate Speech and Offensive Content Detection in Indo-Aryan Languages: A Battle of LSTM and Transformers, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[19] M. Rostamkhani, S. Eetemadi, Detecting hate speech and offensive content in English and Indo-Aryan texts, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[20] M. D. M. Qureshi, M. Sawant, M. A. Qureshi, W. Rashwan, A. Younus, S. Caton, Hate speech classification for Sinhalese and Gujarati, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[21] S. G GNANA, A. Venkatesh, K. N, O. M, B. V. A, P. Balasundaram, Enhancing hate speech detection in Sinhala and Gujarati: Leveraging BERT models and linguistic constraints, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[22] S. Chanda, A. Dhaka, S. Pal, Crossing borders: Multilingual hate speech detection, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[23] G. Kalita, E. Halder, C. Taparia, A. Vetagiri, D. P. Pakray, Examining Hate Speech Detection Across Multiple Indo-Aryan Languages in Tasks 1 &amp; 4, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[24] S. Agustian, Z. Idhafi, A. F. Rihardi, Improving detection of hate speech, offensive language and profanity in short texts with SVM classifier, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[25] O. E. Ojo, O. O. Adebanji, H. Calvo, A. Gelbukh, A. Feldman, G. Sidorov, Hate and offensive content identification in Indo-Aryan languages using transformer-based models, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.
[26] A. Joshi, R. Joshi, Harnessing Pre-Trained Sentence Transformers for Offensive Language Detection in Indian Languages, in: Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation, CEUR, 2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] B. Di Fátima, Hate Speech on Social Media: A Global Approach, LabCom Books &amp; EdiPUCE, Covilhã, Portugal, 2023. doi:10.25768/654-916-9.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] K. Müller, C. Schwarz, From hashtag to hate crime: Twitter and anti-minority sentiment, American Economic Journal: Applied Economics 15 (2023) 270-312. doi:10.1257/app.20210211.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] D. Hickey, M. Schmitz, D. Fessler, P. E. Smaldino, G. Muric, K. Burghardt, Auditing Elon Musk's impact on hate speech and bots, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 17, 2023, pp. 1133-1137. doi:10.1609/icwsm.v17i1.22222.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandalia, A. Patel, Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages, in: P. Majumder, M. Mitra, S. Gangopadhyay, P. Mehta (Eds.), FIRE '19: Forum for Information Retrieval Evaluation, Kolkata, India, December 2019, ACM, 2019, pp. 14-17. URL: https://doi.org/10.1145/3368567.3368584. doi:10.1145/3368567.3368584.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] M. U. Arshad, R. Ali, M. O. Beg, W. Shahzad, Uhated: hate speech detection in Urdu language using transfer learning, Lang. Resour. Evaluation 57 (2023) 713-732. URL: https://doi.org/10.1007/s10579-023-09642-7. doi:10.1007/s10579-023-09642-7.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] S. S. Gaikwad, T. Ranasinghe, M. Zampieri, C. Homan, Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi, in: G. Angelova, M. Kunilovskaya, R. Mitkov, I. Nikolova-Koleva (Eds.), Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Held Online, 1-3 September 2021, INCOMA Ltd., 2021, pp. 437-443. URL: https://aclanthology.org/2021.ranlp-1.50.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] N. B. Defersha, J. Abawajy, K. Kekeba, Deep learning based multilabel hateful speech text comments recognition and classification model for resource scarce Ethiopian language: The case of Afaan Oromo, in: IEEE International Conference on Current Development in Engineering and Technology (CCET), IEEE, 2022, pp. 1-11. doi:10.1109/CCET56606.2022.10080837.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] E. Ombui, L. Muchemi, P. Wagacha, Building and annotating a codeswitched hate speech corpora, Int. J. Inf. Technol. Comput. Sci. 3 (2021) 33-52.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Z. Pitenis, M. Zampieri, T. Ranasinghe, Offensive Language Identification in Greek, in: Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020, European Language Resources Association, 2020, pp. 5113-5119. URL: https://aclanthology.org/2020.lrec-1.629/.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Sigurbergsson</surname>
          </string-name>
          , L. Derczynski,
          <article-title>Ofensive Language and Hate Speech Detection for Danish</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Béchet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Blache</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Cieri</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Goggi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Isahara</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Mazo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Piperidis</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of The 12th Language Resources and Evaluation Conference</source>
          , LREC
          <year>2020</year>
          , Marseille, France, May 11-16, 2020, European Language Resources Association, 2020, pp.
          <fpage>3498</fpage>
          -
          <lpage>3508</lpage>
          . URL: https://aclanthology.org/2020.lrec-1.430/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaziaj</surname>
          </string-name>
          ,
          <article-title>Fuelling hate: Hate speech towards women in online news websites in Albania</article-title>
          , in:
          <source>Gender and Sexuality in the European Media</source>
          , Routledge,
          <year>2021</year>
          , pp.
          <fpage>100</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Anuradha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Premasiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hettiarachchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Uyangodage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>SOLD: Sinhala offensive language dataset</article-title>
          ,
          <source>arXiv preprint arXiv:2212.00851</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Predicting the Type and Target of Offensive Posts in Social Media</article-title>
          , in:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>1415</fpage>
          -
          <lpage>1420</lpage>
          . doi:10.18653/v1/N19-1144.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Jahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Aransa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bouchekif</surname>
          </string-name>
          ,
          <article-title>Multilingual Hate Speech Detection Using Ensemble of Transformer Models</article-title>
          , in:
          <source>Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation</source>
          ,
          CEUR
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>